
4 practical ways to use SQL and Python together for modern data analysis
TL;DR: Collaborative data analysis is a new wave that brings the technical and non-technical stakeholders together around the data. It fosters a data-driven culture and increases trust in the data. Collaborative data analysis requires an analysis platform that offers SQL, Python, AI and spreadsheet-like functionality in a single environment and can push data to where your stakeholders live.
When I was a product manager I was constantly using data to help make decisions or investigate customer or product issues. I had access to traditional BI, product analytics, PostgreSQL and Snowflake along with a myriad of other tools, but more often than not I would export the data and dump it into a spreadsheet, or a Jupyter notebook (for practical reasons it was actually usually Google Colab) and take screenshots to share in Slack or slides. And I noticed that this pattern was common across every team and almost every organization.
Reflecting on this, there are a few reasons why this happened:
Collaborative data analysis platforms solve these challenges by providing a complete data analysis environment with real-time collaboration and version control, reducing the need to bounce between different tools and allowing the team to discuss insights where they live. Let’s explore a bit more why this matters and what traits make a great collaborative data analysis platform.
The workflow described above, involving exporting data, taking screenshots and putting it in Slack, email, slides, Jupyter notebooks/Google Colab or spreadsheets probably feels familiar. Aside from the obvious inconvenience, the lack of collaborative data analysis also tends to reduce the use of data for strategic decision making, erodes trust in data and reports and hinders data literacy across the organization.
Let’s dive into each of these in more detail.
When teams and individuals are able to effectively collaborate around insights, data will make its way into more conversations and decisions. Generally speaking, most enterprise leaders want to be more data-driven, but this can be difficult to achieve when it takes multiple steps to find and share insights which then get lost in the corporate ether - a Snowflake survey shows that only 32% of managers primarily use data to make decisions while 75% agree that there’s a need to increase access to data.
If the insight is directly embedded where the team works, for example in Slack, email or Google Sheets or it’s readily available in reports where you can leave comments and tag coworkers, conversations will flourish around data.
Insights have a producer and a consumer. Oftentimes, it can take multiple iterations before the producer and consumer are aligned. If the producer is able to share their report or insight in the same environment where the consumer is viewing the data, iteration cycles will decrease significantly, boosting the team’s efficiency. The further the data is from where it’s being consumed, the more work it takes to produce a useful insight. Fabi.ai customers, for example, are able to reduce time to insight by up to 90% by working in a collaborative data analysis environment.
Sometimes screenshots are necessary, but as soon as the data delivered to the stakeholder is disconnected from its source, some amount of trust is lost. If a data analyst shares a report with a business stakeholder, who then exports the data and reshares it with a colleague, that colleague is likely going to question the source and accuracy of the data. On the other hand, if they saw the data along with the original author, and perhaps even the underlying data source, this immediately increases confidence in the data.
A lot of business stakeholders actually have a strong desire to better understand the data and are generally capable of even deciphering SQL enough to understand where the data is from and what it's showing. This is great! The more individuals across your organization have some level of understanding of the data (or at least your gold or wide tables) the better. This will help them better understand what data is available, the sorts of questions they can ask, and maybe even empower them to answer some of their own questions.
Collaborative data analysis platforms increase literacy by bringing the producers and the consumers of insights in a same environment with the underlying code and data modelling readily available. This in turn also increases trust in the data.
Before we talk about what makes a data analysis platform collaborative, we need to understand the different types of collaboration. There are three different types of collaboration relationships:
Although there are similarities between these, for example real-time commenting and updates are important in all cases, certain personas have specific needs. For instance, effective collaboration between data team members requires version control. This allows them to work in parallel on the same analysis without overriding each other’s code.
So let’s take a look at the characteristics of a collaborative data analysis platform that serves all these stakeholders.
A truly collaborative data analysis platform should have all tools in a single place. Individuals bring data to where it’s easiest to analyze and collaborate. If the work can’t all be done in one place, the collaboration workflow will be broken.
Data analysis typically requires some combination of spreadsheet, SQL, Python (for advanced data analysis and visualization) and AI. The more tools the platform offers the better.
Collaborative data analysis platforms should also encourage data sharing. Pushing insights and reports to where your stakeholders live, and maintaining those reports, should be easy. In particular, a good collaborative data analysis platform should offer a connector to and from Google Sheets as well as a way to push alerts to Slack or Microsoft Teams.
In order to bring individuals together around a report, the data must be up to date in an environment where users can comment and tag each other. Short of this, your stakeholders will export or screenshot the data and bring it to where discussions happen: documents and Slack.
Reproducibility is a complex topic that merits its own article. But for the sake of this topic, it’s important to find a collaborative data analysis platform that ensures that if the same report is run twice by different individuals or at different times, that it shows the same results. This sounds trivial, but proper reproducibility and variable statement management is the exception, not the rule. Especially when more advanced data analysis using Python.
Version control with tools like GitHub, Bitbucket or Gitlab has been a staple to good software development for the past two or more decades. Collaboration between two technical stakeholders working on a same report is near impossible without version control. One analyst making updates to another data analyst’s report without versioning can quickly lead to confusion and lots of wasted time.
A good collaborative data analysis platform should offer dashboard-as-code with git support.
If at this stage you’re thinking “Yes, this sounds awesome, but why aren’t all data analysis platforms offering all this functionality?” this section is a technical detour that gets into what’s changed in the past 5-10 years enabling new data analysis platforms to offer functionality that we could only dream of a few years ago.
Many of the legacy BI solutions were founded 10 or more years ago (yes, actually!), and there have been some massive changes since then that have changed the game. Specifically, in order to create a collaborative data analysis platform that brings together SQL, Python, AI and no-code, you need the following:
So why is this now possible?
Take all of these advances in aggregate, only now can we truly build modern, ai-powered data analysis platforms that combine all the tools you need in a single place with real-time collaboration.
In the very near future, all data analysis will be expected to be collaborative. Collaboration means both collaboration between technical and non-technical team members. So it’s just as important that the platform you choose provides best-in-class version control as it does a way to push data to Google Sheets, Slack or interactive reports.
Let’s quickly recap the key characteristics of a collaborative data analysis platform:
At Fabi.ai, we believe that data analysis should fundamentally be collaborative and that this requires a new way of thinking about data analysis and the tools we use. Our mission is to transform the way data is explored, used and shared. We’re building the first AI-powered collaborative data analysis platform designed to bring all stakeholders together. If you want to try it out, you can get started for free in less than 2 minutes.