How to get your data AI-ready for data analysis

TL;DR: Getting your data "AI-ready" doesn't mean it needs to be perfect; it means matching your data preparation to your users' technical expertise. For technical users, simply having access to the data is often enough. For semi-technical users, focus on making data accessible with clear naming conventions. For non-technical stakeholders, invest in either spreadsheet-based AI analysis or well-modeled data with wide tables and descriptive field names. The key insight: AI can help prepare data for AI analysis, creating a powerful feedback loop that accelerates insights.

"Garbage in, garbage out."

If you've worked with any type of data, you've likely heard this saying more than once, and maybe even used it yourself. Now, with AI everywhere, we're quicker than ever to throw it around.

But I've always taken issue with this saying, because it drastically oversimplifies the problem and has become a shortcut to dismissing AI in the enterprise, especially around anything involving data analysis.

Data scientists and engineers work with incredibly messy and disjointed data all the time. And yet we don't use "garbage in, garbage out" to justify not hiring data scientists. In fact, we tend to hire these data professionals precisely because the data is messy and the business needs help wrangling it and finding patterns.

In this post, I want to talk about what your data truly needs to look like to be "AI-ready." This is a nuanced discussion, so I'm going to break it down into a few parts. I'll talk about what AI-ready data means for:

  • Technical and experienced data professionals
  • Semi-technical individuals who can tell what's good or bad data but need help getting it all together
  • Non-technical individuals who are removed from the data and need maximum guardrails

As you can probably tell, this is a spectrum, and individuals don't always fit neatly into a single category. In fact, the same individual may be completely technically literate with one dataset in one tool but not understand the first thing about another. In a past role, for example, I knew our product data like the back of my hand and could write SQL queries all day to pull my own reports no matter how messy the data was, but I couldn't tell you the first thing about our marketing data.

With that, let's dive in.

Getting your data AI-ready

Data prep for technical data practitioners (data analysts, scientists and engineers)

As mentioned above, if you're a data analyst, scientist or engineer, you've likely been hired in large part because the organization you work for has a lot of data that it thinks it can put to good use. So your day-to-day likely involves working with messy, disjointed data. Despite this, at Fabi we have thousands of these data practitioners using AI every day to supercharge their productivity.

So what's needed from the data to be AI-ready if you're a data pro? Roughly in priority order:

  1. You need the data. The list could almost stop here, because this is the single most important thing. As long as you have the data, there's a good chance you can leverage AI to whip it into shape and then use AI to analyze it. But if you don't have the data, there isn't much you can do. Sometimes this is an easily solvable problem: say your data is in Postgres but not in your data warehouse; then it's simply a question of moving it around (or you can use Fabi Smartbooks, which allow you to join data from disparate sources!). But if the data isn't stored anywhere, you may need to work with your engineering or product team to start capturing it, and that can be quite involved.
  2. Structure your files. Every data practitioner works with files at some point: CSV, Excel, etc. If you're working with a file, take the time to clean it up. AI can do a remarkable job of parsing complex Excel files, but the fewer multi-header sheets, merged cells and pivot tables, the better (see the first sketch after this list).
  3. Refine your SQL query. If you're extracting your data from a data warehouse, you're likely using SQL. Even if your data isn't perfectly modeled in the warehouse, taking the time to query just the right data (and no more) and using aliases to relabel fields goes a long way toward helping the AI understand what it's working with, even without additional context. For example, if you have a field called "amount" that represents monthly recurring revenue, relabel it "mrr" or, better yet, "monthly_recurring_revenue_usd" (see the second sketch below).
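
To make the file-cleanup point concrete, here's a minimal pandas sketch. The workbook name, sheet name and two-row merged header are all hypothetical; the point is simply to show a merged, multi-row header being flattened into one clean header row before handing the file to an AI:

```python
import pandas as pd

# Hypothetical workbook whose "Summary" sheet has a two-row merged header
# (e.g. "Revenue" spanning "Q1"/"Q2" sub-columns).
df = pd.read_excel("q3_sales_report.xlsx", sheet_name="Summary", header=[0, 1])

# Flatten the MultiIndex header into single snake_case names
# ("Revenue" / "Q1" -> "revenue_q1") so there's one clean header row.
df.columns = [
    "_".join(str(level).strip().lower() for level in col if pd.notna(level))
    for col in df.columns
]

# Drop the fully empty padding rows and columns Excel exports often carry.
df = df.dropna(axis=0, how="all").dropna(axis=1, how="all")

df.to_csv("q3_sales_report_clean.csv", index=False)
```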
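And here's a sketch of the aliasing idea, pulling query results into pandas via SQLAlchemy. The connection string, table and column names are made up for illustration:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical warehouse connection; swap in your own.
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")

# Select only the columns the analysis needs, and alias cryptic names into
# self-describing ones so the AI doesn't have to guess what "amount" means.
query = """
    SELECT
        account_id,
        start_date AS subscription_start_date,
        amount     AS monthly_recurring_revenue_usd,
        region     AS sales_region
    FROM billing.subscriptions
    WHERE status = 'active'
"""

df = pd.read_sql(query, engine)
```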

Data prep for semi-technical data practitioners (product managers, founders)

A lot of the tips shared above for technical data practitioners apply here too. The biggest difference between a product manager and a data analyst often comes down to their ability to gather the data. So if you're part of a data team looking to put AI in the hands of these semi-technical individuals to ease some of the load on the data team (like we've done at Parasail), the most important thing you can do is make sure the data is accessible. Once these individuals have the data, they generally know their data and domain well enough to supervise the AI, even if they aren't SQL experts.

I'll share an anecdote from one of our customers: they're a small team with few data resources, and all their data lives in a relational database. Their customer success (CS) team doesn't know SQL well, but with the help of Fabi's AI Analyst Agent, they're now able to uncover major upsell opportunities and insights to bring to customers during quarterly business reviews. These CSMs can pull these insights because they know the data and the business well enough to judge whether the AI's results are accurate, even though they can't write hundred-line SQL queries from scratch.

Data prep for non-technical stakeholders

Now we're entering a different category entirely. If you're considering using AI to empower completely non-technical individuals to self-serve, the stakes are much higher. If you're putting AI in the hands of an executive who isn't involved in the details, the AI results need to be 100% accurate. There is no room whatsoever for AI to hallucinate or pull the wrong metric when the CEO asks "How are sales in California trending?" (unless you enjoy a good firestorm, which most data teams do not in my experience).

So how do you get AI-ready data for this scenario? We see two paths:

  1. Provide an AI analyst that can work with spreadsheets. Executives and non-technical stakeholders love spreadsheets. This is your time to embrace that. If they have a spreadsheet, there's a very good chance they really understand that data, because it's likely their own spreadsheet or an export from a tool they know well (Salesforce, I'm looking at you). Spreadsheets can be large, but rarely so large that AI can't do a remarkable job pulling together accurate insights. And since the AI isn't trying to join data across dozens of tables, the accuracy rate shoots up. This is where tools like Fabi Smartbooks, with our various supported file formats, are incredibly powerful: enterprise-grade, secure environments that non-technical team members can use to analyze their data with all the context about your business. And since it's fully collaborative, they can share their work with you so you can verify the results.
  2. Refine your data model. If the intention is to provide an AI agent that can answer questions for completely non-technical stakeholders based on data in your data warehouse, this is where the data does need to be close to perfect. Oftentimes, when teams say their data isn't AI-ready, this is the scenario they have in mind, and for most organizations the concern is fair. To make this work, you'll want to create a clean, managed and tested data model that's essentially self-explanatory. This topic warrants its own section, which we dive into below.

Getting your data AI-ready for fully self-service analytics

So you want to be able to give your business team an AI agent that can handle ad hoc requests? As we started to touch on above, the data has to be very well modeled and managed for this scenario. We work with some of the best data teams around the world, and here are the most important tips:

  1. Wide tables. You can't create "one big table" that answers every business question, but the fewer tables you give the AI, the better. Fewer tables means fewer joins, which means fewer opportunities for the AI to make a mistake; in our experience, joins are one of the top places AI goes wrong. As a rule of thumb, we recommend sticking to a dozen, or at most two dozen, tables for any given agent.
  2. Clear field names. Take the time to clearly label each field. This is good practice anyway, but avoid abbreviations like "amt"; use "monthly_recurring_revenue_usd" instead. Focus in particular on measure fields and ID fields. The sketch below shows both tips in practice.
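
To illustrate, here's a sketch that collapses a few normalized tables into one wide, clearly named view. The schema and names are hypothetical; what matters is that the joins are resolved once, up front, by someone who knows the schema, and that cryptic columns like "amt" are renamed at the view boundary so the agent only ever sees descriptive names:

```python
from sqlalchemy import create_engine, text

# Hypothetical warehouse connection and schema, for illustration only.
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")

wide_view = text("""
    CREATE OR REPLACE VIEW analytics.customer_revenue_wide AS
    SELECT
        c.id         AS customer_id,
        c.name       AS customer_name,
        c.region     AS sales_region,
        p.tier       AS plan_tier,
        s.start_date AS subscription_start_date,
        s.amt        AS monthly_recurring_revenue_usd
    FROM customers c
    JOIN subscriptions s ON s.customer_id = c.id
    JOIN plans p ON p.id = s.plan_id
""")

# Build the view once; the agent then queries a single wide table with no
# joins left to guess at question time.
with engine.begin() as conn:
    conn.execute(wide_view)
```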

There are additional things you can do to help improve accuracy, such as building and managing a semantic layer or providing sample queries, but our (perhaps contrarian) opinion is that 90% of the heavy lifting happens at the data modeling layer. If you're able to create a small, granular, clean and clearly labeled set of tables, you've done most of the work to be able to truly provide self-service analytics.

How to use AI to prep your data for AI

This is where we get a bit meta: if we're talking about using AI to accelerate the workflows of data professionals and semi-technical stakeholders, AI can actually help prep the data for AI-powered analysis. Here's what we've learned from some of the best:

  1. Ask the AI to identify issues. A great first step is to ask the AI itself to spot any issues with the data; it might even surprise you with what it comes up with (see the sketch after this list).
  2. Ask the AI to fix the issues it identified. Once you've reviewed the issues it spotted, list out the ones you want it to address.
  3. Ask the AI to fix issues you've identified. And of course, spot check the data yourself. Ask the AI to plot some of the data to help you spot potential issues on your own, and ask the AI to fix those issues as you go.
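
Here's a minimal sketch of step 1, using the OpenAI Python client as a stand-in for whichever AI analyst you use. The model name, prompt and file are illustrative; the idea is simply to hand the AI a compact profile of the data and ask it what looks wrong:

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

df = pd.read_csv("q3_sales_report_clean.csv")  # hypothetical file

# A compact profile of the data: dtypes, null counts and a few sample rows.
profile = "\n\n".join([
    "Columns and dtypes:\n" + df.dtypes.to_string(),
    "Null counts:\n" + df.isna().sum().to_string(),
    "Sample rows:\n" + df.head(10).to_string(),
])

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (
            "Here is a profile of a dataset. List any data quality issues "
            "you can spot (duplicates, inconsistent categories, suspicious "
            "nulls, impossible values) and how you would fix each one.\n\n"
            + profile
        ),
    }],
)
print(response.choices[0].message.content)
```

Steps 2 and 3 are the same loop in reverse: paste the issues you agree with back in and ask for the fixes, reviewing the result each time.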

Following this technique, you'll be off to the races in no time!

Closing thoughts on AI-ready data

The notion that data must be pristine before it can be useful for AI analysis is a myth that's holding organizations back. The reality is far more nuanced and, frankly, more encouraging. AI readiness isn't about achieving data perfection; it's about understanding your users and meeting them where they are.

For technical teams drowning in messy data, AI can be a powerful ally in the cleanup process itself. For semi-technical users who understand the business context, AI bridges the gap between domain expertise and technical execution. And for executives who need bulletproof insights, the path forward is either embracing the spreadsheets they already trust or investing in robust data modeling.

The most important tip: the best way to assess your AI-readiness is to simply connect or upload your data to Fabi and start testing with a team that can supervise the output. You'll quickly discover that the bar for "AI-ready" is likely lower than you think, and the path to improvement is clearer than the "garbage in, garbage out" mantra would have you believe.

The future of data analysis isn't about waiting for perfect data; it's about using AI to work with the data you have, progressively improving both your data quality and your analytical capabilities in tandem. That's not just a more realistic approach; it's a more powerful one.
