HOW IT WORKS

what we're building & why

"Vibe coding" is revolutionizing software development. We think "vibe analysis"—think "Cursor for data & analytics"—has the potential to be much bigger. There are ~2 million software engineers in the US, while ~4 million data professionals support business roles (sales, marketing, ops, etc.) with ad-hoc data requests and report building everyday.

In the US alone, we’re spending 10+ billion hours a year building and reviewing these reports. Our thesis is that AI will greatly increase that number. Many more people will do analysis, much faster and much more often (much like what is happening with software development).

What Cursor, Claude Code, Lovable, etc. are doing for code, we're doing for data. Our two goals are:

  1. equip data engineers with AI agents to get more done, faster

  2. equip business users with AI agents to explore data on their own

"text-to-sql" is solved

We define “text-to-sql” as: converting an unambiguous natural language question into a SQL query, given all required context.

Based on this definition, text-to-sql is more or less solved.
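
For instance, given an unambiguous question and the schema it needs, a modern model produces the right query directly. A minimal sketch (table and column names are made up for illustration, not a real schema):

```sql
-- Question: "How many orders were placed in March 2025?"
-- Hypothetical table and columns, for illustration only.
SELECT COUNT(*) AS order_count
FROM orders
WHERE order_date >= DATE '2025-03-01'
  AND order_date < DATE '2025-04-01';
```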

To prove this point, I like to show people this eval from Rishabh in Feb 2025. It shows top models hit 97%+ accuracy for correct SQL with adequate instruction (even prior to the release of models like Sonnet 4, Gemini 2.5 Pro, Grok 4, etc).

Performance on SQL-Eval benchmark (314 questions)

“self-serve” still isn’t solved

So, if text-to-sql is solved… why aren't companies everywhere rolling out AI data analysts? Shouldn’t every user have an AI to query, visualize, and find insights?

The issue isn’t writing good SQL. The issue is a lack of context and documentation. To help people understand this, I like to make this analogy:

Imagine you hire a data guy with 10+ years of experience. He's a SQL wiz. Day one, he gets access to your company’s data warehouse, dbt repo, BI tools, etc - and then he’s immediately cut off from the rest of the data team. He’s on his own, with only the context that is found within these few tools.

Will he be able to answer 100 stakeholder data requests with 100% accuracy? The answer is definitively “no.” There is no way in hell he knows that:

  • the `revenue` field is gross revenue, not net

  • "active users" means users with transactions over $50 in the last 60 days

  • null values in `subscription_end_date` do not indicate an ongoing subscription

  • if `item_qty` is >1, the record is a wholesale order; otherwise it is a retail order

  • the `orders` and `payments` join needs to filter out test records from QA

  • the E1 enum in `product_category` maps to 'Electronics'

Without clean models and docs, he is forced to make assumptions at every turn. An AI analyst is in virtually the same position: excellent at writing SQL, making high-quality assumptions, and even validating those assumptions (IMO, Buster is actually better at this than most human analysts). But it just can’t know what it doesn’t know.
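
To make this concrete, here is a hedged sketch, using a hypothetical schema, of how just one of those unwritten rules changes the query an analyst (human or AI) would write for "active users":

```sql
-- Hypothetical schema, Postgres-style syntax, for illustration only.

-- What a reasonable newcomer might guess "active users" means:
SELECT COUNT(DISTINCT user_id) AS active_users
FROM users
WHERE last_login_at >= CURRENT_DATE - INTERVAL '30 days';

-- What it actually means at this company:
-- users with transactions over $50 in the last 60 days
SELECT COUNT(DISTINCT user_id) AS active_users
FROM transactions
WHERE amount > 50
  AND created_at >= CURRENT_DATE - INTERVAL '60 days';
```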

the three bottlenecks keeping companies from "self-serve"

As a result, there are three painful bottlenecks that keep companies from successfully implementing AI-powered self-serve:

  1. Creating the initial documentation is daunting

    It can take months for a data team to build, clean, optimize, and document their data models - and most lack a framework to do it in a way that is optimized for AI agents.

  2. Answering data requests is more complex than spitting out a single SQL query

    Human analysts don’t just receive a data request → spit back a SQL query. Instead, they form hypotheses → run various queries to explore and validate assumptions → iterate and adapt as they go → and only create a final deliverable once they understand the full picture. Getting an AI agent to mimic this same workflow (and do it well) is complex.

  3. It’s impossible to think of every edge case that might occur

    Even with robust docs in place, you just can’t predict every metric, aggregation, filter, etc. that users will ask for. An AI agent is bound to run into nuances, undefined metrics, etc. There are, theoretically, an infinite number of assumptions it will need to make - requiring significant maintenance. Identifying what improvements need to be made, pushing regular changes, and ensuring consistency across hundreds of files is painful.

a new framework for solving "self-serve"

We try to solve each of these three bottlenecks with three key AI agents:

  1. Onboarding Agent (Creating the initial documentation is daunting)

    We’ve built an Onboarding Agent that builds out initial models and documentation, optimized for our AI data analyst. Data teams can connect their dbt repository, BI tools, data warehouse, and other sources. Then, they run the `buster generate` command from their CLI to kick off the Onboarding Agent. For about 30 minutes, it:

    • Analyzes metadata and runs exploratory queries to understand data models, values, enumerations, and usage patterns.

    • Identifies & records key gaps in the documentation for further review.

    • Generates optimized documentation files directly in the dbt repo.

    This documentation becomes the source of truth for all future AI interactions.
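
    As a rough illustration, one of those exploratory queries might profile an enum column so its values can be documented (a sketch with hypothetical table and column names, not our actual output):

    ```sql
    -- Hypothetical profiling query: surface the raw values behind an enum column
    -- so the agent can document what each code (e.g. 'E1') actually means.
    SELECT product_category, COUNT(*) AS row_count
    FROM products
    GROUP BY product_category
    ORDER BY row_count DESC;
    ```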

  2. Analyst Agent (Answering data requests is more complex than spitting out a single SQL query)

    We’ve built an AI data analyst that handles data requests like a human analyst would. It’s like “Cursor for analytics”. Business users can submit data requests in our web app or in Slack via natural language. For each request, the Analyst Agent follows a recursive “deep research” process:

    • Reviews relevant dbt documentation, models, and metrics.

    • Iteratively runs lightweight queries to explore, validate assumptions, and gather insights.

    • Adapts based on findings, mimicking a human analyst's workflow.

    • Delivers reports with metrics, explanations, and data stories.

    This typically takes the agent 2-3 minutes and includes 15-20 tool calls, iteratively exploring and validating assumptions before creating a final deliverable (e.g. chart, dashboard, notebook-style report).
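
    For illustration, one of those lightweight validation queries might look like the sketch below (hypothetical table and column names; the real queries depend on the customer's warehouse and docs):

    ```sql
    -- Hypothetical validation step: before trusting an orders/payments join,
    -- confirm how many rows look like QA test records and should be filtered out.
    SELECT COUNT(*) AS suspect_rows
    FROM orders o
    JOIN payments p ON p.order_id = o.id
    WHERE o.customer_email LIKE '%@qa.example.com';
    ```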

  3. Documentation Agent (It’s impossible to think of every edge case that might occur)

    We’ve built a post-processing agent that reviews the Analyst Agent’s work. We describe it as “Claude Code for data engineers”. When reviewing the Analyst Agent’s work, the Documentation Agent identifies key assumptions that the Analyst made due to missing documentation (e.g., undefined metrics or filters). These flagged assumptions are sent as Slack alerts to the data team.

    • Data team members can reply directly in Slack (e.g., "@Buster, calculate it this way").

    • A background agent interprets the feedback, identifies where to update the dbt docs, and generates a pull request for review and merge.

    This feedback loop allows the data team to continuously enhance documentation with minimal effort, improving the AI analyst's reliability over time.

Start using AI agents in your analytics workflows today