The AI platform for analytics engineering

Use AI agents to automate your analytics engineering tasks. Buster is purpose-built to help data teams keep their dbt projects reliable, documented, and optimized for AI analytics.

buster

~

buster



██████╗ ██╗   ██╗███████╗████████╗███████╗██████╗ 
██╔══██╗██║   ██║██╔════╝╚══██╔══╝██╔════╝██╔══██╗
██████╔╝██║   ██║███████╗   ██║   █████╗  ██████╔╝
██╔══██╗██║   ██║╚════██║   ██║   ██╔══╝  ██╔══██╗
██████╔╝╚██████╔╝███████║   ██║   ███████╗██║  ██║
╚═════╝  ╚═════╝ ╚══════╝   ╚═╝   ╚══════╝╚═╝  ╚═╝



BUSTER v0.3.1 — Your AI Data Worker.


You are standing in an open terminal. An AI awaits your commands.


ENTER send • \n newline • @ files • / commands




┌──────────────────────────────────────────────────────────────────────────────────────────────────┐

│ ❯ Try "Review the changes in my current branch"

└──────────────────────────────────────────────────────────────────────────────────────────────────┘

? for help




Trusted by modern data teams at top companies

What is Buster?

The AI platform for automating dbt workflows

Buster is an AI agent platform built for analytics engineering. It provides data teams with AI agents that keep their dbt projects reliable, documented, and consistent — automatically.

Keep your dbt project reliable, documented, and consistent

Buster runs AI agents in your CI/CD and on recurring schedules. Agents deeply understand your models, schema, lineage, and metadata, and are triggered whenever your code changes to validate, document, and repair what’s needed.

IN PROGRESS (4)

Reviewing PR from Nate Kelley

New PR: feature/add_customer_segments

Renaming inconsistent fields across custome…

Request from Dallin Bentley

Refreshing column descriptions

Schedule: Weekly, 9:30 AM

Checking for downstream impacts

CI/CD build on main branch detected

READY FOR REVIEW (2)

Detected schema drift in production

Schedule: Daily, 1:00 AM

6hr

Updated customers.yml documentation

Merge to main detected

4hr

Detected schema drift in production

Weekly Schema Drift Check initiated

1:00 AM

Pulled latest manifest.json and run_results.json from main branch to compare model definitions.

1:00 AM

Identified 428 active models in project. Preparing warehouse comparison.

1:00 AM

Queried warehouse metadata for all production schemas and cached current state.

1:01 AM

Cross-referenced dbt model columns, data types, and constraints against live warehouse tables.

1:01 AM

Drift Detected

1:01 AM

- Detected data type mismatch in model: fct_order_summary

- order_total → changed from FLOAT (in dbt) to DECIMAL(12,2) (in warehouse).

Validating Impact

1:02 AM

- Found 3 downstream models directly impacted by change.

- Need to create new dbt schema test and update documentation in fct_order_summary.yml.

Generated new schema test and updated docs

1:04 AM

- Generated recommended dbt schema test

- Updated snippet for fct_order_summary.yml.

Ran tests locally

1:04 AM

- Ran updated tests

- All tests passed successfully

Opened GitHub issue & sent Slack alert

1:05 AM

- Created GitHub issue #421 with drift details, impact summary, and code suggestions.

- Posted alert to Slack channel #buster-data-quality.

Checked other models for drift

1:05 AM

- No additional drift found.

Finished

1:11 AM
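To make the timeline concrete, the schema test generated at the 1:04 AM step could plausibly take a shape like the following dbt YAML. This is a sketch based on the drift described above (using the dbt_expectations package), not Buster's verbatim output:

    version: 2

    models:
      - name: fct_order_summary
        columns:
          - name: order_total
            description: Order total; the warehouse now stores DECIMAL(12,2), per the drift check.
            tests:
              - not_null
              # Pins the column to the observed warehouse type
              # (assumes the dbt_expectations package is installed).
              - dbt_expectations.expect_column_values_to_be_of_type:
                  column_type: decimal(12,2)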


Run AI agents in CI/CD

Agents trigger automatically on pull requests, merges, and builds—validating models, updating docs, and catching drift before changes are merged.
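As a rough sketch, wiring an agent into CI could look like the following GitHub Actions workflow. The buster review step is illustrative only, not the documented CLI:

    name: buster-review
    on:
      pull_request:
        paths:
          - "models/**"
          - "**/*.yml"

    jobs:
      review:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # Hypothetical invocation; the actual command and flags may differ.
          - name: Run Buster agent on this pull request
            run: buster review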

status-code-updates

f3f8f19

buster (bot) reviewed 1 hour ago • View changes

buster (bot) • 1 hour ago

Breaking Change Detected in Upstream Model Update

The order_status field was renamed to status_code, which won’t fail existing dbt tests but will break downstream logic in fct_order_summary, int_customer_activity, and report_monthly_sales.

These models transform the value of order_status into standardized fulfillment states — logic that will now misclassify or drop records because the expected field no longer exists under the original name.

Would you like me to update downstream references to handle this new field name and update the docs accordingly? I can commit those changes here or open a new PR.

Fix in CLI

Fix in web app
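One plausible shape for that downstream fix is a compatibility alias in the staging layer, so the three affected models keep resolving order_status while references migrate. The model and source names below are assumptions for illustration, not Buster's actual patch:

    -- models/staging/stg_orders.sql (illustrative)
    select
        order_id,
        -- Upstream renamed order_status to status_code. Exposing both lets
        -- fct_order_summary, int_customer_activity, and report_monthly_sales
        -- keep working while their references are migrated.
        status_code,
        status_code as order_status
    from {{ source('app_db', 'orders') }}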


Run AI agents on a schedule

Recurring agents audit your dbt project for drift, stale tests, and outdated docs, keeping your warehouse clean without manual maintenance.

Schedule an agent

Name

Weekly Data Reliability Check

Frequency

Weekly: Mondays at 2:00 AM

Project scope

analytics/dbt_prod

Custom instructions

Run data quality and schema consistency checks across all production models every Monday.

For any model that fails more than two tests, do the following:

- Summarize the root cause (schema drift, null violations, or unexpected value distribution).
- Compare the result to the last 7-day window of tests to identify regressions.
- If the failure appears new, create a GitHub issue linking to the affected dbt model.
- Post a summary in #data-quality with:
  - number of tests run and failed
  - affected models
  - a short risk classification (low, medium, high)

If no issues are found, quietly log “✅ All checks passed” in the run history.
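Expressed as configuration, the same schedule might look roughly like this. Every field name here is hypothetical and meant only to convey the shape of a scheduled agent, not Buster's actual config format:

    # Hypothetical schedule definition; field names are illustrative only.
    name: weekly-data-reliability-check
    frequency: "0 2 * * 1"        # cron for Mondays at 2:00 AM
    scope: analytics/dbt_prod
    instructions: |
      Run data quality and schema consistency checks across all production models.
      For any model failing more than two tests: summarize the root cause, compare
      against the last 7 days, open a GitHub issue if the failure is new, and post
      a summary to #data-quality.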

Use Buster from your terminal or IDE

Run agents on demand right from your terminal or IDE for ad-hoc tasks, such as building new models or making changes across cascading models.

models/marts/fct_customer_revenue.sql

{{ config(materialized='table') }}

with customer_orders as (

    select
        customer_id,
        sum(order_total) as total_spent,
        count(order_id) as order_count,
        max(order_date) as last_order_date
    from {{ ref('stg_orders') }}
    group by 1

),

customer_tiers as (

    select
        customer_id,
        total_spent,
        order_count,
        case
            when total_spent >= 10000 then 'platinum'
            when total_spent >= 5000 then 'gold'
            when total_spent >= 1000 then 'silver'
            else 'bronze'
        end as customer_tier
    from customer_orders

),

final as (

    select
        c.customer_id,
        c.customer_tier,
        c.total_spent,
        c.order_count,
        d.region,
        d.first_purchase_date
    from customer_tiers c
    left join {{ ref('dim_customer') }} d using (customer_id)

)

select * from final


Ask Buster to build models, update docs, explore…

Sonnet 4.5

Use cases

How data teams use Buster

Teams that use Buster spend less time on maintenance and ship better data products faster—with higher reliability and cleaner models.

Identify data quality issues

Profile and validate models on every PR to catch anomalies, schema drift, and missing tests before they're merged to production.

Catch breaking changes early

Review PRs in upstream application repositories to flag breaking changes before they cascade into downstream models.

Automate test creation

Use agents to generate new tests on PRs and improve dbt tests on a regular cadence, expanding coverage and preventing silent regressions.

Enforce modeling standards

Apply naming, testing, and structural conventions across your dbt project automatically—no manual policing required.

Audit your warehouse on a schedule

Run agents on a recurring basis to interrogate your dbt project and find stale models, unused tests, outdated docs—keeping your warehouse clean without manual maintenance.

Auto-update and maintain docs

Update YAML and markdown docs with every model or schema change so your project stays accurate and AI-ready.

Documentation & AI context

Maintain your docs with AI

Index your dbt project and generate robust documentation. Documentation is updated on every PR and can be used as AI context in tools like Omni, Hex, Snowflake Cortex, etc.

Automate documentation, from init to every PR

Run buster init once to generate complete project documentation, then let agents update it automatically on every pull request — keeping your metadata and docs perpetually in sync.

Buster is exceptional at writing context-rich documentation. Below is an actual file that was generated by Buster, without any human intervention or guidance:

sales_order_detail.yml

version: 2

models:
  - name: sales_order_detail
    description: |
      Line item detail for sales orders, representing individual products purchased within each order.
      Essential for product-level sales analysis, inventory tracking, and revenue attribution by SKU.
      Lineage: sourced from `stg_sales_order_detail` which pulls from source sales.salesorderdetail;
      applies lineTotal calculation (unitPrice * orderQty * (1 - unitPriceDiscount)).
      Patterns: ~121k rows across ~31k distinct orders (~3.9 items per order average);
      null rate on carrierTrackingNumber is 46% (metadata as of 2025-10-10);
      62% of line items have orderQty = 1, with long-tail up to 32 units;
      95% of lines use specialOfferID = 1 (standard pricing);
      266 distinct products; unitPrice heavily right-skewed (mean $485, median $55, 99th percentile $3,578).
      Watchouts: carrierTrackingNumber absent for ~46% of rows (non-shipped or bulk orders);
      unitPriceDiscount is 0 for 97% of rows; specialOfferID values beyond 1-2 represent promotional pricing rarely applied;
      lineTotal is pre-calculated in staging (not enforced by model grain).
      Freshness/Scale: spans 2022-09-10 to 2025-10-10; daily refreshes.
    columns:
      - name: salesOrderID
        description: |
          Foreign key to sales_order_header (order entity).
          How to use it: Join to sales_order_header for customer, territory, and order-level aggregates.
          Data characteristics: ~31k distinct orders; typical order contains 1-5 line items (mean ~3.9).
          Related columns: salesOrderDetailID is unique within this order context.
          Watch out for: Not unique at line item grain; must combine with salesOrderDetailID for uniqueness.
        tests:
          - not_null
          - relationships:
              to: ref('sales_order_header')
              field: salesOrderID

      - name: salesOrderDetailID
        description: |
          Primary key; unique identifier for each line item.
          How to use it: Use as the grain anchor for joins and deduplication.
          Data characteristics: 121,317 distinct values = 100% unique (metadata as of 2025-10-10).
          Watch out for: This is the sole PK; salesOrderID alone is not unique.
        tests:
          - unique
          - not_null

      - name: carrierTrackingNumber
        description: |
          Shipping carrier tracking number for the line item.
          How to use it: Filter for shipped items; link to logistics/fulfillment systems.
          Data characteristics: Null rate 46%; ~2,214 distinct values when present;
          alphanumeric format (e.g., "1534-4AB5-81").
          Patterns & Insights: Absence indicates non-shipped orders (e.g., digital, in-store pickup, or bulk direct-ship).
          Watch out for: Do not assume presence; left join or COALESCE when computing ship metrics.

      - name: orderQty
        description: |
          Quantity of the product ordered on this line item.
          How to use it: SUM for volume analysis; group by for distribution insights.
          Data characteristics: 62% of lines have qty = 1;
          95th percentile = 7; max = 32; mean ~2.4 (metadata as of 2025-10-10).
          Patterns & Insights: Long-tail distribution reflects bulk orders (likely B2B or restocking);
          individual consumer orders cluster at 1-3 units.
          Watch out for: Aggregations should weight by orderQty to avoid unit vs line-item confusion.
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 1
              max_value: 100

      - name: productID
        description: |
          Foreign key to product table.
          How to use it: Join to product dimensions for category, name, and attributes.
          Data characteristics: 266 distinct products; top product (870) appears in 3.4% of lines;
          moderate concentration (Gini ~0.52).
          Patterns & Insights: Distribution suggests a core set of popular SKUs with long-tail specialty items.
          Watch out for: Ensure product dimension is current to avoid orphaned productIDs.
        tests:
          - not_null

      - name: specialOfferID
        description: |
          Foreign key to special offer/promotion applied to this line.
          How to use it: Filter for promotional analysis; join to special_offer for campaign details.
          Data characteristics: 95.5% are specialOfferID = 1 (standard pricing);
          2.8% are ID = 2; remaining <2% distributed across IDs 3-16 (metadata as of 2025-10-10).
          Patterns & Insights: Promotions are rare and targeted; standard pricing dominates.
          Watch out for: Do not assume promo when unitPriceDiscount > 0; discount logic is independent.
        tests:
          - not_null
          - accepted_values:
              values: [1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 14, 16]

      - name: unitPrice
        description: |
          Price per unit in USD before discount.
          Calculation: Pulled from product pricing at order time (historical snapshot).
          Interpretation: Median $55, mean $485 (right-skewed);
          99th percentile $3,578; max ~$3,578 (metadata as of 2025-10-10).
          Aggregation guidance: Use lineTotal for revenue; unitPrice * orderQty for pre-discount;
          winsorize or filter outliers for AOV analysis if high-value SKUs dominate.
          Data notes: Reflects pricing at order date; does not update with catalog changes.
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
        meta:
          unit: USD

      - name: unitPriceDiscount
        description: |
          Discount rate applied to unitPrice (decimal; 0 = no discount, 0.15 = 15% off).
          Calculation: Stored as decimal proportion (not percentage).
          Interpretation: 97.1% of lines have 0 discount; 1.1% have 2% discount;
          remaining <2% have 5-35% discount (metadata as of 2025-10-10).
          Patterns & Insights: Discounts are applied selectively; most transactions are full-price.
          Watch out for: Do not confuse with specialOfferID; discount may apply independently of offer campaigns.
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 1
        meta:
          unit: proportion

      - name: lineTotal
        description: |
          Line-level revenue in USD.
          Calculation: unitPrice * orderQty * (1 - unitPriceDiscount); pre-calculated in stg_sales_order_detail.
          Interpretation: Median $187, mean $974 (right-skewed);
          99th percentile ~$8,588; max ~$27,894 (metadata as of 2025-10-10).
          Aggregation guidance: SUM for revenue rollups; winsorize top 1% if outliers distort segment analysis.
          Data notes: Excludes tax and shipping; reflects net product revenue only.
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
        meta:
          unit: USD

      - name: rowguid
        description: |
          System-generated GUID for internal ETL tracking.
          Watch out for: Not meaningful for business analysis; do not use as a join key or filter.

      - name: modifiedDate
        description: |
          Timestamp of last update to the record.
          How to use it: Track data freshness; filter for incremental loads.
          Data characteristics: Spans 2022-09-10 to 2025-10-10; ~723 distinct timestamps;
          clustered around daily batch times (6am/7am).
          Watch out for: Reflects ETL refresh, not order date; use salesOrderID join for order context.


Power AI tools and self-serve analytics

Buster’s documentation layer is fully compatible with tools like Hex, Omni, and Snowflake Cortex — and also fuels Buster’s built-in AI reporting and self-serve analytics experience.

Sales Over Recent Months

update me on our sales over recent months

Reasoned for 6 minutes

Sales Over Recent Months

Version 1

I spent a few minutes digging into sales data from recent months and spun up this report for you.

It looks like sales dipped last month, likely as a result of a 28% decrease in marketing spend. This especially impacted sales in the Electronics category.

Ask a follow up...

Our AI may make mistakes. Check important info.

Report

File

Sales Over Recent Months

Apr 17, 2025

Created by Buster

Last month, sales experienced a significant decline, dropping nearly 21% compared to the previous month. This report investigates the reasons behind this decline using historical sales data, marketing spend, and competitor activity.

Sales Decline in Electronics Category

Last month's sales fell nearly 21% below the previous month, including a significant 67.42% drop in the electronics category.

Chart: Monthly Total Sales and Monthly Electronics Sales, last 6 months ("What were total sales and electronics sales over the last 6 months?"). Series: Total Sales, Electronics Sales.

Impact of Reduced Marketing Spend

Marketing spend decreased by 28% last month. Regression analysis indicates a strong historic correlation (R² = 0.78) between your marketing spend and sales, suggesting this reduction significantly contributed to the sales dip.

Chart: Marketing Spend & Electronics Sales, last 6 months ("What was marketing spend and electronics sales over the last 6 months?"). Series: Marketing Spend, Electronics Sales.

Customers

Real results from modern data teams

4x fewer breaking changes in prod

3x more data quality issues detected

3x faster PR cycles

100% of models documented

16.5x increase in self-served data requests

"Buster frees me up from the ad-hoc tasks I always had to do so I can focus on longer term goals."

Landen Bailey

Senior Data Engineer, Redo

"A lot of data engineers think self serve is a myth. This is actually self serve, for real for real."

Alex Ahlstrom

Director of Analytics, Angel Studios

Enterprise & security

Buster is built with enterprise-grade security practices. This includes state-of-the-art encryption, safe and reliable infrastructure partners, and independently verified security controls.

SOC 2 Type II compliant

Buster has undergone a Service Organization Controls audit (SOC 2 Type II).

HIPAA compliant

Privacy & security measures to ensure that PHI is appropriately safeguarded.

Permissions & governance

Provision users, enforce permissions, & implement robust governance.

IP protection policy

Neither Buster nor our model partners train models on customer data.

Self-hosted deployment

Deploy in your own air-gapped environment.

Secure connections

SSL and pass-through OAuth available.

Start using AI agents in your analytics workflows today

Copyright © 2025 Sprint Labs, Inc

All rights reserved.