Get your analytics AI-ready

Buster is an AI platform that helps data teams keep their dbt projects reliable, documented, and consistent.

sales_order_detail.yml

version: 2

models:
- name: sales_order_detail
description: |

Individual line items representing products sold within each sales order.

Purpose: Line-item transaction table enabling revenue analysis, product performance tracking, discount effectiveness measurement, and basket composition analysis. Foundation for calculating revenue metrics, product-level profitability, and customer purchasing patterns. Used extensively by metrics models for calculating CLV, average order value, gross profit, and product-specific KPIs.

Contents: One row per product line item on a sales order. Composite key: (salesOrderID, salesOrderDetailID). Scale: ~121K line items across ~31K orders spanning Sept 2022 to July 2025 (date-shifted to align with current date).

Lineage: Direct pass-through from stg_sales_order_detail, which sources from sales.salesorderdetail. Staging layer calculates lineTotal field and applies date shifting to modifiedDate.

Patterns:
- Order simplicity: Most orders contain few items (avg 3.9 items per order). Single-item orders are extremely common, representing the dominant purchasing pattern.
- Quantity concentration: 58% of line items are quantity 1, 71% are quantity 1-2. Bulk purchases (qty >10) represent <3% but can reach qty 44.
- Product concentration: Top 10 products (out of 259) account for 20% of line items. Product 870 alone appears in 3.7% of all line items.
- Discount sparsity: 97% of line items have no discount (unitPriceDiscount = 0). When discounts apply, they're typically 2%, 5%, 10%, 15%, or 20%.
- Special offer dominance: 95% use specialOfferID = 1 (likely "No Discount" baseline offer), making non-promotional sales the norm.
- Carrier tracking: 45% of line items have null carrierTrackingNumber, suggesting orders not yet shipped or using ship methods without tracking.
- Price distribution: Highly skewed - median unit price $54.94, but ranges from $1.37 to $3578.27. High-value items (>$2000) appear in ~6% of line items.
- Line total pattern: Log-normal distribution with median $183.94, mean $989.34. Most line items are modest value, but tails extend to $22K+ for high-quantity luxury purchases.

Usage Guidance:
Foundational fact table for sales analytics. Essential for calculating revenue totals, analyzing product performance, measuring discount impact, and understanding purchasing behavior. Most revenue metrics aggregate lineTotal; product analysis groups by productID; discount analysis filters or segments by unitPriceDiscount or specialOfferID. For customer behavior analysis, aggregate to order level first via salesOrderID to avoid over-counting multi-item orders. For product profitability, join to product table for cost data then calculate margin (lineTotal - cost). When analyzing average order value, aggregate line items by order first to get order-level totals.

Critical Context:
- lineTotal is calculated in staging as (unitPrice * orderQty * (1 - unitPriceDiscount)) and represents net revenue after discounts but before taxes/freight. This is the primary revenue metric field.
- All dates shifted forward using shift_date() macro to make dataset feel current (max date aligns with March 28, 2025). Historical patterns span ~3 years.
- Null carrierTrackingNumber doesn't indicate data quality issue - reflects legitimate business states (orders not shipped yet, certain ship methods, or in-store pickup).
- salesOrderDetailID is unique within entire table (not just within order) - serves as primary key alone, though conceptually represents line item number within order.
- unitPrice reflects actual selling price at time of sale (may differ from product.listPrice due to negotiated pricing, promotions, or price changes over time).
- High orderQty outliers (>20) typically involve accessories or components sold in bulk, not bikes.
- No line items exist without corresponding order in sales_order_header - referential integrity is clean.

relationships:
- name: sales_order_header
description: >
Business relationship: Every line item belongs to exactly one sales order. Order header provides order-level context (customer, dates, shipping, totals, status) that applies to all line items within that order. Join to get customer attribution, order timing, territory assignment, shipping details, and order-level calculated fields (purchase context filters, consultation level, etc.).
Join considerations: Many-to-one from detail to header. Each salesOrderID in details appears in header exactly once. Each order in header typically has multiple detail rows (avg 3.9 line items per order, but distribution is right-skewed with many single-item orders).
Coverage: 100% of line items match to header. Clean referential integrity - no orphaned details.
Cardinality notes: Standard fact-to-dimension pattern. When joining, expect row count to remain same (detail-level grain preserved). When aggregating metrics from details, group by salesOrderID first to get order-level aggregates before further analysis to avoid over-representing multi-item orders.
source_col: salesOrderID
ref_col: salesOrderID
cardinality: many-to-one
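
The Critical Context and Usage Guidance notes above describe two calculations: the lineTotal formula and the advice to aggregate to order level before computing average order value. The following is a minimal Python sketch of both, not the project's actual staging code. The sample rows and the helper names (`line_total`, `average_order_value`, `naive_line_average`) are hypothetical; only the column names and the formula come from the YAML.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: column names follow the YAML above, values are invented.
rows = [
    {"salesOrderID": 1, "unitPrice": 100.0, "orderQty": 2, "unitPriceDiscount": 0.0},
    {"salesOrderID": 1, "unitPrice": 50.0, "orderQty": 1, "unitPriceDiscount": 0.1},
    {"salesOrderID": 1, "unitPrice": 20.0, "orderQty": 1, "unitPriceDiscount": 0.0},
    {"salesOrderID": 2, "unitPrice": 35.0, "orderQty": 1, "unitPriceDiscount": 0.0},
]


def line_total(row):
    """Net revenue for one line item: unitPrice * orderQty * (1 - unitPriceDiscount)."""
    return row["unitPrice"] * row["orderQty"] * (1 - row["unitPriceDiscount"])


def average_order_value(rows):
    """Sum lineTotal per salesOrderID first, then average the order totals."""
    order_totals = defaultdict(float)
    for row in rows:
        order_totals[row["salesOrderID"]] += line_total(row)
    return mean(list(order_totals.values()))


def naive_line_average(rows):
    """Average over line items directly; this is NOT average order value."""
    return mean([line_total(row) for row in rows])


aov = average_order_value(rows)
per_line = naive_line_average(rows)
print(f"average order value: {aov:.2f}, average line total: {per_line:.2f}")
```

In this toy data the three-item order dominates the line-level average, which is why the two numbers differ; grouping by salesOrderID first gives each order equal weight.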

Trusted by modern data teams at top companies

The problem

AI needs robust context to reliably use your data

The solution

Buster documents your data, so AI actually understands it

models/marts/sales_order_detail.yml

version: 2

models:
- name: sales_order_detail
description: |
Individual line items representing products sold within each sales order.

Purpose: Line-item transaction table enabling revenue analysis, product performance tracking, discount effectiveness measurement, and basket composition analysis. Foundation for calculating revenue metrics, product-level profitability, and customer purchasing patterns. Used extensively by metrics models for calculating CLV, average order value, gross profit, and product-specific KPIs.

Contents: One row per product line item on a sales order. Composite key: (salesOrderID, salesOrderDetailID). Scale: ~121K line items across ~31K orders spanning Sept 2022 to July 2025 (date-shifted to align with current date).

Lineage: Direct pass-through from stg_sales_order_detail, which sources from sales.salesorderdetail. Staging layer calculates lineTotal field and applies date shifting to modifiedDate.


Usage Guidance:
Foundational fact table for sales analytics. Essential for calculating revenue totals, analyzing product performance, measuring discount impact, and understanding purchasing behavior. Most revenue metrics aggregate lineTotal; product analysis groups by productID; discount analysis filters or segments by unitPriceDiscount or specialOfferID. For customer behavior analysis, aggregate to order level first via salesOrderID to avoid over-counting multi-item orders. For product profitability, join to product table for cost data then calculate margin (lineTotal - cost). When analyzing average order value, aggregate line items by order first to get order-level totals.

Critical Context:
- lineTotal is calculated in staging as (unitPrice * orderQty * (1 - unitPriceDiscount)) and represents net revenue after discounts but before taxes/freight. This is the primary revenue metric field.
- All dates shifted forward using shift_date() macro to make dataset feel current (max date aligns with March 28, 2025). Historical patterns span ~3 years.
- Null carrierTrackingNumber doesn't indicate data quality issue - reflects legitimate business states (orders not shipped yet, certain ship methods, or in-store pickup).
- salesOrderDetailID is unique within entire table (not just within order) - serves as primary key alone, though conceptually represents line item number within order.
- unitPrice reflects actual selling price at time of sale (may differ from product.listPrice due to negotiated pricing, promotions, or price changes over time).
- High orderQty outliers (>20) typically involve accessories or components sold in bulk, not bikes.
- No line items exist without corresponding order in sales_order_header - referential integrity is clean.

relationships:
- name: sales_order_header
description: >
Business relationship: Every line item belongs to exactly one sales order. Order header provides order-level context (customer, dates, shipping, totals, status) that applies to all line items within that order. Join to get customer attribution, order timing, territory assignment, shipping details, and order-level calculated fields (purchase context filters, consultation level, etc.).
Join considerations: Many-to-one from detail to header. Each salesOrderID in details appears in header exactly once. Each order in header typically has multiple detail rows (avg 3.9 line items per order, but distribution is right-skewed with many single-item orders).
Coverage: 100% of line items match to header. Clean referential integrity - no orphaned details.
Cardinality notes: Standard fact-to-dimension pattern. When joining, expect row count to remain same (detail-level grain preserved). When aggregating metrics from details, group by salesOrderID first to get order-level aggregates before further analysis to avoid over-representing multi-item orders.
source_col: salesOrderID
ref_col: salesOrderID
cardinality: many-to-one
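
The relationship notes above make two checkable claims: joining detail to header preserves the detail row count, and no detail row is orphaned. One way to verify them is sketched below using Python's built-in sqlite3 module against a tiny in-memory stand-in. The table contents and the `check_detail_to_header` helper are assumptions for illustration; a real check would run equivalent SQL in the warehouse.

```python
import sqlite3


def check_detail_to_header(conn):
    """Return (detail_rows, joined_rows, orphaned_rows) for the detail -> header join."""
    detail_rows = conn.execute(
        "SELECT COUNT(*) FROM sales_order_detail"
    ).fetchone()[0]
    # Many-to-one join: each detail row should match exactly one header row.
    joined_rows = conn.execute(
        """
        SELECT COUNT(*)
        FROM sales_order_detail AS d
        JOIN sales_order_header AS h ON h.salesOrderID = d.salesOrderID
        """
    ).fetchone()[0]
    # Orphans are detail rows whose salesOrderID has no header row.
    orphaned_rows = conn.execute(
        """
        SELECT COUNT(*)
        FROM sales_order_detail AS d
        LEFT JOIN sales_order_header AS h ON h.salesOrderID = d.salesOrderID
        WHERE h.salesOrderID IS NULL
        """
    ).fetchone()[0]
    return detail_rows, joined_rows, orphaned_rows


# Hypothetical in-memory tables: names follow the YAML above, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_order_header (salesOrderID INTEGER PRIMARY KEY);
    CREATE TABLE sales_order_detail (
        salesOrderDetailID INTEGER PRIMARY KEY,
        salesOrderID INTEGER,
        lineTotal REAL
    );
    INSERT INTO sales_order_header VALUES (1), (2);
    INSERT INTO sales_order_detail VALUES (1, 1, 200.0), (2, 1, 45.0), (3, 2, 35.0);
    """
)

detail_rows, joined_rows, orphaned_rows = check_detail_to_header(conn)
print(detail_rows, joined_rows, orphaned_rows)
```

If the documented relationship holds, the first two counts are equal and the third is zero. A joined count above the detail count would mean the header key is not unique.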

Generate AI context

Buster generates robust docs within your dbt project

sales_order_detail.yml

version: 2

models:
- name: sales_order_detail
description: |

Individual line items representing products sold within each sales order.

Purpose: Line-item transaction table enabling revenue analysis, product performance tracking, discount effectiveness measurement, and basket composition analysis. Foundation for calculating revenue metrics, product-level profitability, and customer purchasing patterns. Used extensively by metrics models for calculating CLV, average order value, gross profit, and product-specific KPIs.

Contents: One row per product line item on a sales order. Composite key: (salesOrderID, salesOrderDetailID). Scale: ~121K line items across ~31K orders spanning Sept 2022 to July 2025 (date-shifted to align with current date).

Lineage: Direct pass-through from stg_sales_order_detail, which sources from sales.salesorderdetail. Staging layer calculates lineTotal field and applies date shifting to modifiedDate.

Patterns:
- Order simplicity: Most orders contain few items (avg 3.9 items per order). Single-item orders are extremely common, representing the dominant purchasing pattern.
- Quantity concentration: 58% of line items are quantity 1, 71% are quantity 1-2. Bulk purchases (qty >10) represent <3% but can reach qty 44.
- Product concentration: Top 10 products (out of 259) account for 20% of line items. Product 870 alone appears in 3.7% of all line items.
- Discount sparsity: 97% of line items have no discount (unitPriceDiscount = 0). When discounts apply, they're typically 2%, 5%, 10%, 15%, or 20%.
- Special offer dominance: 95% use specialOfferID = 1 (likely "No Discount" baseline offer), making non-promotional sales the norm.
- Carrier tracking: 45% of line items have null carrierTrackingNumber, suggesting orders not yet shipped or using ship methods without tracking.
- Price distribution: Highly skewed - median unit price $54.94, but ranges from $1.37 to $3578.27. High-value items (>$2000) appear in ~6% of line items.
- Line total pattern: Log-normal distribution with median $183.94, mean $989.34. Most line items are modest value, but tails extend to $22K+ for high-quantity luxury purchases.

Usage Guidance:
Foundational fact table for sales analytics. Essential for calculating revenue totals, analyzing product performance, measuring discount impact, and understanding purchasing behavior. Most revenue metrics aggregate lineTotal; product analysis groups by productID; discount analysis filters or segments by unitPriceDiscount or specialOfferID. For customer behavior analysis, aggregate to order level first via salesOrderID to avoid over-counting multi-item orders. For product profitability, join to product table for cost data then calculate margin (lineTotal - cost). When analyzing average order value, aggregate line items by order first to get order-level totals.

Critical Context:
- lineTotal is calculated in staging as (unitPrice * orderQty * (1 - unitPriceDiscount)) and represents net revenue after discounts but before taxes/freight. This is the primary revenue metric field.
- All dates shifted forward using shift_date() macro to make dataset feel current (max date aligns with March 28, 2025). Historical patterns span ~3 years.
- Null carrierTrackingNumber doesn't indicate data quality issue - reflects legitimate business states (orders not shipped yet, certain ship methods, or in-store pickup).
- salesOrderDetailID is unique within entire table (not just within order) - serves as primary key alone, though conceptually represents line item number within order.
- unitPrice reflects actual selling price at time of sale (may differ from product.listPrice due to negotiated pricing, promotions, or price changes over time).
- High orderQty outliers (>20) typically involve accessories or components sold in bulk, not bikes.
- No line items exist without corresponding order in sales_order_header - referential integrity is clean.


relationships:
- name: sales_order_header
description: >
Business relationship: Every line item belongs to exactly one sales order. Order header provides order-level context (customer, dates, shipping, totals, status) that applies to all line items within that order. Join to get customer attribution, order timing, territory assignment, shipping details, and order-level calculated fields (purchase context filters, consultation level, etc.).
Join considerations: Many-to-one from detail to header. Each salesOrderID in details appears in header exactly once. Each order in header typically has multiple detail rows (avg 3.9 line items per order, but distribution is right-skewed with many single-item orders).
Coverage: 100% of line items match to header. Clean referential integrity - no orphaned details.
Cardinality notes: Standard fact-to-dimension pattern. When joining, expect row count to remain same (detail-level grain preserved). When aggregating metrics from details, group by salesOrderID first to get order-level aggregates before further analysis to avoid over-representing multi-item orders.
source_col: salesOrderID
ref_col: salesOrderID
cardinality: many-to-one


- name: product
description: >
Business relationship: Each line item represents sale of one specific product. Product table provides descriptive attributes (name, category, subcategory, costs, specifications, attributes) needed for product performance analysis, profitability calculation, and inventory analysis.
Join considerations: Many-to-one from detail to product. Each productID in details matches exactly one product. Popular products appear across many line items (top product appears in 3.7% of line items; long tail products may appear only once).
Coverage: 100% of line items match to products. Only in-catalog products can be sold.
Cardinality notes: Standard fact-to-dimension pattern. Use INNER JOIN (all details have valid productID). Join preserves detail-level grain. For product-level metrics, aggregate details by productID: SUM(lineTotal) for revenue, COUNT(*) for transaction count, SUM(orderQty) for units sold, AVG(unitPrice) for typical selling price.
source_col: productID
ref_col: productID
cardinality: many-to-one


- name: special_offer
description: >
Business relationship: Each line item can apply one special offer/promotion that determines discount structure. Special offer defines discount rules (type, percentage, quantity thresholds, date ranges). 95% of line items use offer ID 1 (baseline "No Discount" offer), making promotional sales the exception rather than rule.
Join considerations: Many-to-one from detail to offer. Each specialOfferID matches exactly one offer definition. Most analysis segments by "has discount" (specialOfferID != 1) vs "no discount" rather than analyzing individual offer IDs.
Coverage: 100% of line items match to special offers (offer ID 1 always exists as default).
Cardinality notes: Heavy concentration on single offer (95% share). LEFT JOIN unnecessary since all details have valid offer, but won't hurt. For discount analysis, compare unitPriceDiscount (actual discount applied) with special_offer.discountPct (offer's defined discount) to understand discount application patterns.
source_col: specialOfferID
ref_col: specialOfferID
cardinality: many-to-one


columns:
- name: salesOrderID
description: |
Foreign key linking line item to its parent sales order in sales_order_header. Groups line items into complete orders for order-level analysis.

Every line item belongs to exactly one order. Orders typically contain multiple line items (avg 3.9 per order), though single-item orders are common. Range: 43659-68840 with ~7K distinct orders represented across 121K line items.

Use this to join to sales_order_header for order-level context (customer, dates, shipping, territory). When calculating order-level metrics from line item data, always aggregate by salesOrderID first (e.g., SUM(lineTotal) grouped by salesOrderID gives order total) before further grouping to avoid over-weighting multi-item orders. Distribution shows slight recency bias toward higher IDs (newer orders), suggesting ongoing business activity through date range.

No nulls - every detail must belong to order. Part of composite business key with salesOrderDetailID, though salesOrderDetailID alone is unique across entire table.
data_type: BIGINT


- name: salesOrderDetailID
description: |
Primary key uniquely identifying each line item across the entire table. Represents sequential line item numbering across all orders.

Range: 1-100827 with perfect uniqueness (~121K current rows, suggesting some IDs retired or skipped). Despite name suggesting "detail within order", this ID is globally unique, not just within an order. Conceptually represents the line item number, but implemented as table-wide identifier.

Use as primary key for uniqueness. Combine with salesOrderID for composite business key if needed for readability, though salesOrderDetailID alone guarantees uniqueness. No nulls, no duplicates. Sequential but not gapless - skipped numbers are normal (order cancellations, returns, or system behavior).
data_type: BIGINT


- name: carrierTrackingNumber
description: |
Shipment tracking identifier assigned by carrier for this line item. Enables shipment tracking and delivery confirmation.

Format: Standardized 4-4-2 alphanumeric pattern, XXXX-XXXX-XX (e.g., "52CC-460F-B3"). ~2K distinct tracking numbers across line items. 45% null, indicating orders not yet shipped, ship methods without tracking, in-store pickup, or bundled shipments where tracking applies at order level rather than line level.

Use null vs non-null to segment shipped vs unshipped items. However, interpret carefully - null doesn't definitively mean "not shipped" as some fulfillment methods legitimately don't generate line-level tracking. For true shipment analysis, prefer sales_order_header.shipDate which provides order-level shipping status. Tracking numbers aren't strictly unique - multiple line items from same order may share tracking when shipped together.
data_type: VARCHAR


- name: orderQty
description: |
Quantity of product units ordered on this line item. Drives revenue via multiplication with unitPrice.

Highly concentrated at low quantities: 58% are qty 1, 71% are qty 1-2, 89% are qty 1-4. Long tail extends to qty 44, but quantities above 10 represent <3% of line items. Mean: 2.4, median: 1, heavily right-skewed distribution.

Represents customer purchase decision quantity. Multiply by unitPrice to calculate pre-discount subtotal, or use calculated lineTotal field which accounts for both quantity and discount. High quantities (>20) almost always involve accessories, components, or clothing - not bikes (bikes rarely exceed qty 5). For inventory and fulfillment analysis, this is units to pick/ship. For revenue analysis, this amplifies unit economics (price and discount) into total line value.

Watch out for: Outlier quantities can distort averages. When analyzing "typical" purchase behavior, consider median or restrict to qty ≤10. For total units sold analysis, SUM(orderQty) grouped appropriately. No nulls - quantity always specified.
data_type: BIGINT
options:
- value: 1
description: Single unit - dominant pattern; ~58% of line items
- value: 2
description: Two units; ~13% of line items
- value: 3
description: Three units; ~9% of line items
- value: 4
description: Four units; ~7% of line items
- value: 5
description: Five units; ~4% of line items


- name: productID
description: |
Foreign key identifying which product was sold on this line item. Links to product table for descriptive attributes, costs, and categorization.

Range: 707-999 representing 259 distinct products actively sold. Product concentration is significant: top 10 products account for 20% of line items, with product 870 alone appearing in 3.7% of line items. Distribution shows most products sell regularly (few "one-off" products), suggesting active catalog management.

Use to join to product table for product name.

...
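
The product relationship's cardinality notes above list four product-level aggregates: SUM(lineTotal), COUNT(*), SUM(orderQty), and AVG(unitPrice), grouped by productID. As a rough illustration, the same grouping can be sketched in plain Python. The sample rows and the `product_metrics` helper are hypothetical; in practice this would be a SQL query or a dbt model rather than application code.

```python
from collections import defaultdict

# Hypothetical line items: column names follow the YAML above, values are invented.
line_items = [
    {"productID": 870, "orderQty": 2, "unitPrice": 5.0, "lineTotal": 10.0},
    {"productID": 870, "orderQty": 1, "unitPrice": 4.0, "lineTotal": 4.0},
    {"productID": 712, "orderQty": 3, "unitPrice": 9.0, "lineTotal": 27.0},
]


def product_metrics(rows):
    """Aggregate detail rows by productID, mirroring the SQL aggregates in the notes."""
    totals = defaultdict(
        lambda: {"revenue": 0.0, "transactions": 0, "units_sold": 0, "price_sum": 0.0}
    )
    for row in rows:
        t = totals[row["productID"]]
        t["revenue"] += row["lineTotal"]    # SUM(lineTotal)
        t["transactions"] += 1              # COUNT(*)
        t["units_sold"] += row["orderQty"]  # SUM(orderQty)
        t["price_sum"] += row["unitPrice"]  # numerator for AVG(unitPrice)
    return {
        product_id: {
            "revenue": t["revenue"],
            "transactions": t["transactions"],
            "units_sold": t["units_sold"],
            "avg_unit_price": t["price_sum"] / t["transactions"],
        }
        for product_id, t in totals.items()
    }


metrics = product_metrics(line_items)
for product_id, values in sorted(metrics.items()):
    print(product_id, values)
```

Because the join to product is many-to-one, these aggregates can be computed on the detail table alone and joined to product attributes afterwards without changing the totals.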

Deep Model Understanding

Buster deploys dozens of agents in parallel to index your dbt project and explore your repo.

Grounded in Metadata

Agents specialize in retrieving and traversing dbt metadata, data profiling metrics, and lineage.

Optimized for AI tools

Agents document nuance, edge cases, and how models should actually be used in analysis.

Maintain AI context

Buster updates dbt docs on every pull request

Your dbt project is constantly changing. Buster reviews every PR and updates your docs before merging.

Reviews every PR

Buster reviews changes on every PR, traces lineage, and identifies all impacted documentation files.

Auto-update docs

Buster generates documentation updates and commits the edits directly to your branch for review.

staging-models-refactor

documentation-file-updates

Buster (bot) reviewed 4 minutes ago

Updated Project Documentation

This PR contains a few changes to various dbt models. I've updated impacted documentation accordingly:

stg_orders.yml – Renamed column reference from order_amount to total_order_value to match the upstream model. Updated field description to clarify it’s now a post-discount amount.
dim_customers.yml – Revised the definition of “active customer” to align with new logic (based on last_purchase_date instead of is_active flag). Also refreshed example values.
analytics/orders_summary.md – Adjusted the metric definition for “Average Order Value” to reflect the renamed column and excluded test orders from the aggregation.
schema_overview.md – Updated cross-references and lineage links to ensure all documentation points to the new field names.

During the review, I also noted that a few upstream staging models had inconsistencies in naming conventions and description formats. The affected models have been standardized to follow the repository’s current dbt style guide, ensuring consistent YAML structure and metadata completeness. Where appropriate, I added missing descriptions for source columns and normalized tag usage across the analytics layer. These changes should make the DAG easier to navigate and improve the readability of generated documentation in dbt Docs.

These edits bring the documentation in sync with the latest model logic and ensure semantic definitions stay accurate across layers.

Ensure data quality

Buster runs scheduled audits on your dbt project and ensures data quality

Audit your dbt project on a recurring schedule

Buster profiles your dbt models on a scheduled cadence—logging schemas, distributions, and more.

Runs

run_jzdeqacokljk4ioyuxqjvmkzrf | daily-project-audit | Flagged for review | pr_checks.yml | Oct 21, 2025, 4:00 PM | 4m, 5s

run_mnaxvqzjkbhs8fmobgxlqhjzrt | daily-project-audit | Flagged for review | pr_checks.yml | Oct 21, 2025, 4:00 PM | 4m, 5s

run_ynhwertghjkf67asdlkfjhqw | daily-project-audit | No issues detected | pr_checks.yml | Oct 21, 2025, 4:00 PM | 4m, 5s

run_cmgvazqbgrh443aoiuoqxjkjh | daily-project-audit | No issues detected | pr_checks.yml | Oct 21, 2025, 4:00 PM | 4m, 5s

run_bjwnxfqhlpdt2focvwefklkqz | daily-project-audit | No issues detected | pr_checks.yml | Oct 21, 2025, 4:00 PM | 4m, 5s

run_jfakdbaqhlpdt2focvwefklkqz | daily-project-audit | No issues detected | pr_checks.yml | Oct 21, 2025, 4:00 PM | 4m, 5s

daily-project-audit

2:36 AM

Daily Audit: 3 anomalies detected

Three anomalies were flagged in this run: a 38% drop in sales_orders volume, a new specialOfferID in sales_items, and a weaker correlation between orderQty and lineTotal, hinting at pricing drift. The agent grouped these under one “sales data consistency” incident, opened a PR to adjust thresholds, and sent a Slack alert flagging the weak correlation.

staging

sales-test-thresholds
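
Two of the anomalies in the audit summary above, a drop in row volume and a previously unseen specialOfferID, can be illustrated with very simple checks. The sketch below is only an assumption about how such checks could look. Buster's actual detection logic is not described on this page, and the function names, the 25% threshold, and the sample counts are all hypothetical.

```python
def relative_change(baseline, current):
    """Fractional change from baseline to current (negative means a drop)."""
    return (current - baseline) / baseline


def volume_dropped(baseline, current, max_drop=0.25):
    """Flag when volume falls by at least max_drop (hypothetical threshold)."""
    return relative_change(baseline, current) <= -max_drop


def unseen_values(known_values, current_values):
    """Return values present in the current run but absent from the known set."""
    return set(current_values) - set(known_values)


# Illustrative numbers only: 1000 rows yesterday, 620 today is a 38% drop.
drop_flagged = volume_dropped(baseline=1000, current=620)

# Illustrative IDs only: offer 17 has not been seen before.
new_offer_ids = unseen_values(known_values={1, 2, 3}, current_values=[1, 2, 17])
```

A fixed threshold is the simplest possible rule. A production check would more likely compare against a rolling history, since normal volume varies by weekday and season.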

Receive alerts & suggested fixes

Buster automatically detects drift, anomalies, and more. When data quality issues arise, Buster sends Slack alerts or opens a PR with fixes.

Possible pricing drift detected in sales_items

now

Correlation between orderQty and lineTotal fell from 0.91 → 0.63, indicating a potential change in pricing logic. The agent flagged it for review.
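
As a rough illustration of the alert above, the following Python sketch computes a Pearson correlation and flags a run when the coefficient falls too far below a stored baseline. This is not Buster's implementation, which the page does not describe. The function names and the 0.2 maximum allowed drop are assumptions chosen for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_drifted(baseline_r, current_r, max_drop=0.2):
    """Flag when the coefficient drops by more than max_drop (hypothetical threshold)."""
    return (baseline_r - current_r) > max_drop


# The figures quoted in the alert: 0.91 falling to 0.63.
drifted = correlation_drifted(0.91, 0.63)

# In practice current_r would come from the latest profile, for example
# pearson(order_qty_values, line_total_values) over the audited rows.
```

A weaker correlation between quantity and line total does not prove that pricing logic changed. A shift in product mix or discounting could produce the same signal, which is why the alert is flagged for review.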

Power all your AI tools

Buster improves downstream AI tools

Buster optimizes your dbt docs for any AI or BI tool that uses your data models for analysis or data engineering.

dbt Docs

Managed by Buster

AI context improves all of your AI tools

Sales Rep Performance Analysis

why do top sales reps perform better than bottom reps

Reasoned for 3 minutes

Sales Rep Analysis - Top vs Bottom Performers

Version 1

I created a report to assess your top vs bottom performing sales reps.

The clearest differentiator between the two was targeting daily cyclists instead of less frequent cyclists. Top performers consistently prioritize this "daily cyclists" segment and daily cyclists spend 2.3x more than less frequent cyclists.

If you see something interesting, let me know. I can drill down or explore whatever might be meaningful.

Sales Rep Analysis - Top vs Bottom Performers

Aug 11, 2025

Created by Buster

Top quartile reps generate $17.3M annually vs bottom quartile at $5.9M, an $11.4M performance difference. Targeting daily cyclists instead of less frequent cyclists appears to be the clearest differentiator between top-performing and bottom-performing reps. Some key findings are:

  • "Daily Cycling" customers represent a $114,391 average annual value vs $46,564-$59,198 for other segments (like hobbyists)

  • Top performers capture 51% of this daily cyclist segment vs 27.5% for bottom performers

  • Top performers achieve 75%+ revenue from existing customers

Top Performers Close 3.5x More Revenue

2024 performance data shows a dramatic gap between our top and bottom sales representatives. Linda Mitchell leads with $4.23M in revenue, while Lynn Tsoflias generated only $1.19M, a 3.5x difference.

Total Sales by Sales Rep

Last 12 months

What were total sales for each sales rep over the last 12 months?

Top Performers Have a Higher Customer Count

Top quartile reps serve an average of 52.5 customers annually, while bottom quartile reps serve an average of 22.8. Top performers therefore have a 130% larger customer base than bottom performers, and the spread between individual reps is wider still: Jillian Carson (highest, 57 customers) serves 9.5x as many customers as Pamela Ansman-Wolfe (lowest, 6 customers).

Customer Count by Sales Rep

Last 12 months

How many unique customers did each sales rep close?

Top Performers Focus on Existing Customers

Top quartile reps derive 74.5-78.4% of revenue from existing customers, while bottom quartile reps show significant variance (0-74% existing customer focus). A focus on existing customers correlates with sustained high performance, which suggests that relationship building and account expansion pay off more than constant prospecting.

New vs Existing Customer Orders by Sales Rep

Last 12 months

What percentage of orders from the last 12 months were with new vs existing customers, broken down by sales rep?

Top Performers Focus on "Daily Cyclists"

Customers recorded as "Daily" cyclists represent the premium market segment, have a higher AOV, and make more frequent purchases. 51% of customers who have purchased from top performers are daily cyclists, vs only 27.5% for bottom performers. That is a gap of 23.5 percentage points, or about 1.85x the segment capture.

% of Total Sales by Product Category

Last 12 months

What is the % breakdown of sales by product category for each sales representative?

"Daily Cyclists" Spend 2.3x More

Selling to daily cyclists carries a $50,477 premium: daily cyclists spend over 2.3x the average ($88,758 vs $38,281). The gap between daily and occasional cyclists is even more pronounced (272x). Top performers strategically focus on the daily cyclist segment for this reason.

Average Annual Spend by Cycling Frequency

Last 12 months

What is the average annual spend of customers grouped by their cycling frequency?

Conclusion

Top-performing sales reps outperform their bottom-performing counterparts by strategically targeting high-value "Daily Cyclist" customers, who spend 2.3x more than the average, and by focusing on existing customer relationships, which drive 74.5-78.4% of their revenue. This dual focus on premium segments and account expansion results in a 2.9x revenue gap and a 130% larger customer base compared to bottom performers, who capture only 27.5% of daily cyclists and show inconsistent prioritization of existing customers.
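The headline multiples in this report mix two conventions: "Nx" ratios and "N% larger" differences. As a minimal sketch (the helper names `multiple` and `percent_larger` are illustrative, not part of Buster), the figures quoted above can be rechecked like this:

```python
def multiple(a: float, b: float) -> float:
    """How many times larger a is than b (2.0 means 'twice as large')."""
    return a / b


def percent_larger(a: float, b: float) -> float:
    """Percentage by which a exceeds b (100.0 means 'twice as large')."""
    return (a - b) / b * 100


# Figures quoted in the report above.
revenue_gap = multiple(17.3, 5.9)           # top vs bottom quartile revenue, in $M
spend_multiple = multiple(88_758, 38_281)   # daily cyclists vs average annual spend
customer_base = percent_larger(52.5, 22.8)  # average customers per rep

print(f"{revenue_gap:.1f}x revenue gap")
print(f"{spend_multiple:.1f}x spend multiple")
print(f"{customer_base:.0f}% larger customer base")
```

Keeping the two conventions in separate helpers makes it harder to confuse a 2.3x ratio with a 130% increase, which describe the same kind of gap.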

Use Buster’s native AI data analyst

For teams that want an end-to-end solution, Buster includes a best-in-class AI analyst for self-serve.

Sales Over Recent Months

update me on our sales over recent months

Reasoned for 6 minutes

Sales Over Recent Months

Version 1

I spent a few minutes digging into sales data from recent months and spun up this report for you.

It looks like sales dipped last month, likely as the result of a 28% decrease in marketing spend. This especially impacted sales in the electronics category.

Sales Over Recent Months

Apr 17, 2025

Created by Buster

Last month, sales experienced a significant decline - dropping nearly 21% compared to the previous month. This report investigates the reasons behind this decline using historical sales data, marketing spend, and competitor activity.

Sales Decline in Electronics Category

Last month's total sales fell nearly 21% month over month, with the electronics category dropping 67.42% over the same period.

Monthly Total Sales and Monthly Electronics Sales

Last 6 months

What were total sales and electronics sales over the last 6 months?

Impact of Reduced Marketing Spend

Marketing spend decreased by 28% last month. Regression analysis indicates a strong historical correlation (R² = 0.78) between your marketing spend and sales, suggesting this reduction significantly contributed to the sales dip.
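For readers unfamiliar with R², here is a minimal sketch of how it can be computed for a simple one-variable regression of sales on marketing spend. The monthly figures are hypothetical placeholders, not the data behind this report, and the function is an illustration rather than Buster's implementation.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """R² of an ordinary least-squares line fitted to y as a function of x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Share of the variance in y that the fitted line explains.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical monthly figures in $K (assumed for illustration only).
marketing_spend = [50, 55, 60, 58, 62, 45]
sales = [400, 430, 470, 440, 490, 380]

print(f"R² = {r_squared(marketing_spend, sales):.2f}")
```

An R² near 1 means the fitted line explains most of the variation in sales. It measures association only, so it supports the "contributed to" reading without proving it.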

Marketing Spend & Electronics Sales

Last 6 months

What was marketing spend and electronics sales over the last 6 months?

Customers

Real results from modern data teams

16.5x increase in self-served data requests
4x fewer breaking changes in prod
3x faster PR cycles

"Buster immediately improved the performance in our downstream self-serve AI tools."

Brett Denney

Data Analytics, Remi

"A lot of data engineers think self serve is a myth. This is actually self serve, for real for real."

Alex Ahlstrom

Director of Analytics, Angel Studios

"Buster frees me up from the ad-hoc tasks I always had to do so I can focus on longer term goals."

Landen Bailey

Senior Data Engineer, Redo

Enterprise & security

Buster is built with enterprise-grade security practices. This includes state-of-the-art encryption, safe and reliable infrastructure partners, and independently verified security controls.

SOC 2 Type II compliant

Buster has undergone a Service Organization Controls audit (SOC 2 Type II).

HIPAA compliant

Privacy & security measures to ensure that PHI is appropriately safeguarded.

Permissions & governance

Provision users, enforce permissions, & implement robust governance.

IP protection policy

Neither Buster nor our model partners train models on customer data.

Self-hosted deployment

Deploy in your own air-gapped environment.

Secure connections

SSL and pass-through OAuth available.

Connect your dbt project and get comprehensive docs in just 5 minutes