sales_order_detail.yml (Claude Code vs Buster)
Collapse diff
Copied
Copy file
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
version: 2
models:
- name: sales_order_detail
description: |
Contains details for each sales order line item.
columns:
- name: salesOrderID
description: |
The ID of the sales order.
- name: salesOrderDetailID
description: |
The ID of the sales order details.
- name: carrierTrackingNumber
description: |
The tracking number for the shipment.
- name: orderQty
description: |
The quantity of the product ordered.
- name: productID
description: |
The ID of the product.
- name: specialOfferID
description: |
The ID of the special offer.
- name: unitPrice
description: |
The price per unit.
- name: unitPriceDiscount
description: |
The discount applied to the unit price.
- name: lineTotal
description: |
The total amount for the line item.
- name: rowguid
description: |
A unique identifier for the row.
- name: modifiedDate
description: |
The date the record was last modified.
version: 2
models:
- name: sales_order_detail
description: |
Line item detail for sales orders, representing individual products purchased within each order.
Essential for product-level sales analysis, inventory tracking, and revenue attribution by SKU.
Lineage: sourced from `stg_sales_order_detail` which pulls from source sales.salesorderdetail;
applies lineTotal calculation (unitPrice * orderQty * (1 - unitPriceDiscount)).
Patterns: ~121k rows across ~31k distinct orders (~3.9 items per order average);
null rate on carrierTrackingNumber is 46% (metadata as of 2025-10-10);
62% of line items have orderQty = 1, with long-tail up to 32 units;
95% of lines use specialOfferID = 1 (standard pricing);
266 distinct products; unitPrice heavily right-skewed (mean $485, median $55, 99th percentile $3,578).
Watchouts: carrierTrackingNumber absent for ~46% of rows (non-shipped or bulk orders);
unitPriceDiscount is 0 for 97% of rows; specialOfferID values beyond 1-2 represent promotional pricing rarely applied;
lineTotal is pre-calculated in staging (not enforced by model grain).
Freshness/Scale: spans 2022-09-10 to 2025-10-10; daily refreshes.
columns:
- name: salesOrderID
description: |
Foreign key to sales_order_header (order entity).
How to use it: Join to sales_order_header for customer, territory, and order-level aggregates.
Data characteristics: ~31k distinct orders; typical order contains 1-5 line items (mean ~3.9).
Related columns: salesOrderDetailID is unique within this order context.
Watch out for: Not unique at line item grain; must combine with salesOrderDetailID for uniqueness.
tests:
- not_null
- relationships:
to: ref('sales_order_header')
field: salesOrderID
- name: salesOrderDetailID
description: |
Primary key; unique identifier for each line item.
How to use it: Use as the grain anchor for joins and deduplication.
Data characteristics: 121,317 distinct values = 100% unique (metadata as of 2025-10-10).
Watch out for: This is the sole PK; salesOrderID alone is not unique.
tests:
- unique
- not_null
- name: carrierTrackingNumber
description: |
Shipping carrier tracking number for the line item.
How to use it: Filter for shipped items; link to logistics/fulfillment systems.
Data characteristics: Null rate 46%; ~2,214 distinct values when present;
alphanumeric format (e.g., "1534-4AB5-81").
Patterns & Insights: Absence indicates non-shipped orders (e.g., digital, in-store pickup, or bulk direct-ship).
Watch out for: Do not assume presence; left join or COALESCE when computing ship metrics.
- name: orderQty
description: |
Quantity of the product ordered on this line item.
How to use it: SUM for volume analysis; group by for distribution insights.
Data characteristics: 62% of lines have qty = 1;
95th percentile = 7; max = 32; mean ~2.4 (metadata as of 2025-10-10).
Patterns & Insights: Long-tail distribution reflects bulk orders (likely B2B or restocking);
individual consumer orders cluster at 1-3 units.
Watch out for: Aggregations should weight by orderQty to avoid unit vs line-item confusion.
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 1
max_value: 100
- name: productID
description: |
Foreign key to product table.
How to use it: Join to product dimensions for category, name, and attributes.
Data characteristics: 266 distinct products; top product (870) appears in 3.4% of lines;
moderate concentration (Gini ~0.52).
Patterns & Insights: Distribution suggests a core set of popular SKUs with long-tail specialty items.
Watch out for: Ensure product dimension is current to avoid orphaned productIDs.
tests:
- not_null
- name: specialOfferID
description: |
Foreign key to special offer/promotion applied to this line.
How to use it: Filter for promotional analysis; join to special_offer for campaign details.
Data characteristics: 95.5% are specialOfferID = 1 (standard pricing);
2.8% are ID = 2; remaining <2% distributed across IDs 3-16 (metadata as of 2025-10-10).
Patterns & Insights: Promotions are rare and targeted; standard pricing dominates.
Watch out for: Do not assume promo when unitPriceDiscount > 0; discount logic is independent.
tests:
- not_null
- accepted_values:
values: [1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 14, 16]
- name: unitPrice
description: |
Price per unit in USD before discount.
Calculation: Pulled from product pricing at order time (historical snapshot).
Interpretation: Median $55, mean $485 (right-skewed);
99th percentile $3,578; max ~$3,578 (metadata as of 2025-10-10).
Aggregation guidance: Use lineTotal for revenue; unitPrice * orderQty for pre-discount;
winsorize or filter outliers for AOV analysis if high-value SKUs dominate.
Data notes: Reflects pricing at order date; does not update with catalog changes.
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
meta:
unit: USD
- name: unitPriceDiscount
description: |
Discount rate applied to unitPrice (decimal; 0 = no discount, 0.15 = 15% off).
Calculation: Stored as decimal proportion (not percentage).
Interpretation: 97.1% of lines have 0 discount; 1.1% have 2% discount;
remaining <2% have 5-35% discount (metadata as of 2025-10-10).
Patterns & Insights: Discounts are applied selectively; most transactions are full-price.
Watch out for: Do not confuse with specialOfferID; discount may apply independently of offer campaigns.
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
max_value: 1
meta:
unit: proportion
- name: lineTotal
description: |
Line-level revenue in USD.
Calculation: unitPrice * orderQty * (1 - unitPriceDiscount); pre-calculated in stg_sales_order_detail.
Interpretation: Median $187, mean $974 (right-skewed);
99th percentile ~$8,588; max ~$27,894 (metadata as of 2025-10-10).
Aggregation guidance: SUM for revenue rollups; winsorize top 1% if outliers distort segment analysis.
Data notes: Excludes tax and shipping; reflects net product revenue only.
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
meta:
unit: USD
- name: rowguid
description: |
System-generated GUID for internal ETL tracking.
Watch out for: Not meaningful for business analysis; do not use as a join key or filter.
- name: modifiedDate
description: |
Timestamp of last update to the record.
How to use it: Track data freshness; filter for incremental loads.
Data characteristics: Spans 2022-09-10 to 2025-10-10; ~723 distinct timestamps;
clustered around daily batch times (6am/7am).
Watch out for: Reflects ETL refresh, not order date; use salesOrderID join for order context.