sales_order_detail.yml (Claude Code vs Buster)

Collapse diff

Copied

Copy file

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

version: 2


models:

- name: sales_order_detail

description: |

Contains details for each sales order line item.














columns:

- name: salesOrderID

description: |

The ID of the sales order.











- name: salesOrderDetailID

description: |

The ID of the sales order details.








- name: carrierTrackingNumber

description: |

The tracking number for the shipment.







- name: orderQty

description: |

The quantity of the product ordered.













- name: productID

description: |

The ID of the product.









- name: specialOfferID

description: |

The ID of the special offer.











- name: unitPrice

description: |

The price per unit.














- name: unitPriceDiscount

description: |

The discount applied to the unit price.














- name: lineTotal

description: |

The total amount for the line item.













- name: rowguid

description: |

A unique identifier for the row.



- name: modifiedDate

description: |

The date the record was last modified.





version: 2


models:

- name: sales_order_detail

description: |

Line item detail for sales orders, representing individual products purchased within each order.

Essential for product-level sales analysis, inventory tracking, and revenue attribution by SKU.

Lineage: sourced from `stg_sales_order_detail` which pulls from source sales.salesorderdetail;

applies lineTotal calculation (unitPrice * orderQty * (1 - unitPriceDiscount)).

Patterns: ~121k rows across ~31k distinct orders (~3.9 items per order average);

null rate on carrierTrackingNumber is 46% (metadata as of 2025-10-10);

62% of line items have orderQty = 1, with long-tail up to 32 units;

95% of lines use specialOfferID = 1 (standard pricing);

266 distinct products; unitPrice heavily right-skewed (mean $485, median $55, 99th percentile $3,578).

Watchouts: carrierTrackingNumber absent for ~46% of rows (non-shipped or bulk orders);

unitPriceDiscount is 0 for 97% of rows; specialOfferID values beyond 1-2 represent promotional pricing rarely applied;

lineTotal is pre-calculated in staging (not enforced by model grain).

Freshness/Scale: spans 2022-09-10 to 2025-10-10; daily refreshes.


columns:

- name: salesOrderID

description: |

Foreign key to sales_order_header (order entity).

How to use it: Join to sales_order_header for customer, territory, and order-level aggregates.

Data characteristics: ~31k distinct orders; typical order contains 1-5 line items (mean ~3.9).

Related columns: salesOrderDetailID is unique within this order context.

Watch out for: Not unique at line item grain; must combine with salesOrderDetailID for uniqueness.

tests:

- not_null

- relationships:

to: ref('sales_order_header')

field: salesOrderID


- name: salesOrderDetailID

description: |

Primary key; unique identifier for each line item.

How to use it: Use as the grain anchor for joins and deduplication.

Data characteristics: 121,317 distinct values = 100% unique (metadata as of 2025-10-10).

Watch out for: This is the sole PK; salesOrderID alone is not unique.

tests:

- unique

- not_null


- name: carrierTrackingNumber

description: |

Shipping carrier tracking number for the line item.

How to use it: Filter for shipped items; link to logistics/fulfillment systems.

Data characteristics: Null rate 46%; ~2,214 distinct values when present;

alphanumeric format (e.g., "1534-4AB5-81").

Patterns & Insights: Absence indicates non-shipped orders (e.g., digital, in-store pickup, or bulk direct-ship).

Watch out for: Do not assume presence; left join or COALESCE when computing ship metrics.


- name: orderQty

description: |

Quantity of the product ordered on this line item.

How to use it: SUM for volume analysis; group by for distribution insights.

Data characteristics: 62% of lines have qty = 1;

95th percentile = 7; max = 32; mean ~2.4 (metadata as of 2025-10-10).

Patterns & Insights: Long-tail distribution reflects bulk orders (likely B2B or restocking);

individual consumer orders cluster at 1-3 units.

Watch out for: Aggregations should weight by orderQty to avoid unit vs line-item confusion.

tests:

- not_null

- dbt_utils.accepted_range:

min_value: 1

max_value: 100


- name: productID

description: |

Foreign key to product table.

How to use it: Join to product dimensions for category, name, and attributes.

Data characteristics: 266 distinct products; top product (870) appears in 3.4% of lines;

moderate concentration (Gini ~0.52).

Patterns & Insights: Distribution suggests a core set of popular SKUs with long-tail specialty items.

Watch out for: Ensure product dimension is current to avoid orphaned productIDs.

tests:

- not_null


- name: specialOfferID

description: |

Foreign key to special offer/promotion applied to this line.

How to use it: Filter for promotional analysis; join to special_offer for campaign details.

Data characteristics: 95.5% are specialOfferID = 1 (standard pricing);

2.8% are ID = 2; remaining <2% distributed across IDs 3-16 (metadata as of 2025-10-10).

Patterns & Insights: Promotions are rare and targeted; standard pricing dominates.

Watch out for: Do not assume promo when unitPriceDiscount > 0; discount logic is independent.

tests:

- not_null

- accepted_values:

values: [1, 2, 3, 4, 5, 7, 8, 9, 11, 13, 14, 16]


- name: unitPrice

description: |

Price per unit in USD before discount.

Calculation: Pulled from product pricing at order time (historical snapshot).

Interpretation: Median $55, mean $485 (right-skewed);

99th percentile $3,578; max ~$3,578 (metadata as of 2025-10-10).

Aggregation guidance: Use lineTotal for revenue; unitPrice * orderQty for pre-discount;

winsorize or filter outliers for AOV analysis if high-value SKUs dominate.

Data notes: Reflects pricing at order date; does not update with catalog changes.

tests:

- not_null

- dbt_utils.accepted_range:

min_value: 0

meta:

unit: USD


- name: unitPriceDiscount

description: |

Discount rate applied to unitPrice (decimal; 0 = no discount, 0.15 = 15% off).

Calculation: Stored as decimal proportion (not percentage).

Interpretation: 97.1% of lines have 0 discount; 1.1% have 2% discount;

remaining <2% have 5-35% discount (metadata as of 2025-10-10).

Patterns & Insights: Discounts are applied selectively; most transactions are full-price.

Watch out for: Do not confuse with specialOfferID; discount may apply independently of offer campaigns.

tests:

- not_null

- dbt_utils.accepted_range:

min_value: 0

max_value: 1

meta:

unit: proportion


- name: lineTotal

description: |

Line-level revenue in USD.

Calculation: unitPrice * orderQty * (1 - unitPriceDiscount); pre-calculated in stg_sales_order_detail.

Interpretation: Median $187, mean $974 (right-skewed);

99th percentile ~$8,588; max ~$27,894 (metadata as of 2025-10-10).

Aggregation guidance: SUM for revenue rollups; winsorize top 1% if outliers distort segment analysis.

Data notes: Excludes tax and shipping; reflects net product revenue only.

tests:

- not_null

- dbt_utils.accepted_range:

min_value: 0

meta:

unit: USD


- name: rowguid

description: |

System-generated GUID for internal ETL tracking.

Watch out for: Not meaningful for business analysis; do not use as a join key or filter.


- name: modifiedDate

description: |

Timestamp of last update to the record.

How to use it: Track data freshness; filter for incremental loads.

Data characteristics: Spans 2022-09-10 to 2025-10-10; ~723 distinct timestamps;

clustered around daily batch times (6am/7am).

Watch out for: Reflects ETL refresh, not order date; use salesOrderID join for order context.