05 · Data Modeling: Slowly Changing Dimensions, Conformed Dimensions
Level: Intermediate to Advanced Time to read: 22 min Pre-reading: 02 · Warehouse Architecture After reading: You'll understand SCD types, when to use each, and how conformed dimensions maintain consistency across the warehouse.
The Problem: Dimensions Change Over Time
Dimension: dim_product
Product 789: "Laptop Pro" → Category: "Electronics"
6 months later:
Product 789: "Laptop Pro" → Category: "Computer" (marketing decides to reorg)
Question: When analyzing sales by product category in Q1:
- Include "Laptop Pro" as "Electronics"?
- Include "Laptop Pro" as "Computer"?
- Show both versions?
This is the Slowly Changing Dimension (SCD) problem.
SCD Type 0: No History (Not Recommended)
Approach: Overwrite all values. Never keep history.
UPDATE dim_product
SET category = 'Computer'
WHERE product_id = 789;
-- Old value "Electronics" is lost forever
Pros: Simple, minimal storage
Cons: Can't analyze historical behavior, fact tables don't match dimensions
Used for: Dimensions where changes are irrelevant (e.g., product description typo fixes)
SCD Type 1: Overwrite (Overwrite History)
Approach: Update dimension attributes in-place. Old value is lost.
-- Initial insert
INSERT INTO dim_product VALUES (789, 'Laptop Pro', 'Electronics');
-- 6 months later: Category changes
UPDATE dim_product
SET category = 'Computer'
WHERE product_id = 789;
-- Query on old fact table:
SELECT SUM(sales), category
FROM fact_sales f
JOIN dim_product d ON f.product_key = d.product_id
WHERE f.sale_date IN ('2024-01-15', '2024-07-15')
GROUP BY category;
-- Both sales show category = 'Computer' (inaccurate for Jan sales!)
Pros: Simplest, minimal storage, easy to maintain
Cons: Historical accuracy lost, can't analyze under old dimension values
Used for: Corrections/typos, truly immutable attributes (biz doesn't care about history)
Example: Fixing a product name typo
SCD Type 2: Track All Versions (Gold Standard)
Approach: Keep multiple rows per dimension entity, each representing a version with effective/end dates.
-- Initial
INSERT INTO dim_product VALUES
(1, 789, 'Laptop Pro', 'Electronics',
'2024-01-01', NULL, TRUE);
-- (surrogate_key, business_key, name, category, eff_date, end_date, is_current)
-- 6 months later: Category changes
UPDATE dim_product
SET end_date = '2024-07-01', is_current = FALSE
WHERE product_id = 789 AND is_current = TRUE;
INSERT INTO dim_product VALUES
(2, 789, 'Laptop Pro', 'Computer',
'2024-07-01', NULL, TRUE);
-- Now query:
SELECT SUM(f.sales), d.category
FROM fact_sales f
JOIN dim_product d ON f.product_key = d.surrogate_key
WHERE f.sale_date IN ('2024-01-15', '2024-07-15')
GROUP BY d.category, f.sale_date;
-- 2024-01-15: Electronics
-- 2024-07-15: Computer
-- ✅ Historically accurate!
SCD Type 2 Implementation
MERGE INTO dim_product t
USING (
SELECT
product_id, -- Business key
product_name,
category,
current_timestamp() as load_time
FROM silver.products
) s
ON t.product_id = s.product_id AND t.is_current = TRUE
WHEN MATCHED AND (
t.category != s.category OR
t.product_name != s.product_name
) THEN
-- Close old record
UPDATE SET
is_current = FALSE,
end_date = s.load_time
WHEN NOT MATCHED THEN
-- Insert new record
INSERT (surrogate_key, product_id, product_name, category,
effective_date, end_date, is_current)
VALUES (
NEXT_VALUE(seq_product_key),
s.product_id,
s.product_name,
s.category,
s.load_time,
NULL,
TRUE
);
Pros: ✅ Full history preserved, historically accurate facts
Cons: ❌ Storage overhead (multiple rows per dimension), complexity
Used for: Almost all dimensions in data warehouses (best practice)
Example: Customer segment changes, product category reorganization
SCD Type 3: Previous Version Only
Approach: Keep current and previous versions in same row (extra columns).
INSERT INTO dim_product VALUES
(789, 'Laptop Pro', 'Electronics', NULL);
-- 6 months later
UPDATE dim_product
SET
previous_category = category,
category = 'Computer'
WHERE product_id = 789;
-- Result
SELECT * FROM dim_product WHERE product_id = 789;
-- (789, 'Laptop Pro', 'Computer', 'Electronics')
Pros: Simpler than Type 2, only 1 row per entity
Cons: Can only track 2 versions (current + previous), not full history
Used for: Rarely (prefer Type 2)
Example: Product price: current vs. previous month
SCD Type 4: History Tables (Versioning)
Approach: Keep main dimension (Type 1), maintain separate history table.
-- Main dimension (current only, Type 1)
CREATE TABLE dim_product (
product_id INT PRIMARY KEY,
product_name VARCHAR,
category VARCHAR
);
-- History table (full audit trail)
CREATE TABLE dim_product_history (
product_id INT,
product_name VARCHAR,
category VARCHAR,
effective_date DATE,
end_date DATE,
change_reason VARCHAR
);
-- Fast queries use main table (Type 1)
SELECT SUM(sales), category
FROM fact_sales f
JOIN dim_product d ON f.product_id = d.product_id
GROUP BY category;
-- Compliance/audit queries use history
SELECT * FROM dim_product_history
WHERE product_id = 789
ORDER BY effective_date;
Pros: Fast main dimension, audit history available
Cons: Two tables to maintain, complex logic
Used for: Compliance-heavy organizations (audit trail required)
SCD Type 5: Hybrid (Type 1 + Type 2)
Approach: SCD Type 2 (track changes) + denormalized current values for fast queries.
CREATE TABLE dim_product (
surrogate_key INT,
product_id INT,
-- Type 2: History tracking
category VARCHAR,
effective_date DATE,
end_date DATE,
is_current BOOLEAN,
-- Type 1: Current denormalization
current_category VARCHAR (updated daily)
);
Pros: Historical accuracy + fast current queries
Cons: Extra column, more maintenance
Used for: Large fact tables where filtering on current dimension is critical
SCD Type 6: Hybrid (Type 1 + Type 2 + Type 3)
Combines all approaches: current attributes (Type 1), full history (Type 2), and previous value (Type 3) for quick comparisons.
Rarely used (overkill complexity).
Which SCD Type to Use?
| Dimension | SCD Type | Why |
|---|---|---|
| Date | None | Time doesn't change |
| Customer location | 2 | Need historical accuracy (customer moved) |
| Product category | 2 | Business reorganizes categories frequently |
| Product price | 2 or 3 | Track price changes over time |
| Store name | 1 | Corrections only, business doesn't care about history |
| Slowly changing attribute | 2 | Default choice |
Default answer in interviews: SCD Type 2 (if you need history).
Conformed Dimensions: Integration Across Fact Tables
The Problem: Multiple Dimensions
Sales fact table:
dimension: Customer (id, name, segment, country)
Support fact table:
dimension: Customer (id, name, title, support_tier)
Question: What's the customer's segment?
Sales dimension says: "Premium"
Support dimension says: ???
Problem: Inconsistent customer definition across warehouse!
Solution: Single Conformed Dimension
graph TB
FS["fact_sales"]
FSup["fact_support_tickets"]
FC["gold.dim_customer<br/>(SINGLE<br/>CONFORMED<br/>DIMENSION)"]
FS -->|customer_key| FC
FSup -->|customer_key| FC
-- Single source of truth for customers
CREATE TABLE gold.dim_customer (
customer_key INT PRIMARY KEY,
customer_id INT UNIQUE,
-- Attributes from Sales process
segment VARCHAR, -- From sales CRM
country VARCHAR, -- From sales CRM
-- Attributes from Support process
support_tier VARCHAR, -- From support system
-- Attributes from Marketing
email_segment VARCHAR, -- From email marketing
-- Tracking
effective_date DATE,
end_date DATE,
is_current BOOLEAN
);
-- All fact tables use this dimension
CREATE TABLE gold.fact_sales (
customer_key INT REFERENCES gold.dim_customer,
product_key INT,
date_key INT,
amount DECIMAL
);
CREATE TABLE gold.fact_support_tickets (
customer_key INT REFERENCES gold.dim_customer,
support_agent_key INT,
date_key INT,
resolution_time_minutes INT
);
Building Conformed Dimensions
-- Start from Silver (cleaned single-source data)
CREATE TABLE gold.dim_customer AS
-- Main source: Sales
SELECT
ROW_NUMBER() OVER (ORDER BY customer_id) as customer_key,
sc.customer_id,
sc.name,
sc.segment,
sc.country,
-- Join with Support data
sup.support_tier,
-- Join with Marketing data
mk.email_segment,
current_date() as effective_date,
NULL as end_date,
TRUE as is_current
FROM silver.customers sc
LEFT JOIN silver.support_tiers sup
ON sc.customer_id = sup.customer_id
LEFT JOIN silver.marketing_segments mk
ON sc.customer_id = mk.customer_id;
Benefits of Conformation
✅ Single customer definition across warehouse
✅ Consistent BI reports (every report sees same segment)
✅ Simplified analysis (no reconciliation needed)
✅ Governance (one place to audit customer attributes)
SCD Type 2 + Conformed Dimensions = Best Practice
Combining SCD Type 2 with conformed dimensions is the gold standard:
-- Conformed, historicized customer dimension
MERGE INTO gold.dim_customer t
USING (
SELECT
sc.customer_id,
sc.name,
sc.segment, -- Updates tracked via SCD Type 2
sc.country,
sup.support_tier,
mk.email_segment,
current_timestamp() as load_time
FROM silver.customers sc
LEFT JOIN silver.support_tiers sup ON sc.customer_id = sup.customer_id
LEFT JOIN silver.marketing_segments mk ON sc.customer_id = mk.customer_id
) s
ON t.customer_id = s.customer_id AND t.is_current = TRUE
WHEN MATCHED AND (
t.segment != s.segment OR
t.support_tier != s.support_tier OR
t.email_segment != s.email_segment
) THEN UPDATE SET
is_current = FALSE,
end_date = s.load_time
WHEN NOT MATCHED THEN INSERT *;
Result: Historical customer dimension, integrated from multiple sources, with full audit trail.
Deep-Dives
→ Deep Dive: Dimensional Modeling — Advanced SCD patterns, bridge tables, junk dimensions.
→ Deep Dive: Schema Design — Designing conformed dimensions for complex enterprises.
Key Takeaways
✅ SCD Type 2 is the industry standard: track all versions with date ranges.
✅ SCD Type 1 only for corrections or truly immutable attributes.
✅ Conformed dimensions ensure consistent definitions across all fact tables.
✅ History tracking enables historically accurate fact analysis (critical for financial reporting).
✅ Surrogate keys enable SCD Type 2 without disrupting fact tables.
Practice Questions
- Why is SCD Type 2 better than Type 1 for customer segment?
- How does SCD Type 2 handle the "changing dimension" problem?
- What's a conformed dimension and why do you need only one?
- Design a conformed customer dimension integrating CRM, support, and marketing systems.
- How would you query a fact table with an SCD Type 2 dimension to get historical accuracy?