05 · Data Modeling: Slowly Changing Dimensions, Conformed Dimensions

Level: Intermediate to Advanced Time to read: 22 min Pre-reading: 02 · Warehouse Architecture After reading: You'll understand SCD types, when to use each, and how conformed dimensions maintain consistency across the warehouse.


The Problem: Dimensions Change Over Time

Dimension: dim_product
Product 789: "Laptop Pro" → Category: "Electronics"

6 months later:
Product 789: "Laptop Pro" → Category: "Computer" (marketing decides to reorg)

Question: When analyzing sales by product category in Q1:
 - Include "Laptop Pro" as "Electronics"?
 - Include "Laptop Pro" as "Computer"?
 - Show both versions?

This is the Slowly Changing Dimension (SCD) problem.


Approach: Overwrite all values. Never keep history.

UPDATE dim_product 
SET category = 'Computer' 
WHERE product_id = 789;

-- Old value "Electronics" is lost forever

Pros: Simple, minimal storage
Cons: Can't analyze historical behavior, fact tables don't match dimensions

Used for: Dimensions where changes are irrelevant (e.g., product description typo fixes)


SCD Type 1: Overwrite (Overwrite History)

Approach: Update dimension attributes in-place. Old value is lost.

-- Initial insert
INSERT INTO dim_product VALUES (789, 'Laptop Pro', 'Electronics');

-- 6 months later: Category changes
UPDATE dim_product 
SET category = 'Computer' 
WHERE product_id = 789;

-- Query on old fact table:
SELECT SUM(sales), category
FROM fact_sales f
JOIN dim_product d ON f.product_key = d.product_id
WHERE f.sale_date IN ('2024-01-15', '2024-07-15')
GROUP BY category;

-- Both sales show category = 'Computer' (inaccurate for Jan sales!)

Pros: Simplest, minimal storage, easy to maintain
Cons: Historical accuracy lost, can't analyze under old dimension values

Used for: Corrections/typos, truly immutable attributes (biz doesn't care about history)

Example: Fixing a product name typo


SCD Type 2: Track All Versions (Gold Standard)

Approach: Keep multiple rows per dimension entity, each representing a version with effective/end dates.

-- Initial
INSERT INTO dim_product VALUES 
  (1, 789, 'Laptop Pro', 'Electronics', 
   '2024-01-01', NULL, TRUE);
   -- (surrogate_key, business_key, name, category, eff_date, end_date, is_current)

-- 6 months later: Category changes
UPDATE dim_product 
SET end_date = '2024-07-01', is_current = FALSE
WHERE product_id = 789 AND is_current = TRUE;

INSERT INTO dim_product VALUES 
  (2, 789, 'Laptop Pro', 'Computer', 
   '2024-07-01', NULL, TRUE);

-- Now query:
SELECT SUM(f.sales), d.category
FROM fact_sales f
JOIN dim_product d ON f.product_key = d.surrogate_key
WHERE f.sale_date IN ('2024-01-15', '2024-07-15')
GROUP BY d.category, f.sale_date;

-- 2024-01-15: Electronics
-- 2024-07-15: Computer
-- ✅ Historically accurate!

SCD Type 2 Implementation

MERGE INTO dim_product t
USING (
  SELECT 
    product_id,          -- Business key
    product_name,
    category,
    current_timestamp() as load_time
  FROM silver.products
) s
ON t.product_id = s.product_id AND t.is_current = TRUE
WHEN MATCHED AND (
  t.category != s.category OR 
  t.product_name != s.product_name
) THEN
  -- Close old record
  UPDATE SET 
    is_current = FALSE,
    end_date = s.load_time
WHEN NOT MATCHED THEN
  -- Insert new record
  INSERT (surrogate_key, product_id, product_name, category, 
          effective_date, end_date, is_current)
  VALUES (
    NEXT_VALUE(seq_product_key),
    s.product_id,
    s.product_name,
    s.category,
    s.load_time,
    NULL,
    TRUE
  );

Pros: ✅ Full history preserved, historically accurate facts
Cons: ❌ Storage overhead (multiple rows per dimension), complexity

Used for: Almost all dimensions in data warehouses (best practice)

Example: Customer segment changes, product category reorganization


SCD Type 3: Previous Version Only

Approach: Keep current and previous versions in same row (extra columns).

INSERT INTO dim_product VALUES 
  (789, 'Laptop Pro', 'Electronics', NULL);

-- 6 months later
UPDATE dim_product 
SET 
  previous_category = category,
  category = 'Computer'
WHERE product_id = 789;

-- Result
SELECT * FROM dim_product WHERE product_id = 789;
-- (789, 'Laptop Pro', 'Computer', 'Electronics')

Pros: Simpler than Type 2, only 1 row per entity
Cons: Can only track 2 versions (current + previous), not full history

Used for: Rarely (prefer Type 2)

Example: Product price: current vs. previous month


SCD Type 4: History Tables (Versioning)

Approach: Keep main dimension (Type 1), maintain separate history table.

-- Main dimension (current only, Type 1)
CREATE TABLE dim_product (
  product_id INT PRIMARY KEY,
  product_name VARCHAR,
  category VARCHAR
);

-- History table (full audit trail)
CREATE TABLE dim_product_history (
  product_id INT,
  product_name VARCHAR,
  category VARCHAR,
  effective_date DATE,
  end_date DATE,
  change_reason VARCHAR
);

-- Fast queries use main table (Type 1)
SELECT SUM(sales), category
FROM fact_sales f
JOIN dim_product d ON f.product_id = d.product_id
GROUP BY category;

-- Compliance/audit queries use history
SELECT * FROM dim_product_history 
WHERE product_id = 789
ORDER BY effective_date;

Pros: Fast main dimension, audit history available
Cons: Two tables to maintain, complex logic

Used for: Compliance-heavy organizations (audit trail required)


SCD Type 5: Hybrid (Type 1 + Type 2)

Approach: SCD Type 2 (track changes) + denormalized current values for fast queries.

CREATE TABLE dim_product (
  surrogate_key INT,
  product_id INT,

  -- Type 2: History tracking
  category VARCHAR,
  effective_date DATE,
  end_date DATE,
  is_current BOOLEAN,

  -- Type 1: Current denormalization
  current_category VARCHAR (updated daily)
);

Pros: Historical accuracy + fast current queries
Cons: Extra column, more maintenance

Used for: Large fact tables where filtering on current dimension is critical


SCD Type 6: Hybrid (Type 1 + Type 2 + Type 3)

Combines all approaches: current attributes (Type 1), full history (Type 2), and previous value (Type 3) for quick comparisons.

Rarely used (overkill complexity).


Which SCD Type to Use?

Dimension SCD Type Why
Date None Time doesn't change
Customer location 2 Need historical accuracy (customer moved)
Product category 2 Business reorganizes categories frequently
Product price 2 or 3 Track price changes over time
Store name 1 Corrections only, business doesn't care about history
Slowly changing attribute 2 Default choice

Default answer in interviews: SCD Type 2 (if you need history).


Conformed Dimensions: Integration Across Fact Tables

The Problem: Multiple Dimensions

Sales fact table: 
  dimension: Customer (id, name, segment, country)

Support fact table:
  dimension: Customer (id, name, title, support_tier)

Question: What's the customer's segment?
  Sales dimension says: "Premium"
  Support dimension says: ???

Problem: Inconsistent customer definition across warehouse!

Solution: Single Conformed Dimension

graph TB
    FS["fact_sales"]
    FSup["fact_support_tickets"]
    FC["gold.dim_customer<br/>(SINGLE<br/>CONFORMED<br/>DIMENSION)"]

    FS -->|customer_key| FC
    FSup -->|customer_key| FC
-- Single source of truth for customers
CREATE TABLE gold.dim_customer (
  customer_key INT PRIMARY KEY,
  customer_id INT UNIQUE,

  -- Attributes from Sales process
  segment VARCHAR,         -- From sales CRM
  country VARCHAR,         -- From sales CRM

  -- Attributes from Support process
  support_tier VARCHAR,    -- From support system

  -- Attributes from Marketing
  email_segment VARCHAR,   -- From email marketing

  -- Tracking
  effective_date DATE,
  end_date DATE,
  is_current BOOLEAN
);

-- All fact tables use this dimension
CREATE TABLE gold.fact_sales (
  customer_key INT REFERENCES gold.dim_customer,
  product_key INT,
  date_key INT,
  amount DECIMAL
);

CREATE TABLE gold.fact_support_tickets (
  customer_key INT REFERENCES gold.dim_customer,
  support_agent_key INT,
  date_key INT,
  resolution_time_minutes INT
);

Building Conformed Dimensions

-- Start from Silver (cleaned single-source data)
CREATE TABLE gold.dim_customer AS

-- Main source: Sales
SELECT 
  ROW_NUMBER() OVER (ORDER BY customer_id) as customer_key,
  sc.customer_id,
  sc.name,
  sc.segment,
  sc.country,

  -- Join with Support data
  sup.support_tier,

  -- Join with Marketing data
  mk.email_segment,

  current_date() as effective_date,
  NULL as end_date,
  TRUE as is_current

FROM silver.customers sc
LEFT JOIN silver.support_tiers sup 
  ON sc.customer_id = sup.customer_id
LEFT JOIN silver.marketing_segments mk 
  ON sc.customer_id = mk.customer_id;

Benefits of Conformation

Single customer definition across warehouse
Consistent BI reports (every report sees same segment)
Simplified analysis (no reconciliation needed)
Governance (one place to audit customer attributes)


SCD Type 2 + Conformed Dimensions = Best Practice

Combining SCD Type 2 with conformed dimensions is the gold standard:

-- Conformed, historicized customer dimension
MERGE INTO gold.dim_customer t
USING (
  SELECT 
    sc.customer_id,
    sc.name,
    sc.segment,      -- Updates tracked via SCD Type 2
    sc.country,
    sup.support_tier,
    mk.email_segment,
    current_timestamp() as load_time
  FROM silver.customers sc
  LEFT JOIN silver.support_tiers sup ON sc.customer_id = sup.customer_id
  LEFT JOIN silver.marketing_segments mk ON sc.customer_id = mk.customer_id
) s
ON t.customer_id = s.customer_id AND t.is_current = TRUE
WHEN MATCHED AND (
  t.segment != s.segment OR 
  t.support_tier != s.support_tier OR
  t.email_segment != s.email_segment
) THEN UPDATE SET 
  is_current = FALSE,
  end_date = s.load_time
WHEN NOT MATCHED THEN INSERT *;

Result: Historical customer dimension, integrated from multiple sources, with full audit trail.


Deep-Dives

Deep Dive: Dimensional Modeling — Advanced SCD patterns, bridge tables, junk dimensions.

Deep Dive: Schema Design — Designing conformed dimensions for complex enterprises.


Key Takeaways

SCD Type 2 is the industry standard: track all versions with date ranges.

SCD Type 1 only for corrections or truly immutable attributes.

Conformed dimensions ensure consistent definitions across all fact tables.

History tracking enables historically accurate fact analysis (critical for financial reporting).

Surrogate keys enable SCD Type 2 without disrupting fact tables.


Practice Questions

  1. Why is SCD Type 2 better than Type 1 for customer segment?
  2. How does SCD Type 2 handle the "changing dimension" problem?
  3. What's a conformed dimension and why do you need only one?
  4. Design a conformed customer dimension integrating CRM, support, and marketing systems.
  5. How would you query a fact table with an SCD Type 2 dimension to get historical accuracy?