3.03 · Deep Dive: Gold Layer - Business Analytics & Reporting

Level: Intermediate to Advanced Time to read: 18 min Pre-reading: 03 · Medallion Architecture · 3.02 · Silver Layer After reading: You'll understand how to build the Gold layer: conformed dimensions, fact tables, and aggregations that power business analytics and dashboards.

Gold Layer: Purpose & Design

The Gold layer applies business logic and builds dimensional models optimized for analytics, reporting, and machine learning.

Characteristics

Aspect	Detail
Data Quality	SLA guaranteed (99.95%+ accuracy)
Schema	Dimensional model (Star/Snowflake schema)
Transformations	Business logic, aggregations, conformed dimensions
Update Strategy	SCD Type 1 (overwrite) or Type 2 (track versions)
Ownership	Analytics engineering team
Consumer Grade	For business analysts, BI tools, executives

Gold Layer Architecture

graph LR
    S["🥈 Silver<br/>(Validated)"]
    BL["Business<br/>Logic"]
    CD["Conformed<br/>Dimensions"]
    FF["Fact<br/>Tables"]
    AGG["Aggregations"]
    G["🥇 Gold<br/>(Business-Ready)"]

    S -->|Define measures| BL
    BL -->|Deduped refs| CD
    CD -->|Join dimensions| FF
    FF -->|Pre-aggregate| AGG
    CD --> G
    FF --> G
    AGG --> G

Gold Components

1. Conformed Dimensions

Definition: Reusable dimension tables that maintain consistency across multiple fact tables.

Example: Conformed Customer Dimension

CREATE OR REPLACE TABLE gold.dim_customer AS
SELECT
  -- Surrogate key (for performance)
  GENERATE_UNIQUE_ID() as customer_key,

  -- Business key (for traceability)
  c.customer_id,

  -- Customer attributes
  c.first_name,
  c.last_name,
  CONCAT(c.first_name, ' ', c.last_name) as full_name,
  c.email,
  c.phone,

  -- Derived attributes (business logic)
  LOWER(REGEXP_EXTRACT(c.email, r'^[^@]+')) as email_prefix,
  EXTRACT(YEAR FROM CURRENT_DATE()) - EXTRACT(YEAR FROM c.birth_date) as age,

  -- SCD Type 2 tracking
  c.effective_date,
  c.end_date,
  c.is_current,

  -- Reference to geography
  g.country_key,
  g.country_name,
  g.region_name,

  -- Customer segment (from ML or business rules)
  s.customer_segment,
  s.lifetime_value_bucket,

  -- Metadata
  CURRENT_TIMESTAMP() as _dbt_updated_at,
  CURRENT_DATE() as _dbt_load_date
FROM silver.customers c
LEFT JOIN gold.dim_geography g
  ON c.country_code = g.country_code
LEFT JOIN gold.dim_customer_segment s
  ON c.customer_id = s.customer_id
  AND CURRENT_DATE() BETWEEN s.effective_date AND COALESCE(s.end_date, '2099-12-31')
WHERE c.is_current = TRUE;

2. Fact Tables

Definition: Measures (metrics) at a grain (level of detail) with foreign keys to dimensions.

Example: Sales Fact Table

CREATE OR REPLACE TABLE gold.fact_sales AS
SELECT
  -- Grain: One row per order line item per transaction

  -- Fact keys (foreign keys to dimensions)
  dk.customer_key,
  dp.product_key,
  ds.store_key,
  dd.date_key,

  -- Degenerate dimensions (attributes specific to transaction)
  o.order_id,
  o.line_item_number,

  -- Measures (numeric facts)
  o.quantity,
  o.unit_price,
  o.unit_cost,
  (o.quantity * o.unit_price) as extended_price,
  (o.quantity * o.unit_cost) as extended_cost,
  (o.quantity * (o.unit_price - o.unit_cost)) as gross_profit,

  -- Derived measures
  CASE
    WHEN o.discount_percent > 0.2 THEN 'HIGH'
    WHEN o.discount_percent > 0.1 THEN 'MEDIUM'
    ELSE 'LOW'
  END as discount_tier,

  -- Metadata
  CURRENT_TIMESTAMP() as _dbt_updated_at
FROM silver.orders o
LEFT JOIN gold.dim_customer dk ON o.customer_id = dk.customer_id
LEFT JOIN gold.dim_product dp ON o.product_id = dp.product_id
LEFT JOIN gold.dim_store ds ON o.store_id = ds.store_id
LEFT JOIN gold.dim_date dd ON DATE(o.order_date) = dd.date_actual
WHERE o._is_valid = TRUE;

3. Aggregated Tables (Pre-Aggregations)

Definition: Pre-computed aggregations to power dashboards without requiring real-time computation.

Example: Daily Revenue Summary

CREATE OR REPLACE TABLE gold.agg_daily_revenue AS
SELECT
  dd.date_key,
  dd.date_actual,
  dd.year,
  dd.month,
  dd.day_of_month,

  dp.product_key,
  dp.product_id,
  dp.product_name,
  dp.category,

  ds.store_key,
  ds.store_id,
  ds.store_name,
  ds.region,

  -- Aggregated measures
  COUNT(DISTINCT f.order_id) as num_orders,
  SUM(f.quantity) as total_quantity,
  SUM(f.extended_price) as total_revenue,
  SUM(f.extended_cost) as total_cost,
  SUM(f.gross_profit) as total_profit,

  -- Derived metrics
  SAFE_DIVIDE(SUM(f.gross_profit), SUM(f.extended_price)) as profit_margin,
  SAFE_DIVIDE(SUM(f.extended_price), COUNT(DISTINCT f.order_id)) as avg_order_value,

  CURRENT_TIMESTAMP() as _dbt_updated_at
FROM gold.fact_sales f
LEFT JOIN gold.dim_date dd ON f.date_key = dd.date_key
LEFT JOIN gold.dim_product dp ON f.product_key = dp.product_key
LEFT JOIN gold.dim_store ds ON f.store_key = ds.store_key
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13;

Gold Layer Design Patterns

Pattern 1: Star Schema (Recommended)

Fact table (center) connected to dimension tables (points)

        ┌─────────────┐
        │ dim_customer│
        └──────┬──────┘
               │
    ┌──────────┼──────────┐
    │          │          │
┌───┴───┐  ┌──▼───┐  ┌──┴────┐
│dim_   │  │fact_ │  │dim_   │
│date   │──┤sales ├──┤product│
└───┬───┘  └──▲───┘  └──┬────┘
    │         │         │
    └─────────┼─────────┘
          ┌──┴────┐
          │dim_   │
          │store  │
          └───────┘

Advantages:

Simple to query (fewer joins)
Faster analytical queries
Easy to understand for business users

Disadvantages:

Potential for data redundancy
Dimension updates can be complex

Pattern 2: Snowflake Schema (Normalized)

Dimensions are further normalized

         ┌─────────────────┐
         │ dim_customer    │
         └────────┬────────┘
                  │
    ┌─────────────┼──────────────┐
    │             │              │
┌───┴────┐   ┌───▼──┐   ┌──────┴──┐
│dim_date │   │fact_ │   │dim_prod │
│         │───┤sales ├───┤         │
└────┬────┘   └──▲───┘   └────┬────┘
     │           │            │
     │      ┌────┴─────┐      │
     └──────┤ dim_     │      │
            │ store    │      │
            └──────────┘      │
                              │
                         ┌────┴─────┐
                         │dim_       │
                         │category   │
                         └───────────┘

Advantages:

Less redundancy (normalized)
Easier dimension maintenance
Reduced storage

Disadvantages:

More complex queries (more joins)
Slower analytical queries

Pattern 3: Data Vault (Complex but Flexible)

Hub-Link-Satellite pattern for maximum flexibility

┌────────────┐
│ Hub_       │ (Business keys)
│ Customer   │
└────────────┘
      │
      ├──────→ Link_Customer_Order
      │        ├──→ Sat_Customer
      │        └──→ Sat_Order
      │
      └──────→ Link_Customer_Geography
               └──→ Sat_Geography

Business Logic & Transformations

Pattern 1: Calculated Columns

-- Add business-derived columns
CREATE OR REPLACE TABLE gold.fact_sales_enhanced AS
SELECT
  f.*,

  -- Customer lifetime value tier (business logic)
  CASE
    WHEN clv.lifetime_value > 100000 THEN 'VIP'
    WHEN clv.lifetime_value > 50000 THEN 'HIGH_VALUE'
    WHEN clv.lifetime_value > 10000 THEN 'MEDIUM_VALUE'
    ELSE 'LOW_VALUE'
  END as customer_value_tier,

  -- Seasonality flag (business logic)
  CASE
    WHEN EXTRACT(MONTH FROM f.order_date) IN (11, 12) THEN 'HOLIDAY'
    WHEN EXTRACT(MONTH FROM f.order_date) IN (6, 7, 8) THEN 'SUMMER'
    ELSE 'REGULAR'
  END as seasonality,

  -- Product affinity (business logic)
  CASE
    WHEN dp.category IN ('Electronics', 'Computers') THEN 'TECH_AFFINITY'
    WHEN dp.category IN ('Books', 'Media') THEN 'CONTENT_AFFINITY'
    ELSE 'GENERAL'
  END as affinity_group
FROM gold.fact_sales f
LEFT JOIN gold.dim_customer dk ON f.customer_key = dk.customer_key
LEFT JOIN gold.dim_product dp ON f.product_key = dp.product_key
LEFT JOIN customer_lifetime_values clv ON dk.customer_id = clv.customer_id;

Pattern 2: Slowly Changing Dimensions (SCD Type 2)

-- Track dimension changes over time
CREATE OR REPLACE TABLE gold.dim_product_scd2 AS
SELECT
  ROW_NUMBER() OVER (ORDER BY sp.product_id, sp.effective_date) as product_key,
  sp.product_id,
  sp.product_name,
  sp.category,
  sp.price,
  sp.effective_date,
  sp.end_date,
  CASE WHEN sp.end_date IS NULL THEN TRUE ELSE FALSE END as is_current,
  ROW_NUMBER() OVER (PARTITION BY sp.product_id ORDER BY sp.effective_date) as version_number
FROM silver.products sp
WHERE sp.is_current = TRUE OR sp.end_date > CURRENT_DATE() - INTERVAL 90 DAY;

Gold Layer Testing

dbt Tests for Gold Layer

# dbt/models/gold/gold_fact_sales.yml
version: 2

models:
  - name: gold_fact_sales
    description: "Fact table of sales transactions with SLA: 99.95% accuracy, loaded by 6 AM daily"

    tests:
      - dbt_utils.recency:
          datepart: day
          interval: 1
          field: _dbt_updated_at

      - dbt_utils.equality:
          compare_model: ref('expected_fact_sales')
          tolerance_percent: 0.05

    columns:
      - name: order_id
        tests:
          - not_null
          - unique

      - name: customer_key
        tests:
          - not_null
          - relationships:
              to: ref('gold_dim_customer')
              field: customer_key

      - name: extended_price
        tests:
          - not_null
          - positive_values:
              column_name: extended_price

      - name: gross_profit
        tests:
          - expression_is_true:
              expression: 'gross_profit <= extended_price'

Gold Layer Performance Optimization

Materialization Strategies

Strategy	Use Case	Performance	Cost
View	Simple pass-through logic	Real-time, variable	Pay-per-query
Table	Complex transformations	Fast reads, slow refreshes	Storage + compute
Incremental	Append-only or slowly changing	Fast incremental runs	Storage + compute
Snapshot	Dimension tracking	Point-in-time accuracy	Storage overhead

Example: Incremental Materialization

-- dbt: Incremental refresh of fact table
{{
  config(
    materialized='incremental',
    unique_key='order_id',
    on_schema_change='fail',
    partition_by={
      'field': 'order_date',
      'data_type': 'date',
      'granularity': 'day'
    },
    cluster_by=['customer_key', 'product_key']
  )
}}

SELECT
  o.order_id,
  o.customer_id,
  o.product_id,
  o.order_date,
  ...
FROM {{ ref('silver_orders') }} o
{% if execute and execute_macros %}
  WHERE o.order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}

Gold Layer Best Practices

✅ Design for query patterns

Understand how business users will query the data
Pre-aggregate if needed
Denormalize if it improves performance

✅ Maintain conformed dimensions

Reuse dimensions across fact tables
Single version of truth for each entity
Version dimensions when business logic changes

✅ Document transformations

Explain why business logic was chosen
Document SCD strategies
Record changes to definitions

✅ Guarantee SLAs

Data freshness (loaded by specific time)
Data accuracy (99.95%+ pass rate)
Query performance (< 5 seconds for standard queries)

❌ Don't put raw data in Gold

Always use Silver as source
Never join directly to Bronze

❌ Don't create redundant dimensions

Use conformed dimensions
Avoid 10 different customer dimensions

Gold vs. Silver vs. Bronze

Aspect	Bronze	Silver	Gold
Purpose	Raw ingest	Validated source of truth	Business analytics
Update Strategy	Append-only	SCD Type 2	SCD Type 1/2
Schema	Source schema	Technical schema	Business schema
Data Quality	Unknown	Certified	SLA guaranteed
Consumers	DataOps team	Analytics engineers	Business analysts
Retention	1-2 years	1+ years	3+ years

Key Takeaways

Gold layer applies business logic and creates analytical models
Star schema is most common (dimensional model)
Conformed dimensions maintain consistency across models
Aggregations pre-compute common metrics for performance
SCD Type 2 preserves history for accurate analysis
Testing ensures quality at analytical level

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search