3.03 · Deep Dive: Gold Layer - Business Analytics & Reporting

Level: Intermediate to Advanced Time to read: 18 min Pre-reading: 03 · Medallion Architecture · 3.02 · Silver Layer After reading: You'll understand how to build the Gold layer: conformed dimensions, fact tables, and aggregations that power business analytics and dashboards.


Gold Layer: Purpose & Design

The Gold layer applies business logic and builds dimensional models optimized for analytics, reporting, and machine learning.

Characteristics

Aspect Detail
Data Quality SLA guaranteed (99.95%+ accuracy)
Schema Dimensional model (Star/Snowflake schema)
Transformations Business logic, aggregations, conformed dimensions
Update Strategy SCD Type 1 (overwrite) or Type 2 (track versions)
Ownership Analytics engineering team
Consumer Grade For business analysts, BI tools, executives

Gold Layer Architecture

graph LR
    S["🥈 Silver<br/>(Validated)"]
    BL["Business<br/>Logic"]
    CD["Conformed<br/>Dimensions"]
    FF["Fact<br/>Tables"]
    AGG["Aggregations"]
    G["🥇 Gold<br/>(Business-Ready)"]

    S -->|Define measures| BL
    BL -->|Deduped refs| CD
    CD -->|Join dimensions| FF
    FF -->|Pre-aggregate| AGG
    CD --> G
    FF --> G
    AGG --> G

Gold Components

1. Conformed Dimensions

Definition: Reusable dimension tables that maintain consistency across multiple fact tables.

Example: Conformed Customer Dimension

CREATE OR REPLACE TABLE gold.dim_customer AS
SELECT
  -- Surrogate key (for performance)
  GENERATE_UNIQUE_ID() as customer_key,

  -- Business key (for traceability)
  c.customer_id,

  -- Customer attributes
  c.first_name,
  c.last_name,
  CONCAT(c.first_name, ' ', c.last_name) as full_name,
  c.email,
  c.phone,

  -- Derived attributes (business logic)
  LOWER(REGEXP_EXTRACT(c.email, r'^[^@]+')) as email_prefix,
  EXTRACT(YEAR FROM CURRENT_DATE()) - EXTRACT(YEAR FROM c.birth_date) as age,

  -- SCD Type 2 tracking
  c.effective_date,
  c.end_date,
  c.is_current,

  -- Reference to geography
  g.country_key,
  g.country_name,
  g.region_name,

  -- Customer segment (from ML or business rules)
  s.customer_segment,
  s.lifetime_value_bucket,

  -- Metadata
  CURRENT_TIMESTAMP() as _dbt_updated_at,
  CURRENT_DATE() as _dbt_load_date
FROM silver.customers c
LEFT JOIN gold.dim_geography g
  ON c.country_code = g.country_code
LEFT JOIN gold.dim_customer_segment s
  ON c.customer_id = s.customer_id
  AND CURRENT_DATE() BETWEEN s.effective_date AND COALESCE(s.end_date, '2099-12-31')
WHERE c.is_current = TRUE;

2. Fact Tables

Definition: Measures (metrics) at a grain (level of detail) with foreign keys to dimensions.

Example: Sales Fact Table

CREATE OR REPLACE TABLE gold.fact_sales AS
SELECT
  -- Grain: One row per order line item per transaction

  -- Fact keys (foreign keys to dimensions)
  dk.customer_key,
  dp.product_key,
  ds.store_key,
  dd.date_key,

  -- Degenerate dimensions (attributes specific to transaction)
  o.order_id,
  o.line_item_number,

  -- Measures (numeric facts)
  o.quantity,
  o.unit_price,
  o.unit_cost,
  (o.quantity * o.unit_price) as extended_price,
  (o.quantity * o.unit_cost) as extended_cost,
  (o.quantity * (o.unit_price - o.unit_cost)) as gross_profit,

  -- Derived measures
  CASE
    WHEN o.discount_percent > 0.2 THEN 'HIGH'
    WHEN o.discount_percent > 0.1 THEN 'MEDIUM'
    ELSE 'LOW'
  END as discount_tier,

  -- Metadata
  CURRENT_TIMESTAMP() as _dbt_updated_at
FROM silver.orders o
LEFT JOIN gold.dim_customer dk ON o.customer_id = dk.customer_id
LEFT JOIN gold.dim_product dp ON o.product_id = dp.product_id
LEFT JOIN gold.dim_store ds ON o.store_id = ds.store_id
LEFT JOIN gold.dim_date dd ON DATE(o.order_date) = dd.date_actual
WHERE o._is_valid = TRUE;

3. Aggregated Tables (Pre-Aggregations)

Definition: Pre-computed aggregations to power dashboards without requiring real-time computation.

Example: Daily Revenue Summary

CREATE OR REPLACE TABLE gold.agg_daily_revenue AS
SELECT
  dd.date_key,
  dd.date_actual,
  dd.year,
  dd.month,
  dd.day_of_month,

  dp.product_key,
  dp.product_id,
  dp.product_name,
  dp.category,

  ds.store_key,
  ds.store_id,
  ds.store_name,
  ds.region,

  -- Aggregated measures
  COUNT(DISTINCT f.order_id) as num_orders,
  SUM(f.quantity) as total_quantity,
  SUM(f.extended_price) as total_revenue,
  SUM(f.extended_cost) as total_cost,
  SUM(f.gross_profit) as total_profit,

  -- Derived metrics
  SAFE_DIVIDE(SUM(f.gross_profit), SUM(f.extended_price)) as profit_margin,
  SAFE_DIVIDE(SUM(f.extended_price), COUNT(DISTINCT f.order_id)) as avg_order_value,

  CURRENT_TIMESTAMP() as _dbt_updated_at
FROM gold.fact_sales f
LEFT JOIN gold.dim_date dd ON f.date_key = dd.date_key
LEFT JOIN gold.dim_product dp ON f.product_key = dp.product_key
LEFT JOIN gold.dim_store ds ON f.store_key = ds.store_key
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13;

Gold Layer Design Patterns

Fact table (center) connected to dimension tables (points)

        ┌─────────────┐
        │ dim_customer│
        └──────┬──────┘
    ┌──────────┼──────────┐
    │          │          │
┌───┴───┐  ┌──▼───┐  ┌──┴────┐
│dim_   │  │fact_ │  │dim_   │
│date   │──┤sales ├──┤product│
└───┬───┘  └──▲───┘  └──┬────┘
    │         │         │
    └─────────┼─────────┘
          ┌──┴────┐
          │dim_   │
          │store  │
          └───────┘

Advantages:

  • Simple to query (fewer joins)
  • Faster analytical queries
  • Easy to understand for business users

Disadvantages:

  • Potential for data redundancy
  • Dimension updates can be complex

Pattern 2: Snowflake Schema (Normalized)

Dimensions are further normalized

         ┌─────────────────┐
         │ dim_customer    │
         └────────┬────────┘
    ┌─────────────┼──────────────┐
    │             │              │
┌───┴────┐   ┌───▼──┐   ┌──────┴──┐
│dim_date │   │fact_ │   │dim_prod │
│         │───┤sales ├───┤         │
└────┬────┘   └──▲───┘   └────┬────┘
     │           │            │
     │      ┌────┴─────┐      │
     └──────┤ dim_     │      │
            │ store    │      │
            └──────────┘      │
                         ┌────┴─────┐
                         │dim_       │
                         │category   │
                         └───────────┘

Advantages:

  • Less redundancy (normalized)
  • Easier dimension maintenance
  • Reduced storage

Disadvantages:

  • More complex queries (more joins)
  • Slower analytical queries

Pattern 3: Data Vault (Complex but Flexible)

Hub-Link-Satellite pattern for maximum flexibility

┌────────────┐
│ Hub_       │ (Business keys)
│ Customer   │
└────────────┘
      ├──────→ Link_Customer_Order
      │        ├──→ Sat_Customer
      │        └──→ Sat_Order
      └──────→ Link_Customer_Geography
               └──→ Sat_Geography

Business Logic & Transformations

Pattern 1: Calculated Columns

-- Add business-derived columns
CREATE OR REPLACE TABLE gold.fact_sales_enhanced AS
SELECT
  f.*,

  -- Customer lifetime value tier (business logic)
  CASE
    WHEN clv.lifetime_value > 100000 THEN 'VIP'
    WHEN clv.lifetime_value > 50000 THEN 'HIGH_VALUE'
    WHEN clv.lifetime_value > 10000 THEN 'MEDIUM_VALUE'
    ELSE 'LOW_VALUE'
  END as customer_value_tier,

  -- Seasonality flag (business logic)
  CASE
    WHEN EXTRACT(MONTH FROM f.order_date) IN (11, 12) THEN 'HOLIDAY'
    WHEN EXTRACT(MONTH FROM f.order_date) IN (6, 7, 8) THEN 'SUMMER'
    ELSE 'REGULAR'
  END as seasonality,

  -- Product affinity (business logic)
  CASE
    WHEN dp.category IN ('Electronics', 'Computers') THEN 'TECH_AFFINITY'
    WHEN dp.category IN ('Books', 'Media') THEN 'CONTENT_AFFINITY'
    ELSE 'GENERAL'
  END as affinity_group
FROM gold.fact_sales f
LEFT JOIN gold.dim_customer dk ON f.customer_key = dk.customer_key
LEFT JOIN gold.dim_product dp ON f.product_key = dp.product_key
LEFT JOIN customer_lifetime_values clv ON dk.customer_id = clv.customer_id;

Pattern 2: Slowly Changing Dimensions (SCD Type 2)

-- Track dimension changes over time
CREATE OR REPLACE TABLE gold.dim_product_scd2 AS
SELECT
  ROW_NUMBER() OVER (ORDER BY sp.product_id, sp.effective_date) as product_key,
  sp.product_id,
  sp.product_name,
  sp.category,
  sp.price,
  sp.effective_date,
  sp.end_date,
  CASE WHEN sp.end_date IS NULL THEN TRUE ELSE FALSE END as is_current,
  ROW_NUMBER() OVER (PARTITION BY sp.product_id ORDER BY sp.effective_date) as version_number
FROM silver.products sp
WHERE sp.is_current = TRUE OR sp.end_date > CURRENT_DATE() - INTERVAL 90 DAY;

Gold Layer Testing

dbt Tests for Gold Layer

# dbt/models/gold/gold_fact_sales.yml
version: 2

models:
  - name: gold_fact_sales
    description: "Fact table of sales transactions with SLA: 99.95% accuracy, loaded by 6 AM daily"

    tests:
      - dbt_utils.recency:
          datepart: day
          interval: 1
          field: _dbt_updated_at

      - dbt_utils.equality:
          compare_model: ref('expected_fact_sales')
          tolerance_percent: 0.05

    columns:
      - name: order_id
        tests:
          - not_null
          - unique

      - name: customer_key
        tests:
          - not_null
          - relationships:
              to: ref('gold_dim_customer')
              field: customer_key

      - name: extended_price
        tests:
          - not_null
          - positive_values:
              column_name: extended_price

      - name: gross_profit
        tests:
          - expression_is_true:
              expression: 'gross_profit <= extended_price'

Gold Layer Performance Optimization

Materialization Strategies

Strategy Use Case Performance Cost
View Simple pass-through logic Real-time, variable Pay-per-query
Table Complex transformations Fast reads, slow refreshes Storage + compute
Incremental Append-only or slowly changing Fast incremental runs Storage + compute
Snapshot Dimension tracking Point-in-time accuracy Storage overhead

Example: Incremental Materialization

-- dbt: Incremental refresh of fact table
{{
  config(
    materialized='incremental',
    unique_key='order_id',
    on_schema_change='fail',
    partition_by={
      'field': 'order_date',
      'data_type': 'date',
      'granularity': 'day'
    },
    cluster_by=['customer_key', 'product_key']
  )
}}

SELECT
  o.order_id,
  o.customer_id,
  o.product_id,
  o.order_date,
  ...
FROM {{ ref('silver_orders') }} o
{% if execute and execute_macros %}
  WHERE o.order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}

Gold Layer Best Practices

Design for query patterns

  • Understand how business users will query the data
  • Pre-aggregate if needed
  • Denormalize if it improves performance

Maintain conformed dimensions

  • Reuse dimensions across fact tables
  • Single version of truth for each entity
  • Version dimensions when business logic changes

Document transformations

  • Explain why business logic was chosen
  • Document SCD strategies
  • Record changes to definitions

Guarantee SLAs

  • Data freshness (loaded by specific time)
  • Data accuracy (99.95%+ pass rate)
  • Query performance (< 5 seconds for standard queries)

Don't put raw data in Gold

  • Always use Silver as source
  • Never join directly to Bronze

Don't create redundant dimensions

  • Use conformed dimensions
  • Avoid 10 different customer dimensions

Gold vs. Silver vs. Bronze

Aspect Bronze Silver Gold
Purpose Raw ingest Validated source of truth Business analytics
Update Strategy Append-only SCD Type 2 SCD Type 1/2
Schema Source schema Technical schema Business schema
Data Quality Unknown Certified SLA guaranteed
Consumers DataOps team Analytics engineers Business analysts
Retention 1-2 years 1+ years 3+ years

Key Takeaways

  1. Gold layer applies business logic and creates analytical models
  2. Star schema is most common (dimensional model)
  3. Conformed dimensions maintain consistency across models
  4. Aggregations pre-compute common metrics for performance
  5. SCD Type 2 preserves history for accurate analysis
  6. Testing ensures quality at analytical level