3.03 · Deep Dive: Gold Layer - Business Analytics & Reporting
Level: Intermediate to Advanced Time to read: 18 min Pre-reading: 03 · Medallion Architecture · 3.02 · Silver Layer After reading: You'll understand how to build the Gold layer: conformed dimensions, fact tables, and aggregations that power business analytics and dashboards.
Gold Layer: Purpose & Design
The Gold layer applies business logic and builds dimensional models optimized for analytics, reporting, and machine learning.
Characteristics
| Aspect | Detail |
|---|---|
| Data Quality | SLA guaranteed (99.95%+ accuracy) |
| Schema | Dimensional model (Star/Snowflake schema) |
| Transformations | Business logic, aggregations, conformed dimensions |
| Update Strategy | SCD Type 1 (overwrite) or Type 2 (track versions) |
| Ownership | Analytics engineering team |
| Consumer Grade | For business analysts, BI tools, executives |
Gold Layer Architecture
graph LR
S["🥈 Silver<br/>(Validated)"]
BL["Business<br/>Logic"]
CD["Conformed<br/>Dimensions"]
FF["Fact<br/>Tables"]
AGG["Aggregations"]
G["🥇 Gold<br/>(Business-Ready)"]
S -->|Define measures| BL
BL -->|Deduped refs| CD
CD -->|Join dimensions| FF
FF -->|Pre-aggregate| AGG
CD --> G
FF --> G
AGG --> G
Gold Components
1. Conformed Dimensions
Definition: Reusable dimension tables that maintain consistency across multiple fact tables.
Example: Conformed Customer Dimension
CREATE OR REPLACE TABLE gold.dim_customer AS
SELECT
-- Surrogate key (for performance)
GENERATE_UNIQUE_ID() as customer_key,
-- Business key (for traceability)
c.customer_id,
-- Customer attributes
c.first_name,
c.last_name,
CONCAT(c.first_name, ' ', c.last_name) as full_name,
c.email,
c.phone,
-- Derived attributes (business logic)
LOWER(REGEXP_EXTRACT(c.email, r'^[^@]+')) as email_prefix,
EXTRACT(YEAR FROM CURRENT_DATE()) - EXTRACT(YEAR FROM c.birth_date) as age,
-- SCD Type 2 tracking
c.effective_date,
c.end_date,
c.is_current,
-- Reference to geography
g.country_key,
g.country_name,
g.region_name,
-- Customer segment (from ML or business rules)
s.customer_segment,
s.lifetime_value_bucket,
-- Metadata
CURRENT_TIMESTAMP() as _dbt_updated_at,
CURRENT_DATE() as _dbt_load_date
FROM silver.customers c
LEFT JOIN gold.dim_geography g
ON c.country_code = g.country_code
LEFT JOIN gold.dim_customer_segment s
ON c.customer_id = s.customer_id
AND CURRENT_DATE() BETWEEN s.effective_date AND COALESCE(s.end_date, '2099-12-31')
WHERE c.is_current = TRUE;
2. Fact Tables
Definition: Measures (metrics) at a grain (level of detail) with foreign keys to dimensions.
Example: Sales Fact Table
CREATE OR REPLACE TABLE gold.fact_sales AS
SELECT
-- Grain: One row per order line item per transaction
-- Fact keys (foreign keys to dimensions)
dk.customer_key,
dp.product_key,
ds.store_key,
dd.date_key,
-- Degenerate dimensions (attributes specific to transaction)
o.order_id,
o.line_item_number,
-- Measures (numeric facts)
o.quantity,
o.unit_price,
o.unit_cost,
(o.quantity * o.unit_price) as extended_price,
(o.quantity * o.unit_cost) as extended_cost,
(o.quantity * (o.unit_price - o.unit_cost)) as gross_profit,
-- Derived measures
CASE
WHEN o.discount_percent > 0.2 THEN 'HIGH'
WHEN o.discount_percent > 0.1 THEN 'MEDIUM'
ELSE 'LOW'
END as discount_tier,
-- Metadata
CURRENT_TIMESTAMP() as _dbt_updated_at
FROM silver.orders o
LEFT JOIN gold.dim_customer dk ON o.customer_id = dk.customer_id
LEFT JOIN gold.dim_product dp ON o.product_id = dp.product_id
LEFT JOIN gold.dim_store ds ON o.store_id = ds.store_id
LEFT JOIN gold.dim_date dd ON DATE(o.order_date) = dd.date_actual
WHERE o._is_valid = TRUE;
3. Aggregated Tables (Pre-Aggregations)
Definition: Pre-computed aggregations to power dashboards without requiring real-time computation.
Example: Daily Revenue Summary
CREATE OR REPLACE TABLE gold.agg_daily_revenue AS
SELECT
dd.date_key,
dd.date_actual,
dd.year,
dd.month,
dd.day_of_month,
dp.product_key,
dp.product_id,
dp.product_name,
dp.category,
ds.store_key,
ds.store_id,
ds.store_name,
ds.region,
-- Aggregated measures
COUNT(DISTINCT f.order_id) as num_orders,
SUM(f.quantity) as total_quantity,
SUM(f.extended_price) as total_revenue,
SUM(f.extended_cost) as total_cost,
SUM(f.gross_profit) as total_profit,
-- Derived metrics
SAFE_DIVIDE(SUM(f.gross_profit), SUM(f.extended_price)) as profit_margin,
SAFE_DIVIDE(SUM(f.extended_price), COUNT(DISTINCT f.order_id)) as avg_order_value,
CURRENT_TIMESTAMP() as _dbt_updated_at
FROM gold.fact_sales f
LEFT JOIN gold.dim_date dd ON f.date_key = dd.date_key
LEFT JOIN gold.dim_product dp ON f.product_key = dp.product_key
LEFT JOIN gold.dim_store ds ON f.store_key = ds.store_key
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13;
Gold Layer Design Patterns
Pattern 1: Star Schema (Recommended)
Fact table (center) connected to dimension tables (points)
┌─────────────┐
│ dim_customer│
└──────┬──────┘
│
┌──────────┼──────────┐
│ │ │
┌───┴───┐ ┌──▼───┐ ┌──┴────┐
│dim_ │ │fact_ │ │dim_ │
│date │──┤sales ├──┤product│
└───┬───┘ └──▲───┘ └──┬────┘
│ │ │
└─────────┼─────────┘
┌──┴────┐
│dim_ │
│store │
└───────┘
Advantages:
- Simple to query (fewer joins)
- Faster analytical queries
- Easy to understand for business users
Disadvantages:
- Potential for data redundancy
- Dimension updates can be complex
Pattern 2: Snowflake Schema (Normalized)
Dimensions are further normalized
┌─────────────────┐
│ dim_customer │
└────────┬────────┘
│
┌─────────────┼──────────────┐
│ │ │
┌───┴────┐ ┌───▼──┐ ┌──────┴──┐
│dim_date │ │fact_ │ │dim_prod │
│ │───┤sales ├───┤ │
└────┬────┘ └──▲───┘ └────┬────┘
│ │ │
│ ┌────┴─────┐ │
└──────┤ dim_ │ │
│ store │ │
└──────────┘ │
│
┌────┴─────┐
│dim_ │
│category │
└───────────┘
Advantages:
- Less redundancy (normalized)
- Easier dimension maintenance
- Reduced storage
Disadvantages:
- More complex queries (more joins)
- Slower analytical queries
Pattern 3: Data Vault (Complex but Flexible)
Hub-Link-Satellite pattern for maximum flexibility
┌────────────┐
│ Hub_ │ (Business keys)
│ Customer │
└────────────┘
│
├──────→ Link_Customer_Order
│ ├──→ Sat_Customer
│ └──→ Sat_Order
│
└──────→ Link_Customer_Geography
└──→ Sat_Geography
Business Logic & Transformations
Pattern 1: Calculated Columns
-- Add business-derived columns
CREATE OR REPLACE TABLE gold.fact_sales_enhanced AS
SELECT
f.*,
-- Customer lifetime value tier (business logic)
CASE
WHEN clv.lifetime_value > 100000 THEN 'VIP'
WHEN clv.lifetime_value > 50000 THEN 'HIGH_VALUE'
WHEN clv.lifetime_value > 10000 THEN 'MEDIUM_VALUE'
ELSE 'LOW_VALUE'
END as customer_value_tier,
-- Seasonality flag (business logic)
CASE
WHEN EXTRACT(MONTH FROM f.order_date) IN (11, 12) THEN 'HOLIDAY'
WHEN EXTRACT(MONTH FROM f.order_date) IN (6, 7, 8) THEN 'SUMMER'
ELSE 'REGULAR'
END as seasonality,
-- Product affinity (business logic)
CASE
WHEN dp.category IN ('Electronics', 'Computers') THEN 'TECH_AFFINITY'
WHEN dp.category IN ('Books', 'Media') THEN 'CONTENT_AFFINITY'
ELSE 'GENERAL'
END as affinity_group
FROM gold.fact_sales f
LEFT JOIN gold.dim_customer dk ON f.customer_key = dk.customer_key
LEFT JOIN gold.dim_product dp ON f.product_key = dp.product_key
LEFT JOIN customer_lifetime_values clv ON dk.customer_id = clv.customer_id;
Pattern 2: Slowly Changing Dimensions (SCD Type 2)
-- Track dimension changes over time
CREATE OR REPLACE TABLE gold.dim_product_scd2 AS
SELECT
ROW_NUMBER() OVER (ORDER BY sp.product_id, sp.effective_date) as product_key,
sp.product_id,
sp.product_name,
sp.category,
sp.price,
sp.effective_date,
sp.end_date,
CASE WHEN sp.end_date IS NULL THEN TRUE ELSE FALSE END as is_current,
ROW_NUMBER() OVER (PARTITION BY sp.product_id ORDER BY sp.effective_date) as version_number
FROM silver.products sp
WHERE sp.is_current = TRUE OR sp.end_date > CURRENT_DATE() - INTERVAL 90 DAY;
Gold Layer Testing
dbt Tests for Gold Layer
# dbt/models/gold/gold_fact_sales.yml
version: 2
models:
- name: gold_fact_sales
description: "Fact table of sales transactions with SLA: 99.95% accuracy, loaded by 6 AM daily"
tests:
- dbt_utils.recency:
datepart: day
interval: 1
field: _dbt_updated_at
- dbt_utils.equality:
compare_model: ref('expected_fact_sales')
tolerance_percent: 0.05
columns:
- name: order_id
tests:
- not_null
- unique
- name: customer_key
tests:
- not_null
- relationships:
to: ref('gold_dim_customer')
field: customer_key
- name: extended_price
tests:
- not_null
- positive_values:
column_name: extended_price
- name: gross_profit
tests:
- expression_is_true:
expression: 'gross_profit <= extended_price'
Gold Layer Performance Optimization
Materialization Strategies
| Strategy | Use Case | Performance | Cost |
|---|---|---|---|
| View | Simple pass-through logic | Real-time, variable | Pay-per-query |
| Table | Complex transformations | Fast reads, slow refreshes | Storage + compute |
| Incremental | Append-only or slowly changing | Fast incremental runs | Storage + compute |
| Snapshot | Dimension tracking | Point-in-time accuracy | Storage overhead |
Example: Incremental Materialization
-- dbt: Incremental refresh of fact table
{{
config(
materialized='incremental',
unique_key='order_id',
on_schema_change='fail',
partition_by={
'field': 'order_date',
'data_type': 'date',
'granularity': 'day'
},
cluster_by=['customer_key', 'product_key']
)
}}
SELECT
o.order_id,
o.customer_id,
o.product_id,
o.order_date,
...
FROM {{ ref('silver_orders') }} o
{% if execute and execute_macros %}
WHERE o.order_date >= (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
Gold Layer Best Practices
✅ Design for query patterns
- Understand how business users will query the data
- Pre-aggregate if needed
- Denormalize if it improves performance
✅ Maintain conformed dimensions
- Reuse dimensions across fact tables
- Single version of truth for each entity
- Version dimensions when business logic changes
✅ Document transformations
- Explain why business logic was chosen
- Document SCD strategies
- Record changes to definitions
✅ Guarantee SLAs
- Data freshness (loaded by specific time)
- Data accuracy (99.95%+ pass rate)
- Query performance (< 5 seconds for standard queries)
❌ Don't put raw data in Gold
- Always use Silver as source
- Never join directly to Bronze
❌ Don't create redundant dimensions
- Use conformed dimensions
- Avoid 10 different customer dimensions
Gold vs. Silver vs. Bronze
| Aspect | Bronze | Silver | Gold |
|---|---|---|---|
| Purpose | Raw ingest | Validated source of truth | Business analytics |
| Update Strategy | Append-only | SCD Type 2 | SCD Type 1/2 |
| Schema | Source schema | Technical schema | Business schema |
| Data Quality | Unknown | Certified | SLA guaranteed |
| Consumers | DataOps team | Analytics engineers | Business analysts |
| Retention | 1-2 years | 1+ years | 3+ years |
Key Takeaways
- Gold layer applies business logic and creates analytical models
- Star schema is most common (dimensional model)
- Conformed dimensions maintain consistency across models
- Aggregations pre-compute common metrics for performance
- SCD Type 2 preserves history for accurate analysis
- Testing ensures quality at analytical level