Data Warehouse & Analytics: Interview Prep Guide

Goal: Systematic, academically rigorous coverage of data warehouse concepts — from foundational theory to enterprise-scale implementation.

Audience: Senior software engineers and architects preparing for roles managing data-intensive systems in multinational organizations.

The Mental Model: Data → Warehouse → Insights → Authority

graph LR
    A[Sources] --> B["OLTP DB"]
    A --> C["Operational Data"]
    B --> D["ETL/ELT Pipelines"]
    C --> D
    D --> E["Medallion Layers:<br/>Bronze|Silver|Gold"]
    E --> F["Data Warehouse<br/>Dimensional Models"]
    F --> G["OLAP Queries"]
    G --> H["Analytics & BI"]
    H --> I["Business Decision"]
    I --> J["Governance & Lineage"]
    J --> K["Enterprise Authority"]

Sections at a Glance

#	Section	Core Topics
01	Foundations	OLTP vs OLAP, Data Systems Taxonomy, Why Data Warehouses, Constraints & Trade-offs
02	Warehouse Architecture	Logical vs Physical Design, Dimensional Modeling, Schema Designs (Star, Snowflake)
03	Medallion Architecture	Bronze (Raw) / Silver (Cleaned) / Gold (Business) Layers, Governance Model
04	ETL & ELT Pipelines	Extract-Transform-Load vs ELT, Data Quality, Idempotency, Late-Arriving Data Handling
05	Data Modeling	Dimensional Modeling Theory, Conformed Dimensions, Slowly Changing Dimensions (SCD)
06	Query Optimization	Columnar Storage, Partitioning, Indexing, Cost Optimization, Query Execution Plans
07	Big Data Platforms	Distributed Computing, Hadoop Ecosystem, Spark, Workflow Orchestration (Airflow, dbt)
08	Tools Ecosystem	BigQuery, PostgreSQL, Cassandra, MongoDB, Databricks — comparative analysis
09	Enterprise Data Mgmt	Data Governance, Lineage Tracking, Data Quality, Metadata Management, Security
10	Labs	Practical hands-on exercises with SQL, Python, dbt, and real-world scenarios
11	Interview Scenarios	Design questions, trade-off analysis, architecture decision records (ADRs)

What Senior Architects Must Know

✅ Conceptual Depth

How does dimensional modeling improve query performance?
Why do medallion layers solve governance and debugging challenges?
What are the trade-offs between OLTP and OLAP design?

✅ Scale & Distributed Systems

How do you ensure data consistency in eventually-consistent systems?
What partitioning strategy minimizes query latency and cost?
How does columnar storage change your dimensional model?

✅ Operations & Governance

How do you track data lineage across ETL pipelines?
What does data quality mean in an enterprise context?
How do you enforce access control and PII handling?

✅ Tool Selection & Trade-offs

When do you use BigQuery vs. Databricks vs. PostgreSQL?
What's the cost/performance trade-off of each platform?
How do you avoid vendor lock-in?

Key Concepts Glossary

Concept	Definition
OLTP	Online Transaction Processing — optimized for write-heavy, normalized transactions
OLAP	Online Analytical Processing — optimized for read-heavy, denormalized aggregations
Dimensional Modeling	Organizing data around dimensions (descriptive attributes) and facts (measurements)
ETL	Extract → Transform → Load: standard pipeline pattern for data ingestion
ELT	Extract → Load → Transform: modern pattern where transformation happens in the warehouse
Medallion	Three-layer architecture: Bronze (raw) → Silver (validated) → Gold (business-ready)
SCD (Slowly Changing Dimension)	Handling historical changes to dimension attributes (Type 1, 2, 3, etc.)
Star Schema	Denormalized dimensional design with central fact table surrounded by dimensions
Snowflake Schema	Normalized dimensional design with hierarchical dimension tables
Data Lineage	Tracking the origin, transformation, and usage of data across the warehouse
Data Governance	Organizational policies and processes for data ownership, quality, and access

Deep-Dive Structure

This guide uses a two-tier content model:

Main Sections (NN-section.md) — breadth-first, key patterns, summary references
Deep-Dives (NN.XX-topic.md) — focused deep-dives into specific topics

Each main section links to deep-dive articles where you can go deeper without cluttering the overview.

Example: Section 03 (Medallion Architecture) has three deep-dives:

03.01 — Bronze Layer (raw data ingestion, schema-on-read)
03.02 — Silver Layer (data quality, deduplication, transformation)
03.03 — Gold Layer (business logic, conformed models)

How to Use This Guide

For Breadth-First Learning

Read sections 01–09 sequentially. These cover the full spectrum without requiring tool-specific knowledge.

For Tool-Specific Deep Learning

After section 08, pick one or more deep-dive files:

Section 08.01 for BigQuery
Section 08.05 for Databricks
Section 08.02 for PostgreSQL (as a baseline RDBMS)

For Hands-On Practice

Section 10 contains labs with real data and scenarios
Use SQL/dbt examples to implement concepts from sections 02–06

For Interview Preparation

Review Section 11 for design scenarios
Work through trade-off questions before interviews
Use ADR templates to structure your thinking

Prerequisites

This guide assumes you have:

✅ SQL fundamentals (SELECT, JOIN, aggregation)
✅ Basic database concepts (tables, indexes, transactions)
✅ Familiarity with at least one of: BigQuery, PostgreSQL, Cassandra, MongoDB, or Databricks
❓ Optional: Python, dbt, or Airflow (helpful but not required)

If any of these feel shaky, review quickly before diving in.

Learning Outcomes

By completing this guide, you will understand:

Why data warehouses exist and how they differ from transactional databases
When to use different data platforms (OLTP, analytical, NoSQL, big data)
How to design scalable data warehouses with medallion architecture
What trade-offs exist in ETL/ELT pipelines, schema design, and query optimization
How to manage enterprise data systems: governance, lineage, quality, security
How to evaluate and select tools for different organizational needs
How to articulate these concepts in architecture interviews with confidence

Pro Tips for This Guide

Mermaid Diagrams

Don't just read the diagrams — redraw them as you study. This forces you to understand the relationships and not just memorize.

Tables & Comparisons

Use comparison tables (e.g., Star vs. Snowflake Schema) to quickly recall trade-offs during interviews.

Deep-Dives Are Optional

If you already know a topic, skip the deep-dive and move on. This guide is designed for depth on demand.

Labs Reinforce Theory

Do at least 2–3 labs after completing sections 02–05. Hands-on work is critical for senior-level interviews.

Interview Context Matters

Before interviews, research the company's tech stack. Some deep-dives (e.g., 08.01 for BigQuery companies) will be higher-priority than others.

Questions? Feedback?

This guide is a living document. If you find gaps or have suggestions, document them and revisit.

Good luck with your interviews! 🚀

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search

Data Warehouse & Analytics: Interview Prep Guide

The Mental Model: Data → Warehouse → Insights → Authority

Sections at a Glance

Recommended Reading Order

Phase 1: Build Mental Models (Foundations)

Phase 2: Architectural Patterns (Design)

Phase 3: Technology & Tools (Breadth)

Phase 4: Enterprise Perspective (Implementation)

What Senior Architects Must Know

Key Concepts Glossary

Deep-Dive Structure

How to Use This Guide

For Breadth-First Learning

For Tool-Specific Deep Learning

For Hands-On Practice

For Interview Preparation

Prerequisites

Learning Outcomes

Pro Tips for This Guide

Questions? Feedback?