Data Warehouse & Analytics: Interview Prep Guide

Goal: Systematic, academically rigorous coverage of data warehouse concepts — from foundational theory to enterprise-scale implementation.

Audience: Senior software engineers and architects preparing for roles managing data-intensive systems in multinational organizations.


The Mental Model: Data → Warehouse → Insights → Authority

graph LR
    A[Sources] --> B["OLTP DB"]
    A --> C["Operational Data"]
    B --> D["ETL/ELT Pipelines"]
    C --> D
    D --> E["Medallion Layers:<br/>Bronze|Silver|Gold"]
    E --> F["Data Warehouse<br/>Dimensional Models"]
    F --> G["OLAP Queries"]
    G --> H["Analytics & BI"]
    H --> I["Business Decision"]
    I --> J["Governance & Lineage"]
    J --> K["Enterprise Authority"]

Sections at a Glance

# Section Core Topics
01 Foundations OLTP vs OLAP, Data Systems Taxonomy, Why Data Warehouses, Constraints & Trade-offs
02 Warehouse Architecture Logical vs Physical Design, Dimensional Modeling, Schema Designs (Star, Snowflake)
03 Medallion Architecture Bronze (Raw) / Silver (Cleaned) / Gold (Business) Layers, Governance Model
04 ETL & ELT Pipelines Extract-Transform-Load vs ELT, Data Quality, Idempotency, Late-Arriving Data Handling
05 Data Modeling Dimensional Modeling Theory, Conformed Dimensions, Slowly Changing Dimensions (SCD)
06 Query Optimization Columnar Storage, Partitioning, Indexing, Cost Optimization, Query Execution Plans
07 Big Data Platforms Distributed Computing, Hadoop Ecosystem, Spark, Workflow Orchestration (Airflow, dbt)
08 Tools Ecosystem BigQuery, PostgreSQL, Cassandra, MongoDB, Databricks — comparative analysis
09 Enterprise Data Mgmt Data Governance, Lineage Tracking, Data Quality, Metadata Management, Security
10 Labs Practical hands-on exercises with SQL, Python, dbt, and real-world scenarios
11 Interview Scenarios Design questions, trade-off analysis, architecture decision records (ADRs)

Phase 1: Build Mental Models (Foundations)

  • Section 01 — Understand why data warehouses exist and how they differ from OLTP systems
  • Section 02 — Grasp the logical architecture: dimensions, facts, schemas
  • Section 05 — Deepen understanding of dimensional modeling theory

Phase 2: Architectural Patterns (Design)

  • Section 03 — Learn the medallion/lakehouse pattern (industry best practice)
  • Section 04 — Understand ETL/ELT trade-offs and patterns
  • Section 06 — Learn how to optimize queries at scale

Phase 3: Technology & Tools (Breadth)

  • Section 07 — Understand distributed systems and orchestration
  • Section 08 — Deep-dive into specific platforms relevant to your interviews

Phase 4: Enterprise Perspective (Implementation)

  • Section 09 — Learn how to manage data warehouses at organizational scale
  • Section 10 — Build hands-on competency with labs
  • Section 11 — Synthesize through interview scenarios

What Senior Architects Must Know

Conceptual Depth

  • How does dimensional modeling improve query performance?
  • Why do medallion layers solve governance and debugging challenges?
  • What are the trade-offs between OLTP and OLAP design?

Scale & Distributed Systems

  • How do you ensure data consistency in eventually-consistent systems?
  • What partitioning strategy minimizes query latency and cost?
  • How does columnar storage change your dimensional model?

Operations & Governance

  • How do you track data lineage across ETL pipelines?
  • What does data quality mean in an enterprise context?
  • How do you enforce access control and PII handling?

Tool Selection & Trade-offs

  • When do you use BigQuery vs. Databricks vs. PostgreSQL?
  • What's the cost/performance trade-off of each platform?
  • How do you avoid vendor lock-in?

Key Concepts Glossary

Concept Definition
OLTP Online Transaction Processing — optimized for write-heavy, normalized transactions
OLAP Online Analytical Processing — optimized for read-heavy, denormalized aggregations
Dimensional Modeling Organizing data around dimensions (descriptive attributes) and facts (measurements)
ETL Extract → Transform → Load: standard pipeline pattern for data ingestion
ELT Extract → Load → Transform: modern pattern where transformation happens in the warehouse
Medallion Three-layer architecture: Bronze (raw) → Silver (validated) → Gold (business-ready)
SCD (Slowly Changing Dimension) Handling historical changes to dimension attributes (Type 1, 2, 3, etc.)
Star Schema Denormalized dimensional design with central fact table surrounded by dimensions
Snowflake Schema Normalized dimensional design with hierarchical dimension tables
Data Lineage Tracking the origin, transformation, and usage of data across the warehouse
Data Governance Organizational policies and processes for data ownership, quality, and access

Deep-Dive Structure

This guide uses a two-tier content model:

  • Main Sections (NN-section.md) — breadth-first, key patterns, summary references
  • Deep-Dives (NN.XX-topic.md) — focused deep-dives into specific topics

Each main section links to deep-dive articles where you can go deeper without cluttering the overview.

Example: Section 03 (Medallion Architecture) has three deep-dives:

  • 03.01 — Bronze Layer (raw data ingestion, schema-on-read)
  • 03.02 — Silver Layer (data quality, deduplication, transformation)
  • 03.03 — Gold Layer (business logic, conformed models)

How to Use This Guide

For Breadth-First Learning

Read sections 01–09 sequentially. These cover the full spectrum without requiring tool-specific knowledge.

For Tool-Specific Deep Learning

After section 08, pick one or more deep-dive files:

  • Section 08.01 for BigQuery
  • Section 08.05 for Databricks
  • Section 08.02 for PostgreSQL (as a baseline RDBMS)

For Hands-On Practice

  • Section 10 contains labs with real data and scenarios
  • Use SQL/dbt examples to implement concepts from sections 02–06

For Interview Preparation

  • Review Section 11 for design scenarios
  • Work through trade-off questions before interviews
  • Use ADR templates to structure your thinking

Prerequisites

This guide assumes you have:

  • ✅ SQL fundamentals (SELECT, JOIN, aggregation)
  • ✅ Basic database concepts (tables, indexes, transactions)
  • ✅ Familiarity with at least one of: BigQuery, PostgreSQL, Cassandra, MongoDB, or Databricks
  • ❓ Optional: Python, dbt, or Airflow (helpful but not required)

If any of these feel shaky, review quickly before diving in.


Learning Outcomes

By completing this guide, you will understand:

  1. Why data warehouses exist and how they differ from transactional databases
  2. When to use different data platforms (OLTP, analytical, NoSQL, big data)
  3. How to design scalable data warehouses with medallion architecture
  4. What trade-offs exist in ETL/ELT pipelines, schema design, and query optimization
  5. How to manage enterprise data systems: governance, lineage, quality, security
  6. How to evaluate and select tools for different organizational needs
  7. How to articulate these concepts in architecture interviews with confidence

Pro Tips for This Guide

Mermaid Diagrams

Don't just read the diagrams — redraw them as you study. This forces you to understand the relationships and not just memorize.

Tables & Comparisons

Use comparison tables (e.g., Star vs. Snowflake Schema) to quickly recall trade-offs during interviews.

Deep-Dives Are Optional

If you already know a topic, skip the deep-dive and move on. This guide is designed for depth on demand.

Labs Reinforce Theory

Do at least 2–3 labs after completing sections 02–05. Hands-on work is critical for senior-level interviews.

Interview Context Matters

Before interviews, research the company's tech stack. Some deep-dives (e.g., 08.01 for BigQuery companies) will be higher-priority than others.


Questions? Feedback?

This guide is a living document. If you find gaps or have suggestions, document them and revisit.

Good luck with your interviews! 🚀