Data Warehouse & Analytics: Interview Prep Guide
Goal: Systematic, academically rigorous coverage of data warehouse concepts — from foundational theory to enterprise-scale implementation.
Audience: Senior software engineers and architects preparing for roles managing data-intensive systems in multinational organizations.
The Mental Model: Data → Warehouse → Insights → Authority
graph LR
A[Sources] --> B["OLTP DB"]
A --> C["Operational Data"]
B --> D["ETL/ELT Pipelines"]
C --> D
D --> E["Medallion Layers:<br/>Bronze|Silver|Gold"]
E --> F["Data Warehouse<br/>Dimensional Models"]
F --> G["OLAP Queries"]
G --> H["Analytics & BI"]
H --> I["Business Decision"]
I --> J["Governance & Lineage"]
J --> K["Enterprise Authority"]
Sections at a Glance
| # | Section | Core Topics |
|---|---|---|
| 01 | Foundations | OLTP vs OLAP, Data Systems Taxonomy, Why Data Warehouses, Constraints & Trade-offs |
| 02 | Warehouse Architecture | Logical vs Physical Design, Dimensional Modeling, Schema Designs (Star, Snowflake) |
| 03 | Medallion Architecture | Bronze (Raw) / Silver (Cleaned) / Gold (Business) Layers, Governance Model |
| 04 | ETL & ELT Pipelines | Extract-Transform-Load vs ELT, Data Quality, Idempotency, Late-Arriving Data Handling |
| 05 | Data Modeling | Dimensional Modeling Theory, Conformed Dimensions, Slowly Changing Dimensions (SCD) |
| 06 | Query Optimization | Columnar Storage, Partitioning, Indexing, Cost Optimization, Query Execution Plans |
| 07 | Big Data Platforms | Distributed Computing, Hadoop Ecosystem, Spark, Workflow Orchestration (Airflow, dbt) |
| 08 | Tools Ecosystem | BigQuery, PostgreSQL, Cassandra, MongoDB, Databricks — comparative analysis |
| 09 | Enterprise Data Mgmt | Data Governance, Lineage Tracking, Data Quality, Metadata Management, Security |
| 10 | Labs | Practical hands-on exercises with SQL, Python, dbt, and real-world scenarios |
| 11 | Interview Scenarios | Design questions, trade-off analysis, architecture decision records (ADRs) |
Recommended Reading Order
Phase 1: Build Mental Models (Foundations)
- Section 01 — Understand why data warehouses exist and how they differ from OLTP systems
- Section 02 — Grasp the logical architecture: dimensions, facts, schemas
- Section 05 — Deepen understanding of dimensional modeling theory
Phase 2: Architectural Patterns (Design)
- Section 03 — Learn the medallion/lakehouse pattern (industry best practice)
- Section 04 — Understand ETL/ELT trade-offs and patterns
- Section 06 — Learn how to optimize queries at scale
Phase 3: Technology & Tools (Breadth)
- Section 07 — Understand distributed systems and orchestration
- Section 08 — Deep-dive into specific platforms relevant to your interviews
Phase 4: Enterprise Perspective (Implementation)
- Section 09 — Learn how to manage data warehouses at organizational scale
- Section 10 — Build hands-on competency with labs
- Section 11 — Synthesize through interview scenarios
What Senior Architects Must Know
✅ Conceptual Depth
- How does dimensional modeling improve query performance?
- Why do medallion layers solve governance and debugging challenges?
- What are the trade-offs between OLTP and OLAP design?
✅ Scale & Distributed Systems
- How do you ensure data consistency in eventually-consistent systems?
- What partitioning strategy minimizes query latency and cost?
- How does columnar storage change your dimensional model?
✅ Operations & Governance
- How do you track data lineage across ETL pipelines?
- What does data quality mean in an enterprise context?
- How do you enforce access control and PII handling?
✅ Tool Selection & Trade-offs
- When do you use BigQuery vs. Databricks vs. PostgreSQL?
- What's the cost/performance trade-off of each platform?
- How do you avoid vendor lock-in?
Key Concepts Glossary
| Concept | Definition |
|---|---|
| OLTP | Online Transaction Processing — optimized for write-heavy, normalized transactions |
| OLAP | Online Analytical Processing — optimized for read-heavy, denormalized aggregations |
| Dimensional Modeling | Organizing data around dimensions (descriptive attributes) and facts (measurements) |
| ETL | Extract → Transform → Load: standard pipeline pattern for data ingestion |
| ELT | Extract → Load → Transform: modern pattern where transformation happens in the warehouse |
| Medallion | Three-layer architecture: Bronze (raw) → Silver (validated) → Gold (business-ready) |
| SCD (Slowly Changing Dimension) | Handling historical changes to dimension attributes (Type 1, 2, 3, etc.) |
| Star Schema | Denormalized dimensional design with central fact table surrounded by dimensions |
| Snowflake Schema | Normalized dimensional design with hierarchical dimension tables |
| Data Lineage | Tracking the origin, transformation, and usage of data across the warehouse |
| Data Governance | Organizational policies and processes for data ownership, quality, and access |
Deep-Dive Structure
This guide uses a two-tier content model:
- Main Sections (
NN-section.md) — breadth-first, key patterns, summary references - Deep-Dives (
NN.XX-topic.md) — focused deep-dives into specific topics
Each main section links to deep-dive articles where you can go deeper without cluttering the overview.
Example: Section 03 (Medallion Architecture) has three deep-dives:
- 03.01 — Bronze Layer (raw data ingestion, schema-on-read)
- 03.02 — Silver Layer (data quality, deduplication, transformation)
- 03.03 — Gold Layer (business logic, conformed models)
How to Use This Guide
For Breadth-First Learning
Read sections 01–09 sequentially. These cover the full spectrum without requiring tool-specific knowledge.
For Tool-Specific Deep Learning
After section 08, pick one or more deep-dive files:
- Section 08.01 for BigQuery
- Section 08.05 for Databricks
- Section 08.02 for PostgreSQL (as a baseline RDBMS)
For Hands-On Practice
- Section 10 contains labs with real data and scenarios
- Use SQL/dbt examples to implement concepts from sections 02–06
For Interview Preparation
- Review Section 11 for design scenarios
- Work through trade-off questions before interviews
- Use ADR templates to structure your thinking
Prerequisites
This guide assumes you have:
- ✅ SQL fundamentals (SELECT, JOIN, aggregation)
- ✅ Basic database concepts (tables, indexes, transactions)
- ✅ Familiarity with at least one of: BigQuery, PostgreSQL, Cassandra, MongoDB, or Databricks
- ❓ Optional: Python, dbt, or Airflow (helpful but not required)
If any of these feel shaky, review quickly before diving in.
Learning Outcomes
By completing this guide, you will understand:
- Why data warehouses exist and how they differ from transactional databases
- When to use different data platforms (OLTP, analytical, NoSQL, big data)
- How to design scalable data warehouses with medallion architecture
- What trade-offs exist in ETL/ELT pipelines, schema design, and query optimization
- How to manage enterprise data systems: governance, lineage, quality, security
- How to evaluate and select tools for different organizational needs
- How to articulate these concepts in architecture interviews with confidence
Pro Tips for This Guide
Mermaid Diagrams
Don't just read the diagrams — redraw them as you study. This forces you to understand the relationships and not just memorize.
Tables & Comparisons
Use comparison tables (e.g., Star vs. Snowflake Schema) to quickly recall trade-offs during interviews.
Deep-Dives Are Optional
If you already know a topic, skip the deep-dive and move on. This guide is designed for depth on demand.
Labs Reinforce Theory
Do at least 2–3 labs after completing sections 02–05. Hands-on work is critical for senior-level interviews.
Interview Context Matters
Before interviews, research the company's tech stack. Some deep-dives (e.g., 08.01 for BigQuery companies) will be higher-priority than others.
Questions? Feedback?
This guide is a living document. If you find gaps or have suggestions, document them and revisit.
Good luck with your interviews! 🚀