11 · Interview Scenarios: Design Questions, Trade-offs & ADRs

Level: Advanced
Time to read: 20 min per scenario
After reading: You'll be able to handle architecture interview questions, articulate trade-offs, and document decisions using Architecture Decision Records (ADRs).


The Interview: What They're Testing

Systems thinking — Can you see the big picture?
Trade-offs — Do you understand the costs/benefits?
Pragmatism — Can you balance theory with reality?
Communication — Can you explain your thinking clearly?
Depth on demand — Can you go deep when questioned?

Not testing: TypeScript syntax or memorized commands.


Scenario 1: The Startup → Scale Migration

Question

"We're a Series C startup with 10 GB of events per day. Today, we dump everything into PostgreSQL. As we scale to 1 TB/day, it's getting slow. How do you redesign this?"

Framework: Ask Questions First (1-2 minutes max)

Your Response (structured):

1. Clarify requirements (1-2 minutes):
   - "Is this an OLTP or analytics workload?"
     → Analytics (this determines the architecture)
   - "What's the latency requirement?"
     → Daily reports (batch is acceptable)
   - "How long is data retained?"
     → 3 years (storage cost matters)

2. Identify the constraint:
   - PostgreSQL is row-oriented; analytical scans can't exploit columnar compression
   - A single machine scales vertically only to ~2 TB of practical capacity
   - Problem: both CPU and storage are bottlenecked

3. Propose a solution (whiteboard):

   ┌─ PostgreSQL (OLTP)
   │  └─ Live operational data, real-time writes
   ├─ ETL Pipeline (Airflow)
   │  └─ Extract daily, deduplicate, validate
   └─ BigQuery (OLAP)
      └─ Analytics, 1 TB/day compression → 100 GB
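The ETL step above ("extract daily, deduplicate, validate") can be sketched in plain Python. Field names like `event_id` and `updated_at` are illustrative assumptions, not part of any real schema:

```python
# Minimal sketch of the daily ETL step: deduplicate, then validate.
# Field names (event_id, updated_at, amount) are illustrative assumptions.

def deduplicate(events):
    """Keep only the most recent record per event_id."""
    latest = {}
    for e in sorted(events, key=lambda e: e["updated_at"]):
        latest[e["event_id"]] = e  # later updates overwrite earlier ones
    return list(latest.values())

def validate(events, required=("event_id", "amount")):
    """Split events into (valid, rejected) by required non-null fields."""
    valid, rejected = [], []
    for e in events:
        (valid if all(e.get(f) is not None for f in required) else rejected).append(e)
    return valid, rejected
```

In production this logic would live in the Airflow-orchestrated pipeline (e.g. as dbt models), but the shape of the work is the same.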

Design Explanation

Why this architecture:

1. Separate concerns:
   - PostgreSQL: Optimized for writes
   - BigQuery: Optimized for reads

2. Cost comparison:
   - Keep PostgreSQL for the operational slice (10-50 GB): ~$500/month
   - BigQuery queries (~$6.25/TB scanned): scanning a full month of compressed data (100 GB/day × 30 ≈ 3 TB) ≈ $18.75
   - (vs. scaling PostgreSQL on SSD: $500+/month and climbing fast)

3. Scaling path:
   - 10 GB → 1 TB → 10 TB (automatic in BigQuery)
   - No re-architecture needed

4. Timeline:
   - Months 1-2: Build ELT pipeline (dbt)
   - Month 3: Migrate historical data
   - Month 4: Flip to new system

Deep Dive: If Interviewer Presses...

Q: "Won't BigQuery cold-start be slow?"

A:

No, because:
1. BigQuery queries start immediately (no cluster warm-up)
2. Columnar storage + partition pruning = fast scans
3. First 1 TB of queries per month is free (per billing account)
4. Caching: query results are cached for 24 hours at no cost

Bottom line: at this scale, BigQuery is typically faster than PostgreSQL for analytics.

Q: "How do you handle real-time analytics then?"

A:

Good question. Real-time needs differ:

Option 1 (3-hour latency acceptable):
  → Batch ELT: Stay with proposal

Option 2 (Real-time needed):
  → Add streaming layer:

    PostgreSQL → Kafka → BigQuery (streaming inserts)
    └─ Events flow to Kafka immediately
    └─ BigQuery ingests from Kafka (5-min micro-batches)
    └─ Real-time + batch reconciliation daily
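The daily real-time/batch reconciliation in Option 2 could look like this minimal sketch (comparing hourly row counts; the hour keys and the `tolerance` threshold are assumptions):

```python
# Sketch of the daily streaming/batch reconciliation: flag hours whose
# streaming row counts drift from the batch counts beyond a tolerance.

def reconcile(batch_counts, stream_counts, tolerance=0.001):
    """Return hours whose streaming counts drift from batch counts
    by more than `tolerance` (as a fraction of the batch count)."""
    drifted = []
    for hour, expected in batch_counts.items():
        got = stream_counts.get(hour, 0)
        if expected == 0:
            if got != 0:
                drifted.append(hour)
        elif abs(got - expected) / expected > tolerance:
            drifted.append(hour)
    return drifted
```

Any drifted hour triggers a targeted backfill from the batch side, which is the "batch reconciliation daily" step above.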

Key Points to Mention

Separation of concerns (OLTP vs. OLAP)
Cost analysis (concrete numbers)
Scaling path (future-proof)
Implementation timeline (realistic)
Trade-off (latency vs. cost)


Scenario 2: The Data Quality Crisis

Question

"Our warehouse is full of bad data. Marketing reports don't match Finance reports. How do you fix organizational data quality?"

Your Approach

1. Diagnose the root cause:
   - "Where are sources of truth conflicting?"
   - Example: Revenue definition differs
     - Marketing: gross revenue (pre-discount)
     - Finance: net revenue (post-discount)

2. Build governance:
   - Assign data owners (Marketing owns their definitions)
   - Create metadata catalog (document all definitions)
   - Implement RACI model (who approves changes?)

3. Implement quality checks:
   - Tests for each definition (dbt tests)
   - SLA: 99.95% accuracy; data below the threshold is flagged
   - Alerts: fire when quality drops below the SLA

4. Create conformed definitions:
   - Marketing + Finance agree on "Revenue"
   - All reports use same definition
   - No more mismatches
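The gross-vs-net mismatch is easy to demonstrate concretely (field names are illustrative):

```python
# The Marketing/Finance mismatch in one example (field names illustrative).
orders = [
    {"list_price": 100.0, "discount": 20.0},
    {"list_price": 50.0, "discount": 0.0},
]

def gross_revenue(orders):
    """Marketing's old definition: pre-discount."""
    return sum(o["list_price"] for o in orders)

def net_revenue(orders):
    """Finance's definition: post-discount -- the conformed one."""
    return sum(o["list_price"] - o["discount"] for o in orders)

# gross_revenue(orders) -> 150.0, net_revenue(orders) -> 130.0:
# same orders, two "revenues" -- until everyone agrees on one definition.
```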

Architecture: Quality Layer

graph TB
    A["Source<br/>Systems"]
    B["Bronze Layer<br/>(Raw)"]
    C["Data Quality<br/>Gate"]
    D["Silver Layer<br/>(Validated)"]
    E["Quarantine +<br/>Alert Owner"]
    F["Gold Layer<br/>(Certified)"]

    A --> B
    B -->|"Duplicates?<br/>Nulls?<br/>Schema mismatch?"| C
    C -->|"PASS"| D
    C -->|"FAIL"| E
    D -->|"Certification"| F

If Interviewer Asks...

Q: "But who's responsible for fixing quality?"

A:

RACI Model:

Data Quality Gate Failures:

Issue: > 5% nulls in required field
  ├─ Data Owner (Finance): Accountable (owns the outcome, gets alerted)
  ├─ Data Steward (Data team): Responsible (fixes the pipeline)
  ├─ Analytics Lead: Consulted ("Should we wait for fix?")
  └─ BI Users: Informed (report might be delayed)

Example:
  "Sales amount is null in 10% of rows"
  ├─ Root cause: API change in PaymentService
  ├─ Owner: PaymentService team
  ├─ Fix: PaymentService team restores the field (or the schema contract is updated)
  └─ Duration: 1-2 days to update

Governance: Once upstream is fixed, ETL is retried.
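One way to make the RACI routing executable is a small lookup table. The issue types, role keys, and addresses below are hypothetical:

```python
# Hypothetical RACI routing table for quality-gate failures
# (issue types, role keys, and recipient names are illustrative).
RACI = {
    "null_rate_breach": {
        "accountable": "finance-data-owner",   # owns the outcome, gets alerted
        "responsible": "data-steward",         # does the actual fix
        "consulted": ["analytics-lead"],       # asked: wait for fix or ship?
        "informed": ["bi-users"],              # told the report may be late
    },
}

def recipients(issue_type):
    """Everyone to notify about a failure, most accountable first."""
    entry = RACI[issue_type]
    return ([entry["accountable"], entry["responsible"]]
            + entry["consulted"] + entry["informed"])
```

Wiring this into the alerting layer keeps the "who gets paged" question out of tribal knowledge.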

Key Points

Root cause analysis (don't just apply patches)
Governance (RACI, ownership)
Automation (dbt tests catch issues)
Monitoring (SLAs, alerts)
Documentation (conformed definitions)


Scenario 3: BI Tool Selection

Question

"We need a BI tool for business users. We have data in BigQuery. Should we use Looker, Tableau, or Metabase? What's your recommendation?"

Decision Framework

| Factor | Looker | Tableau | Metabase |
|--------|--------|---------|----------|
| BigQuery native | ✅✅ | ✅ | ✅ |
| Price | $$$ | $$ | $ |
| Ease of use | Hard | Easy | Very easy |
| SQL access | ✅ | ✅ | ✅ |
| Governance | ✅✅ | ✅ | ⚠️ |
| Maturity | Mature | Mature | Growing |

Recommendation depends on:
  - Budget available?
  - Technical skill of BI users?
  - Governance requirements?
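The decision framework above can be made explicit with a weighted score. The 1-5 scores and the weights below are illustrative assumptions, not vendor benchmarks:

```python
# Weighted scoring for the BI-tool shortlist (scores and weights are
# illustrative assumptions, not vendor benchmarks).
SCORES = {
    "Looker":   {"governance": 5, "ease_of_use": 2, "cost": 2},
    "Tableau":  {"governance": 4, "ease_of_use": 4, "cost": 3},
    "Metabase": {"governance": 2, "ease_of_use": 5, "cost": 5},
}

def rank(scores, weights):
    """Rank tools by weighted score, best first."""
    def total(tool):
        return sum(scores[tool][k] * w for k, w in weights.items())
    return sorted(scores, key=total, reverse=True)
```

Changing the weights flips the answer, which is exactly the point: governance-heavy weights favor Looker, cost-heavy weights favor Metabase.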

Recommendation By Persona

A. Fortune 500 (Mature governance):
   └─ Looker
      ├─ Reason: Native BigQuery, strong governance, SSO
      ├─ Cost: Higher ($100K+/year) but justified
      └─ Implementation: 3-6 months (requires data modeling)

B. Growing SaaS ($50-500M):
   └─ Tableau
      ├─ Reason: Balance of ease + enterprise features
      ├─ Cost: $30K-100K/year
      └─ Implementation: 2-3 months

C. Startup / MVP:
   └─ Metabase
      ├─ Reason: Self-serve, low cost ($1K-10K/year)
      ├─ Cost: Very cheap
      └─ Implementation: 1-2 weeks

D. Advanced use case (Embedded BI):
   └─ Looker Studio (from Google)
      ├─ Reason: Free, native BigQuery, embeddable
      ├─ Cost: Free
      └─ Limitation: Less governance

If Interviewer Challenges...

Q: "But Tableau costs more. Isn't Looker better value?"

A:

Not necessarily. It depends on team skill:

Low-skill team (SQL unfamiliar):
  → Tableau wins (easier UI, gentler learning curve)
  → Looker requires upfront data modeling (LookML)
  → Cost of training + modeling delays ROI

High-skill team (many data engineers):
  → Looker wins (LookML is code, version control, CI/CD)
  → Can codify all business logic
  → Better long-term governance

Recommendation: Skill level first, then cost.


Scenario 4: Architecture Decision Record (ADR)

Question

"Document your decision to split this monolithic warehouse into medallion architecture. Use an ADR."

Template: Architecture Decision Record

# ADR 0003: Medallion Architecture for Data Warehouse

## Status
ACCEPTED

## Context
Our monolithic data warehouse (BigQuery single layer) has:
- No clear separation between raw and processed data
- Debugging is hard (can't trace issues to root cause)
- Quality issues propagate to all downstream consumers
- Data ownership unclear (raw vs. business-ready)

As we scale from 10 TB to 100 TB, this architecture won't scale governance-wise.

## Decision
Implement medallion architecture:
- Bronze: Raw data (append-only, no transformation)
- Silver: Validated/cleaned (deduplication, quality checks)
- Gold: Business-ready (dimensional models, SCD Type 2)

## Reasons

  1. Separation of concerns: Raw vs. cleaned vs. business logic distinct
  2. Debugging: Error in quality? Check Silver. Error in business logic? Check Gold.
  3. Ownership: Clear responsibilities and SLAs per layer
  4. Governance: Can apply different retention/security policies per layer
  5. Scalability: Teams can own layers independently

## Consequences

Positive:

  • Easier debugging (trace issues to specific layer)
  • Better governance (clear ownership boundaries)
  • Historical data preserved (Bronze append-only)
  • Team autonomy (BI team owns Gold, platform owns Bronze)

Negative:

  • More storage (Bronze + Silver + Gold vs. single layer)
  • Operational complexity (3 layers instead of 1)
  • Requires strong data engineering discipline
  • Migration effort (2-3 months to backfill history)

## Cost Analysis

Current (Monolithic):
  Storage: 10 TB → ~$200/month (active storage at ~$0.02/GB)
  Compute: $0 (queries within free tier)

Proposed (Medallion):
  Bronze: 10 TB (raw data)
  Silver: 8 TB (deduplicated)
  Gold: 5 TB (aggregated)
  Total: 23 TB → ~$460/month

  Difference: ~+$260/month (worth the governance gains)

## Implementation Plan

| Phase | Timeline | Effort |
|-------|----------|--------|
| Phase 1: Design schemas | Weeks 1-2 | Data architect |
| Phase 2: Build Bronze/Silver | Weeks 3-6 | Data engineers |
| Phase 3: Build Gold dimensional models | Weeks 7-10 | Analytics team |
| Phase 4: Migrate historical data (backfill) | Weeks 11-13 | Data engineers |
| Phase 5: Cutover (point consumers to new layers) | Week 14 | All hands |
| Phase 6: Sunset old schema | Week 15 | Cleanup |

## Alternatives Considered

Alternative 1: Keep monolithic, add caching layer

  • ❌ Doesn't solve ownership problem
  • ❌ Still hard to debug

Alternative 2: Use data vault (instead of medallion)

  • ✅ More normalized than medallion
  • ❌ More complex to maintain for team
  • Decision: Medallion is simpler and sufficient for our needs

## Related ADRs

  • ADR 0002: "Use BigQuery as warehouse" (platform choice)
  • ADR 0004: "Use dbt for transformations" (tooling)
  • ADR 0005: "SCD Type 2 for customer dimension" (modeling)

Question Types to Expect

| Type | Example | Strategy |
|------|---------|----------|
| Design | "Design a data warehouse for Netflix" | Start broad, ask clarifying questions, then go deep |
| Trade-off | "When do you use event sourcing vs. CQRS?" | List pros/cons, pick one with justification |
| Failure | "How would you guard against PII leaks?" | Anticipate problems, design defensively |
| Scaling | "How would this work at 1 PB?" | Identify bottlenecks, propose parallel solutions |
| Migration | "How would you migrate from Postgres to BigQuery?" | Phased approach, risk mitigation |

Interview Tips

Structure Your Answer

1. Clarify (2 min):
   "Let me ask a few clarifying questions..."
   - Scope (how much data?)
   - Latency (how fast?)
   - Growth (when?)
   - Team (how many engineers?)

2. Identify constraints:
   "The constraint I see is: ..."

3. Propose architecture:
   "Here's what I'd do: [diagram]"

4. Justify:
   "Because: [trade-offs]"

5. Anticipate problems:
   "Potential issues: [solutions]"

Phrases to Use

"That's a good point. Let me reconsider..."
"Let me trade off the pros and cons..."
"In my experience, we faced this when..."
"I need more info on X before deciding..."

Phrases to Avoid

"I'm not sure..." (instead say "Let me think..." or propose alternatives)
"That's how we always did it" (justify with reasoning instead)

When You Don't Know

Good response:
  "I haven't worked with Cassandra in depth, but based on 
   its wide-column design, I'd expect ... [apply knowledge 
   of similar systems]. Here's what I'd investigate..."

Bad response:
  "I don't know."

Final Checklist: Before Your Interview

✅ Understand medallion layers and why they matter
✅ Know dimensional modeling (star schema, SCD Type 2)
✅ Explain OLTP vs. OLAP and when to use each
✅ Be familiar with 3+ tools (BigQuery, Snowflake, PostgreSQL at minimum)
✅ Understand ETL vs. ELT trade-offs
✅ Know partitioning & clustering optimization strategies
✅ Speak to governance (RACI, quality, lineage)
✅ Articulate trade-offs with numbers (cost, latency, complexity)
✅ Be ready to ask clarifying questions before designing
✅ Practice whiteboarding (draw diagrams, label components)


Example Answers to Common Questions

Q1: "Design a data warehouse for an e-commerce company"

Clarify:
  - Scale? 100K orders/day
  - Growth? 50% YoY
  - Latency? Daily reports (batch okay)
  - Team? 5 data engineers, 20 analysts

Answer:
  ├─ Operational: PostgreSQL (OLTP)
  │  └─ Live order processing
  ├─ ELT Pipeline: dbt + Airflow
  │  ├─ Bronze: Raw postgres dumps
  │  ├─ Silver: Cleaned, deduplicated
  │  └─ Gold: Fact_sales + dimensions
  ├─ Analytics: BigQuery
  │  ├─ Fact table (100M rows) + 10 dimensions
  │  └─ Star schema, SCD Type 2 for customers
  └─ BI: Looker (dashboards, data-model governance)

Cost: ~$5K/month (mostly BI licensing; BigQuery at ~$6.25/TB scanned is a small fraction)
Timeline: 3 months to production

Trade-off: Extra storage cost (Bronze + Silver) is worth 
operational clarity for scaling team.
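The SCD Type 2 pattern mentioned for the customer dimension can be sketched as follows. The column names (`valid_from`/`valid_to`) and tracking a single attribute are simplifying assumptions; real implementations usually live in dbt snapshots:

```python
# Minimal SCD Type 2 sketch for the customer dimension: when a tracked
# attribute changes, close the current row and open a new version.
# Column names and the single tracked attribute are illustrative.

def scd2_apply(dim_rows, incoming, key="customer_id", attr="tier",
               today="2024-06-01"):
    """Close the current row when `attr` changes and open a new version."""
    current = {r[key]: r for r in dim_rows if r["valid_to"] is None}
    out = list(dim_rows)
    for rec in incoming:
        cur = current.get(rec[key])
        if cur is None or cur[attr] != rec[attr]:
            if cur is not None:
                cur["valid_to"] = today          # close the old version
            out.append({key: rec[key], attr: rec[attr],
                        "valid_from": today, "valid_to": None})
    return out
```

This keeps full attribute history queryable ("what tier was this customer in last quarter?"), which a Type 1 overwrite would lose.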

Key Takeaways

Answer frameworks (clarify → identify → propose → justify)
Trade-off thinking (nothing is free)
Ask good questions (shows maturity)
Use ADRs to document decisions
Be concrete (numbers, examples, not theory only)
Admit unknowns (but show reasoning)
Practice whiteboarding (communication is key)

Good luck with your interviews! 🚀