11 · Interview Scenarios: Design Questions, Trade-offs & ADRs
Level: Advanced · Time to read: 20 min per scenario · After reading: You'll be able to handle architecture interview questions, articulate trade-offs, and document decisions using Architecture Decision Records (ADRs).
The Interview: What They're Testing
✅ Systems thinking — Can you see the big picture?
✅ Trade-offs — Do you understand the costs/benefits?
✅ Pragmatism — Can you balance theory with reality?
✅ Communication — Can you explain your thinking clearly?
✅ Depth on demand — Can you go deep when questioned?
Not testing: TypeScript syntax or memorized commands.
Scenario 1: The Startup → Scale Migration
Question
"We're a Series C startup with 10 GB of events per day. Today, we dump everything into PostgreSQL. As we scale to 1 TB/day, it's getting slow. How do you redesign this?"
Framework: Ask Questions First (a couple of minutes, max)
Your Response (structured):
1. Clarify requirements (1-2 minutes):
- "Is this OLTP or analytics workload?"
→ Analytics (I need to know this affects architecture)
- "What's the latency requirement?"
→ Daily reports (fine to wait, batch is acceptable)
- "How much data retention?"
→ 3 years (storage cost matters)
2. Identify the constraint:
- PostgreSQL is row-oriented and compresses analytics data poorly
- A single machine scales vertically to only ~2 TB
- Problem: both CPU and storage are bottlenecked
3. Propose a solution (whiteboard):
┌─ PostgreSQL (OLTP)
│ └─ Live operational data, real-time writes
│
├─ ELT Pipeline (Airflow + dbt)
│ └─ Extract daily; deduplicate and validate in the warehouse
│
└─ BigQuery (OLAP)
└─ Analytics, 1 TB/day compression → 100 GB
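A minimal sketch of what the BigQuery landing table behind this proposal might look like; all table and column names are hypothetical:

```sql
-- Hypothetical events table for the OLAP side of the proposal.
-- Partitioning by event date lets daily reports scan a single partition;
-- clustering narrows scans further for common filter columns.
CREATE TABLE IF NOT EXISTS analytics.events (
  event_id   STRING NOT NULL,
  user_id    STRING,
  event_type STRING,
  event_ts   TIMESTAMP NOT NULL,
  payload    JSON
)
PARTITION BY DATE(event_ts)
CLUSTER BY event_type, user_id
-- 3-year retention requirement from the clarifying questions (1,095 days)
OPTIONS (partition_expiration_days = 1095);
```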
Design Explanation
Why this architecture:
1. Separate concerns:
- PostgreSQL: Optimized for writes
- BigQuery: Optimized for reads
2. Cost comparison:
- Keep PostgreSQL for the OLTP slice (10-50 GB): ~$500/month
- BigQuery on-demand ($6.25/TB): 100 GB/day × 30 days = 3 TB/month ≈ $18.75/month
- vs. scaling PostgreSQL on SSD, where storage alone quickly exceeds $500/month
3. Scaling path:
- 10 GB → 1 TB → 10 TB (automatic in BigQuery)
- No re-architecture needed
4. Timeline:
- Months 1-2: Build ELT pipeline (dbt)
- Month 3: Migrate historical data
- Month 4: Flip to new system
Deep Dive: If Interviewer Presses...
Q: "Won't BigQuery cold-start be slow?"
A:
No, because:
1. BigQuery queries start immediately (no warm-up)
2. Columnar storage + partition pruning = fast
3. The first 1 TB of queries per month is free (per billing account)
4. Caching: query results are cached for 24 hours at no charge
Bottom line: BigQuery is actually faster than PostgreSQL at this scale.
Q: "How do you handle real-time analytics then?"
A:
Good question. Real-time needs differ:
Option 1 (3-hour latency acceptable):
→ Batch ELT: Stay with proposal
Option 2 (Real-time needed):
→ Add streaming layer:
PostgreSQL → Kafka → BigQuery (streaming inserts)
└─ Events flow to Kafka immediately
└─ BigQuery ingests from Kafka (5-min micro-batches)
└─ Real-time + batch reconciliation daily
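If you add the streaming path, the daily reconciliation can be a simple count comparison. A hedged sketch, assuming hypothetical tables analytics.events_streaming and analytics.events_batch:

```sql
-- Hypothetical daily reconciliation between the streaming and batch paths.
-- Any row returned means the two paths disagree for that day.
SELECT
  event_date,
  s.row_count AS streaming_rows,
  b.row_count AS batch_rows
FROM (
  SELECT DATE(event_ts) AS event_date, COUNT(*) AS row_count
  FROM analytics.events_streaming
  GROUP BY event_date
) AS s
FULL OUTER JOIN (
  SELECT DATE(event_ts) AS event_date, COUNT(*) AS row_count
  FROM analytics.events_batch
  GROUP BY event_date
) AS b
USING (event_date)
WHERE COALESCE(s.row_count, -1) != COALESCE(b.row_count, -1);
```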
Key Points to Mention
✅ Separation of concerns (OLTP vs. OLAP)
✅ Cost analysis (concrete numbers)
✅ Scaling path (future-proof)
✅ Implementation timeline (realistic)
✅ Trade-off (latency vs. cost)
Scenario 2: The Data Quality Crisis
Question
"Our warehouse is full of bad data. Marketing reports don't match Finance reports. How do you fix organizational data quality?"
Your Approach
1. Diagnose the root cause:
- "Where are sources of truth conflicting?"
- Example: Revenue definition differs
- Marketing: gross revenue (pre-discount)
- Finance: net revenue (post-discount)
2. Build governance:
- Assign data owners (Marketing owns their definitions)
- Create metadata catalog (document all definitions)
- Implement RACI model (who approves changes?)
3. Implement quality checks (see the sketch after this list):
- Tests for each definition (dbt tests)
- SLA: e.g., 99.95% of rows pass; failing rows are flagged, not silently dropped
- Alerts: fire when quality drops below the SLA
4. Create conformed definitions:
- Marketing + Finance agree on "Revenue"
- All reports use same definition
- No more mismatches
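A minimal sketch of one such dbt test: a "singular test" is just a SQL file under tests/ that fails if it returns any rows. The model name fct_orders and its columns are hypothetical:

```sql
-- tests/assert_net_revenue_not_above_gross.sql (hypothetical dbt singular test)
-- Guards the conformed Revenue definition: net (post-discount) revenue
-- must never exceed gross (pre-discount) revenue.
SELECT
  order_id,
  gross_revenue,
  net_revenue
FROM {{ ref('fct_orders') }}
WHERE net_revenue > gross_revenue
```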
Architecture: Quality Layer
graph TB
A["Source<br/>Systems"]
B["Bronze Layer<br/>(Raw)"]
C["Data Quality<br/>Gate"]
D["Silver Layer<br/>(Validated)"]
E["Certification"]
F["Gold Layer<br/>(Certified)"]
G["Alert<br/>Data Owner"]
A --> B
B -->|"Duplicates?<br/>Nulls?<br/>Schema mismatch?"| C
C -->|"PASS"| D
C -->|"FAIL"| G
D --> E
E --> F
If Interviewer Asks...
Q: "But who's responsible for fixing quality?"
A:
RACI Model:
Data Quality Gate Failures:
Issue: > 5% nulls in a required field
├─ Data Owner (Finance): Accountable (answers for the data, gets alerted)
├─ Data Steward (Data team): Responsible (does the work: fixes the pipeline)
├─ Analytics Lead: Consulted ("Should we wait for the fix?")
└─ BI Users: Informed (report might be delayed)
Example:
"Sales amount is null in 10% of rows"
├─ Root cause: API change in PaymentService
├─ Owner: PaymentService team
├─ Fix: PaymentService restores the field (or the pipeline adapts to the new schema)
└─ Duration: 1-2 days to update
Governance: once upstream is fixed, the affected pipeline runs are retried and backfilled. A minimal version of the gate check itself is sketched below.
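The gate check behind this example can be a single assertion query. A sketch, with illustrative table and column names:

```sql
-- Hypothetical quality-gate assertion for the "> 5% nulls" rule.
-- If the null rate breaches the threshold, a row comes back and the
-- alerting job notifies the RACI chain above.
SELECT
  'silver.payments.sales_amount' AS failed_check,
  COUNTIF(sales_amount IS NULL) / COUNT(*) AS null_rate
FROM silver.payments
HAVING null_rate > 0.05;
```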
Key Points
✅ Root cause analysis (don't just apply patches)
✅ Governance (RACI, ownership)
✅ Automation (dbt tests catch issues)
✅ Monitoring (SLAs, alerts)
✅ Documentation (conformed definitions)
Scenario 3: BI Tool Selection
Question
"We need a BI tool for business users. We have data in BigQuery. Should we use Looker, Tableau, or Metabase? What's your recommendation?"
Decision Framework
| Factor | Looker | Tableau | Metabase |
|--------|--------|---------|----------|
| BigQuery native | ✅✅ | ✅ | ✅ |
| Price | $$$ | $$ | $ |
| Ease of use | Hard | Easy | Very easy |
| SQL access | ✅ | ✅ | ✅ |
| Governance | ✅✅ | ✅ | ⚠️ |
| Maturity | Mature | Mature | Growing |
Recommendation depends on:
- Budget available?
- Technical skill of BI users?
- Governance requirements?
Recommendation By Persona
A. Fortune 500 (Mature governance):
└─ Looker
├─ Reason: Native BigQuery, strong governance, SSO
├─ Cost: Higher ($100K+/year) but justified
└─ Implementation: 3-6 months (requires data modeling)
B. Growing SaaS ($50-500M):
└─ Tableau
├─ Reason: Balance of ease + enterprise features
├─ Cost: $30K-100K/year
└─ Implementation: 2-3 months
C. Startup / MVP:
└─ Metabase
├─ Reason: Self-serve, minimal setup
├─ Cost: Low ($1K-10K/year; the open-source edition is free to self-host)
└─ Implementation: 1-2 weeks
D. Advanced use case (Embedded BI):
└─ Looker Studio (from Google)
├─ Reason: Free, native BigQuery, embeddable
├─ Cost: Free
└─ Limitation: Less governance
If Interviewer Challenges...
Q: "But Tableau costs more. Isn't Looker better value?"
A:
Not necessarily. It depends on team skill:
Low-skill team (SQL unfamiliar):
→ Tableau wins (easier UI, gentler learning curve)
→ Looker requires upfront data modeling (LookML)
→ Cost of training + modeling delays ROI
High-skill team (many data engineers):
→ Looker wins (LookML is code, version control, CI/CD)
→ Can codify all business logic
→ Better long-term governance
Recommendation: Skill level first, then cost.
Scenario 4: Architecture Decision Record (ADR)
Question
"Document your decision to split this monolithic warehouse into medallion architecture. Use an ADR."
Template: Architecture Decision Record
# ADR 0003: Medallion Architecture for Data Warehouse
## Status
ACCEPTED
## Context
Our monolithic data warehouse (BigQuery single layer) has:
- No clear separation between raw and processed data
- Debugging is hard (can't trace issues to root cause)
- Quality issues propagate to all downstream consumers
- Data ownership unclear (raw vs. business-ready)
As we scale from 10 TB to 100 TB, this architecture's governance model won't keep up.
## Decision
Implement medallion architecture:
- Bronze: Raw data (append-only, no transformation)
- Silver: Validated/cleaned (deduplication, quality checks)
- Gold: Business-ready (dimensional models, SCD Type 2)
## Reasons
- Separation of concerns: Raw vs. cleaned vs. business logic distinct
- Debugging: Error in quality? Check Silver. Error in business logic? Check Gold.
- Ownership: Clear responsibilities and SLAs per layer
- Governance: Can apply different retention/security policies per layer
- Scalability: Teams can own layers independently
## Consequences
Positive:
- Easier debugging (trace issues to specific layer)
- Better governance (clear ownership boundaries)
- Historical data preserved (Bronze append-only)
- Team autonomy (BI team owns Gold, platform owns Bronze)
Negative:
- More storage (Bronze + Silver + Gold vs. single layer)
- Operational complexity (3 layers instead of 1)
- Requires strong data engineering discipline
- Migration effort (2-3 months to backfill history)
## Cost Analysis
Current (Monolithic):
Storage: 10 TB → $62.50/month (illustrative flat rate of $6.25/TB)
Compute: $0 (queries within free tier)
Proposed (Medallion):
Bronze: 10 TB (raw data)
Silver: 8 TB (deduplicated)
Gold: 5 TB (aggregated)
Total: 23 TB → $143.75/month
Difference: +$81/month (worth the governance gains)
## Implementation Plan
| Phase | Timeline | Effort |
|---|---|---|
| Phase 1: Design schemas | Weeks 1-2 | Data architect |
| Phase 2: Build Bronze/Silver | Weeks 3-6 | Data engineers |
| Phase 3: Build Gold dimension models | Weeks 7-10 | Analytics team |
| Phase 4: Migrate historical (backfill) | Weeks 11-13 | Data engineers |
| Phase 5: Cutover (point to new layers) | Week 14 | All hands |
| Phase 6: Sunset old schema | Week 15 | Cleanup |
## Alternatives Considered
Alternative 1: Keep monolithic, add caching layer
- ❌ Doesn't solve ownership problem
- ❌ Still hard to debug
Alternative 2: Use data vault (instead of medallion)
- ✅ More normalized than medallion
- ❌ More complex to maintain for team
- Decision: Medallion simpler, sufficient for our needs
## Related ADRs
- ADR 0002: "Use BigQuery as warehouse" (platform choice)
- ADR 0004: "Use dbt for transformations" (tooling)
- ADR 0005: "SCD Type 2 for customer dimension" (modeling)
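Since ADR 0005 commits the customer dimension to SCD Type 2, it helps to have a concrete pattern in your pocket. A hedged two-step sketch in BigQuery SQL; table and column names (dim_customer, stg_customers, tier) are illustrative, and NULL-safe comparisons are omitted for brevity:

```sql
-- Step 1 (hypothetical): expire the current row when tracked attributes change.
UPDATE gold.dim_customer d
SET is_current = FALSE,
    valid_to = CURRENT_TIMESTAMP()
FROM silver.stg_customers s
WHERE d.customer_id = s.customer_id
  AND d.is_current = TRUE
  AND (d.email != s.email OR d.tier != s.tier);

-- Step 2: insert a fresh current version for new and just-expired customers.
INSERT INTO gold.dim_customer
  (customer_id, email, tier, valid_from, valid_to, is_current)
SELECT s.customer_id, s.email, s.tier, CURRENT_TIMESTAMP(), NULL, TRUE
FROM silver.stg_customers s
LEFT JOIN gold.dim_customer d
  ON d.customer_id = s.customer_id AND d.is_current = TRUE
WHERE d.customer_id IS NULL;
```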
Question Types to Expect
| Type | Example | Strategy |
|---|---|---|
| Design | "Design a data warehouse for Netflix" | Start broad, ask clarifying Qs, go deep |
| Trade-off | "When do you use Event sourcing vs. CQRS?" | List pro/con, pick with justification |
| Failure | "How would you guard against PII leaks?" | Anticipate problems, defensive design |
| Scaling | "How would this work at 1PB?" | Identify bottlenecks, propose parallel solutions |
| Migration | "How would you migrate from Postgres to BigQuery?" | Phased approach, risk mitigation |
Interview Tips
Structure Your Answer
1. Clarify (2 min):
"Let me ask a few clarifying questions..."
- Scope (how much data?)
- Latency (how fast?)
- Growth (when?)
- Team (how many engineers?)
2. Identify constraints:
"The constraint I see is: ..."
3. Propose architecture:
"Here's what I'd do: [diagram]"
4. Justify:
"Because: [trade-offs]"
5. Anticipate problems:
"Potential issues: [solutions]"
Phrases to Use
✅ "That's a good point. Let me reconsider..."
✅ "Let me trade off the pros and cons..."
✅ "In my experience, we faced this when..."
✅ "I need more info on X before deciding..."
❌ "I'm not sure..." (instead say "Let me think..." or propose alternatives)
❌ "That's how we always did it" (need better reasoning)
When You Don't Know
Good response:
"I haven't worked with Cassandra in depth, but based on
its wide-column design, I'd expect ... [apply knowledge
of similar systems]. Here's what I'd investigate..."
Bad response:
"I don't know."
Final Checklist: Before Your Interview
✅ Understand medallion layers and why they matter
✅ Know dimensional modeling (star schema, SCD Type 2)
✅ Explain OLTP vs. OLAP and when to use each
✅ Be familiar with 3+ tools (BigQuery, Snowflake, PostgreSQL at minimum)
✅ Understand ETL vs. ELT trade-offs
✅ Know partitioning & clustering optimization strategies
✅ Speak to governance (RACI, quality, lineage)
✅ Articulate trade-offs with numbers (cost, latency, complexity)
✅ Be ready to ask clarifying questions before designing
✅ Practice whiteboarding (draw diagrams, label components)
Example Answers to Common Questions
Q1: "Design a data warehouse for an e-commerce company"
Clarify:
- Scale? 100K orders/day
- Growth? 50% YoY
- Latency? Daily reports (batch okay)
- Team? 5 data engineers, 20 analysts
Answer:
├─ Operational: PostgreSQL (OLTP)
│ └─ Live order processing
│
├─ ELT Pipeline: dbt + Airflow
│ ├─ Bronze: Raw Postgres extracts
│ ├─ Silver: Cleaned, deduplicated
│ └─ Gold: Fact_sales + dimensions
│
├─ Analytics: BigQuery
│ └─ Fact table (100M rows) + 10 dimensions
│ └─ Star schema, SCD Type 2 for customers
│
└─ BI: Looker (dashboard, data model governance)
Cost: ~$5K/month, dominated by BI licensing (BigQuery at ~1 TB/month × $6.25/TB is negligible)
Timeline: 3 months to production
Trade-off: The extra storage cost (Bronze + Silver) is worth
the operational clarity as the team scales.
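To show you can actually use the model you just drew, a one-line follow-up query helps. A sketch over the hypothetical fact_sales / dim_customer star schema:

```sql
-- Hypothetical Gold-layer report: daily revenue by customer segment.
-- The surrogate key join resolves the correct SCD Type 2 version.
SELECT
  f.order_date,
  c.segment,
  SUM(f.net_revenue) AS revenue
FROM gold.fact_sales AS f
JOIN gold.dim_customer AS c
  ON f.customer_sk = c.customer_sk
GROUP BY f.order_date, c.segment
ORDER BY f.order_date, c.segment;
```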
Key Takeaways
✅ Answer frameworks (clarify → identify → propose → justify)
✅ Trade-off thinking (nothing is free)
✅ Ask good questions (shows maturity)
✅ Use ADRs to document decisions
✅ Be concrete (numbers, examples, not theory only)
✅ Admit unknowns (but show reasoning)
✅ Practice whiteboarding (communication is key)
Good luck with your interviews! 🚀