Interview Preparation Guide
Overview
Preparing for a DevOps role? This guide is organized by difficulty tier (Tier 1→3) and topic area, progressing from fundamentals to advanced concepts.
Quick navigation:
- Tier 1 (Junior): Docker, Kubernetes fundamentals, Services
- Tier 2 (Mid-level): Helm, advanced K8s workloads, observability
- Tier 3 (Senior): GitOps, multi-region, security, advanced patterns
Tier 1: Fundamentals (Junior Level)
Container Basics (Docker)
Q1.1: What is a Docker image vs. a container?
Image: A blueprint/template (immutable), like a class definition
Container: A running instance of an image (mutable), like an object
You can have one image and run 100 containers from it.
Q1.2: What does docker build do and what's the output?
Builds a Docker image from a Dockerfile. Process:
- Reads Dockerfile instructions
- Creates layers (for caching)
- Stores image locally
Output: Docker image (tagged, e.g., myapp:1.0.0)
Q1.3: Explain Dockerfile layers and why caching matters
Each command in Dockerfile = one layer. Docker caches layers.
Example:
- Layer 1: FROM python:3.11 (200MB, cached)
- Layer 2: COPY requirements.txt (1MB, cached)
- Layer 3: RUN pip install (cached)
- Layer 4: COPY mycode.py (code changed, re-runs)
- Layer 5-end: Re-run
Why it matters: Only changed layers rebuild. Saves time.
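The layer order above can be sketched as a Dockerfile. This is a hypothetical Python service (file names are illustrative); the point is putting rarely-changing instructions first so their layers stay cached:

```dockerfile
# Base image: large, but cached after the first build
FROM python:3.11

WORKDIR /app

# Dependencies change rarely -> these layers stay cached across most builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often -> only layers from here down rebuild
COPY mycode.py .

CMD ["python", "mycode.py"]
```

If you reversed the order (copy code before installing dependencies), every code change would invalidate the dependency layer and force a full reinstall.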
Q1.4: What's a multi-stage build and why use it?
Separate build stage (includes build tools) from runtime stage (minimal). Reduces final image size (e.g., 1GB → 50MB).
Example:
- Build stage: Includes compiler, dependencies, build tools
- Runtime stage: Only binary/app code, minimal OS
Result: Smaller images, faster deployment, less attack surface.
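A minimal multi-stage sketch, assuming a Go binary (image tags and paths are illustrative):

```dockerfile
# Build stage: full toolchain (large image with compiler and deps)
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app .

# Runtime stage: minimal base, ships only the compiled binary
FROM alpine:3.20
COPY --from=build /bin/app /bin/app
ENTRYPOINT ["/bin/app"]
```

Only the final stage becomes the shipped image; the build stage and its tools are discarded.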
Kubernetes Core Concepts
Q1.5: What is a Pod?
Smallest unit in Kubernetes. Can contain 1+ containers (usually 1).
Containers in a pod:
- Share network namespace (same IP)
- Share storage volumes
- Can communicate via localhost
Pods are ephemeral (transient). Don't create directly; use Deployments.
Q1.6: What's the difference between a Pod and a Deployment?
Pod: Single instance (ephemeral)
Deployment: Describes desired state (manages 1+ identical pods)
Deployment handles:
- Replication (run N copies)
- Restarting crashed pods
- Rolling updates
- Scaling
Best practice: Always use Deployment, never create Pods directly.
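A minimal Deployment manifest covering these points (the name, labels, and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3            # run 3 interchangeable copies
  selector:
    matchLabels:
      app: myapp
  template:              # the pod template the Deployment manages
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:1.0.0
```

Delete a pod it owns and the Deployment recreates it; edit the image tag and it performs a rolling update.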
Q1.7: Explain Kubernetes control plane components
- API Server: REST interface for managing resources
- etcd: Distributed key-value store (cluster state)
- Scheduler: Assigns pods to nodes
- Controller Manager: Enforces desired state (runs controllers for Deployment, StatefulSet, etc.)
- kubelet: Runs on each node, manages pod lifecycle
- kube-proxy: Network routing on nodes
Together they maintain cluster state and reconcile reality to desired state.
Q1.8: What does kubectl apply do?
Applies a manifest (YAML) to cluster.
If resource doesn't exist → creates it
If resource exists → updates it
If resource removed from manifest → not deleted by default (requires --prune)
Idempotent: Safe to run multiple times.
Kubernetes Networking
Q1.9: What are the three Service types?
- ClusterIP (default): Internal communication only. DNS name: service-name.namespace.svc.cluster.local
- NodePort: Exposes the service on every node's IP at a static port. Accessible from outside the cluster.
- LoadBalancer: Cloud provider load balancer. Assigns an external IP.
When to use:
- ClusterIP: Pod-to-pod communication
- NodePort: External access without cloud LB
- LoadBalancer: Production external access
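A ClusterIP Service sketch tying this together (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp            # reachable as myapp.<namespace>.svc.cluster.local
spec:
  type: ClusterIP        # change to NodePort or LoadBalancer for external access
  selector:
    app: myapp           # routes to pods carrying this label
  ports:
    - port: 80           # port the Service listens on
      targetPort: 8080   # port the container listens on
```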
Troubleshooting Basics
Q1.10: How do you debug a CrashLoopBackOff pod?
- kubectl logs pod-name — check application logs
- kubectl describe pod pod-name — see events and exit code
- kubectl get events — check cluster events
- Verify image exists, resource limits not exceeded, dependencies available
Common causes: Bad config, missing dependencies, OOM, permission issues.
Tier 2: Intermediate (Mid-Level)
Advanced Kubernetes Workloads
Q2.1: What's the difference between Deployment, StatefulSet, and DaemonSet?
| Type | Use | Pods |
|---|---|---|
| Deployment | Stateless (APIs, web) | Interchangeable, random names |
| StatefulSet | Stateful (databases) | Unique identity (pod-0, pod-1), persistent storage |
| DaemonSet | Node-level (logging) | One per node, always |
When to use:
- Deployment: 99% of apps
- StatefulSet: PostgreSQL, MongoDB, Redis
- DaemonSet: Filebeat, node exporter, CNI
Q2.2: How does a rolling update work?
Kubernetes gradually replaces old pods with new ones:
- Old version: 3 pods running v1.0
- Spin up 1 pod v2.0 (4 total)
- Remove 1 v1.0 (3 total)
- Repeat until all v2.0
Benefits: Zero downtime, automatic rollback if health checks fail
Controls: maxSurge, maxUnavailable
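These controls live in the Deployment's strategy stanza; a fragment with illustrative values:

```yaml
# Deployment fragment: rolling-update tuning
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 extra pod above the desired count during the update
      maxUnavailable: 0    # never drop below the desired count (zero downtime)
```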
Kubernetes Storage & Configuration
Q2.3: What's a Persistent Volume and why do you need it?
Problem: Container storage is ephemeral. Pod restarts = data lost.
Solution: PersistentVolume + PersistentVolumeClaim
PV = cluster-level storage resource (10GB)
PVC = Pod's request for storage ("I need 5GB")
Pod uses data that survives restarts.
Use cases: Databases, cached data, logs.
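A PVC sketch matching the 5GB request above (the claim name is illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteOnce        # mountable read-write by a single node
  resources:
    requests:
      storage: 5Gi         # "I need 5GB"
```

A pod then mounts it via a volume whose `persistentVolumeClaim.claimName` is `data-claim`; the data survives pod restarts.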
Q2.4: How do you pass configuration to containers in Kubernetes?
Three main approaches:
- Environment variables: Simple key-value pairs
  - ConfigMaps for non-sensitive data
  - Secrets for passwords/tokens (base64 encoded)
- Volume mounts: Files/directories
  - ConfigMap volumes
  - Secret volumes
- Command-line arguments: Passed to container
Best practice: Use ConfigMaps for configs, Secrets for credentials.
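A pod-spec fragment sketching the first two approaches, assuming a ConfigMap named app-config and a Secret named app-credentials already exist:

```yaml
# Pod spec fragment: ConfigMap as env vars, Secret as a mounted file
containers:
  - name: api
    image: myapp:1.0.0
    envFrom:
      - configMapRef:
          name: app-config       # each key becomes an environment variable
    volumeMounts:
      - name: creds
        mountPath: /etc/creds    # each Secret key becomes a file here
        readOnly: true
volumes:
  - name: creds
    secret:
      secretName: app-credentials
```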
Package Management with Helm
Q2.5: What is Helm and what problem does it solve?
Helm = "package manager for Kubernetes"
Solves:
- Duplication: Templating (values parameterize manifests)
- Versioning: Release management
- Dependencies: Chart dependencies
- Upgrades: Easy updates + rollbacks
Example: Instead of 10 YAML files, one Helm chart with values.
Q2.6: Explain helm install vs. helm upgrade --install
- helm install: Creates new release. Fails if release already exists.
- helm upgrade --install: Creates if missing, updates if exists.
Best practice: Use --install for idempotency.
Q2.7: How do you manage different environments (dev, staging, prod) with Helm?
Use multiple values files:
# Dev: 1 replica, small resources
helm install myapp ./chart -f values-dev.yaml
# Prod: 5 replicas, large resources
helm install myapp ./chart -f values-prod.yaml
Each file overrides defaults in values.yaml.
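A sketch of what such a values file might contain (the keys depend on your chart; these are illustrative):

```yaml
# values-dev.yaml (hypothetical): 1 replica, small resources
replicaCount: 1
resources:
  requests:
    cpu: 100m
    memory: 128Mi

# values-prod.yaml would raise replicaCount to 5 and the resource requests
```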
Observability & Monitoring
Q2.8: What are the three pillars of observability?
- Metrics: Numeric measurements (CPU, latency, error rate). Tool: Prometheus
- Logs: Discrete events with context. Tool: Loki
- Traces: Request path across services. Tool: Jaeger
Together → complete visibility. Separately → blind spots.
Q2.9: How do you debug high latency in a Kubernetes service?
- Check metrics: Prometheus latency queries
- Check logs: Pod logs for errors
- Check traces: Jaeger for service breakdown
- Check network: Network policies, node CPU/memory
- Check application: Profiling, database queries
Tier 3: Advanced (Senior Level)
GitOps & Deployment Strategies
Q3.1: What is GitOps and how does Flux differ from Jenkins?
GitOps: Git = single source of truth. Automated agents reconcile cluster to match Git.
| Aspect | Jenkins (Push) | Flux (Pull) |
|---|---|---|
| Trigger | Pipeline runs helm upgrade | Flux watches Git |
| Drift | No detection | Auto-corrects |
| Audit | Pipeline logs | Git commit history |
| Rollback | Rerun pipeline | git revert |
Flux benefits: Better drift detection, audit trail, easier rollback.
Q3.2: How do you implement multi-cluster GitOps?
One Git repo, separate directories per cluster:
config-repo/
├── clusters/east/
│   └── releases.yaml (3 replicas)
├── clusters/west/
│   └── releases.yaml (5 replicas)
└── apps/
    └── api-chart/
Each cluster bootstraps Flux to its directory:
flux bootstrap github --path=./clusters/east
flux bootstrap github --path=./clusters/west
Both sync independently from same repo.
Q3.3: Describe a canary deployment strategy
Gradually shift traffic to new version:
0%       10%      50%      100%
v1       v1/v2    v1/v2    v2
(5min)   (10min)  (15min)  done
At each step:
- Monitor error rate, latency
- If issues → rollback
- If healthy → continue
Tools: Flagger + service mesh, or manual with traffic split.
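One manual traffic-split option, assuming the ingress-nginx controller: a second Ingress marked as a canary with a weight annotation (the host and service names are hypothetical):

```yaml
# Canary Ingress sending 10% of traffic to the v2 Service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"   # 10% of requests go to v2
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-v2
                port:
                  number: 80
```

Raising the weight in steps (10 → 50 → 100) implements the schedule above; deleting the canary Ingress is the rollback.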
Multi-Region & Advanced Architecture
Q3.4: How would you design a multi-region Kubernetes deployment?
Architecture:
Global Load Balancer (routes by geography/health)
├─ East Region (K8s cluster)
│  ├─ Data replicated (PostgreSQL read replicas)
│  └─ Cache (Redis)
└─ West Region (K8s cluster)
   ├─ Data replicated
   └─ Cache (Redis)
Considerations:
- Data: Replicate database, ensure consistency
- Networking: Cross-region latency, cost
- Failover: Automatic traffic reroute
- Cost: Duplicated infrastructure
- Testing: Failure scenarios
Q3.5: What's a sidecar and when do you use it?
Sidecar = additional container in pod (shares network, storage)
Use cases:
- Logging: Pod writes logs, sidecar ships to Loki
- Metrics: Sidecar exposes metrics endpoint
- Network proxy: Envoy intercepts traffic (encryption, rate limit, retries)
- Security: Sidecar enforces auth/encryption
Benefit: Extend functionality without changing app code.
Security & Access Control
Q3.6: How do you manage secrets in Kubernetes securely?
Options (from basic to advanced):
- K8s Secrets: Base64 encoded (not encrypted by default) — OK for dev
- Encryption at rest: Enable in etcd (API server config)
- Sealed Secrets: Encrypt secrets using cluster public key
- External vault: HashiCorp Vault (external secret management)
- Secret operator: Automatically rotates/syncs secrets
Best practice: Use sealed secrets or external vault for production.
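Option 2 (encryption at rest) is configured with an EncryptionConfiguration file passed to the API server via --encryption-provider-config; a sketch with a placeholder key:

```yaml
# Encrypts Secret objects in etcd with AES-CBC; the key is a placeholder
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder, never commit real keys
      - identity: {}   # fallback so existing unencrypted data stays readable
```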
Q3.7: What's Network Policy and how do you implement zero-trust?
Network Policy: Restrict pod-to-pod communication (like firewall)
# Deny all traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  # no ingress/egress rules listed = all traffic of those types denied
---
# Allow only specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api
spec:
  podSelector:
    matchLabels:
      app: api
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 5000
Zero-trust: Deny all → explicitly allow needed traffic.
Production Operations & Reliability
Q3.8: Debug a production outage: "All requests returning 500 errors"
Immediate (1-2 min):
- Rollback recent deployment: helm rollback or git revert
- Alert team, update status page
- Monitor error rate
Investigate (parallel):
- Metrics: Check CPU, memory, network saturation
- Logs: Filter errors, check timestamps
- Traces: See which service is failing
- Events: kubectl get events
- Network: DNS, connectivity, policies
Root cause (post-incident):
- Post-mortem meeting
- Identify systemic cause (not just symptom)
- Add monitoring/alerting to prevent
- Improve runbooks
Q3.9: What's SLA/SLO/SLI and why important?
- SLA (Service Level Agreement): Contract with users (99.99% uptime)
- SLO (Service Level Objective): Internal goal (99.9% for us)
- SLI (Service Level Indicator): Actual measurement (99.87% actual)
Error budget: If SLO=99.9%, you can tolerate 0.1% errors.
Why:
- SLOs drive reliability decisions
- Error budget prevents chasing 100% uptime at the expense of shipping
- Balances speed and reliability
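The error budget translates directly into allowed downtime. For example, over a 30-day window:

```
SLO = 99.9% over a 30-day window
Total minutes:  30 × 24 × 60 = 43,200
Error budget:   0.1% × 43,200 = 43.2 minutes of downtime allowed
```

Burn the budget early in the month and the team slows releases; budget left over means there is room to ship faster.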
Practical Scenarios & Architecture Questions
These are common "design" and "troubleshoot" questions that test breadth and problem-solving.
Scenario 1: Design a Production Deployment
Question: Design a Kubernetes + Helm setup for an API serving 10M requests/day
Expected answer should cover:
- Compute: Deployment with 3+ replicas, resource requests/limits
- Auto-scaling: HPA based on CPU/memory metrics
- Networking: Service (ClusterIP), Ingress (external access, TLS)
- Configuration: ConfigMaps for settings, Secrets for credentials
- Storage: PVC if stateful (database, cache)
- Security: NetworkPolicy (restrict traffic), RBAC
- Observability: Prometheus metrics, Loki logs, alerting
- Deployment: GitOps with Flux or ArgoCD, canary/blue-green strategy
- Reliability: Health checks (liveness, readiness), rollback strategy
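The auto-scaling bullet can be sketched as an autoscaling/v2 HPA (names and thresholds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3           # baseline matching the 3+ replica requirement
  maxReplicas: 20          # ceiling for traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```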
Scenario 2: Incident Response
Question: Production database is consuming 90% CPU. Users report slow queries.
Your answer should address:
- Immediate (1-2 min): Scale up database, enable query logging, rate limit if needed
- Root cause (5-10 min): Check slow query log, missing indexes, traffic spike?
- Fix (15-30 min): Optimize query, add index, adjust connection pool
- Prevention: Add monitoring/alerting before threshold, regular query reviews
- Post-mortem: Team learning, documentation, update runbooks
Scenario 3: Cost Optimization
Question: Your Kubernetes bill increased 3x. How do you reduce costs?
Solutions to discuss:
- Right-size resource requests/limits (avoid over-provisioning)
- Use HPA instead of static scaling
- Implement spot instances for non-critical workloads
- Consolidate to fewer nodes (bin packing)
- Remove unused resources (old deployments, orphaned PVCs)
- Use reserved instances for predictable baseline load
- Consider managed services vs. self-hosted
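Right-sizing starts with explicit requests and limits on every container, so the scheduler can bin-pack nodes; a fragment with illustrative values (base real numbers on observed usage):

```yaml
# Container spec fragment
resources:
  requests:
    cpu: 250m        # what the scheduler reserves on the node
    memory: 256Mi
  limits:
    cpu: "1"         # hard ceiling; throttled above this
    memory: 512Mi    # exceeding this gets the container OOMKilled
```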
Self-Assessment Guide
Use this guide to evaluate your readiness:
| Tier | Questions | Expectation | Role Match |
|---|---|---|---|
| Tier 1 | Container Basics, K8s Core, Services | Answer 90%+ fluently | Junior/Entry-level |
| Tier 2 | Advanced Workloads, Helm, Observability | Answer 70%+ with thought | Mid-level/Intermediate |
| Tier 3 | GitOps, Security, Multi-region, Operations | Answer 50%+ (harder) | Senior/Staff |
How to practice:
- Understand the "why" — Memorizing answers doesn't work. Know concepts deeply.
- Practice hands-on — Build things, debug real issues, don't just read.
- Explain clearly — Pretend you're teaching someone. Clarity matters in interviews.
- Ask clarifying questions — "Can you clarify the scale?" shows critical thinking.
- Admit gaps confidently — "I haven't worked with that, but I'd approach it by..." is better than guessing.
- Time yourself — Practice 1-2 minute answers. No rambling.
Interview Question Types
DevOps interviews typically ask one of these:
1. Concept/Definition
"What is a Pod?" → Straightforward definition. Know these cold.
2. Comparison
"Difference between Deployment and StatefulSet?" → Comparison table in head.
3. Troubleshooting
"Pod is CrashLoopBackOff. How do you debug?" → Show methodology.
4. Design/Architecture
"Design a deployment for X scenario." → Think big picture, security, ops.
5. Decision/Trade-off
"When would you use DaemonSet vs. Deployment?" → Explain trade-offs.
Tip: Listen carefully to the question type. Adjust your answer depth accordingly.
Final Checklist
Before your interview:
- ✅ Can you explain core concepts in 1-2 min? (Don't ramble)
- ✅ Do you have hands-on experience with Docker, Kubernetes, Helm?
- ✅ Can you describe a real incident you handled?
- ✅ Do you understand trade-offs (cost vs. complexity, speed vs. safety)?
- ✅ Are you familiar with one observability stack?
- ✅ Can you describe a production system architecture?
Remember: Interviews test both technical depth AND communication. Be clear, be honest, ask questions.
Good luck! 🚀