01: DevOps Fundamentals¶
Definition¶
DevOps = Development + Operations, a philosophy and set of practices that bridges developers (who build features) and operations (who run systems), aiming to:
- Faster, reliable software delivery
- Shared responsibility for quality
- Continuous improvement through automation
Core DevOps Principles¶
1. Culture: Breaking Down Silos¶
DevOps isn't just tools—it's a mindset. Historically:
Traditional Setup:
Developers → "It works on my machine"
↓
[Throw it over the wall]
↓
Operations → "I don't know what to run"
↓
❌ Blame, slow releases, firefighting
DevOps approach:
DevOps Mindset:
Developers ↔ Operations
↓
Shared goals: Fast, reliable, secure releases
↓
Shared tools, shared dashboards, shared on-call
↓
✅ Collaboration, automation, continuous improvement
2. Key Principles (CALMS)¶
| Principle | Meaning | Example |
|---|---|---|
| Culture | Shared responsibility, collaboration | Dev + Ops work together, no blame |
| Automation | Remove manual, repetitive work | CI/CD pipelines, infrastructure as code |
| Lean | Eliminate waste, focus on value | Small batches, fast feedback |
| Measurement | Data-driven decisions | Metrics, dashboards, alerting |
| Sharing | Transparency, knowledge sharing | Runbooks, documentation, postmortems |
The DevOps Value Stream¶
graph LR
A["Developer<br/>Writes Code"] -->|Git Commit| B["CI Pipeline<br/>Build + Test"]
B -->|Tests Pass| C["Build Artifact<br/>Docker Image"]
C -->|Approved| D["Deploy to<br/>Staging"]
D -->|Smoke Tests| E["Deploy to<br/>Production"]
E -->|Running| F["Monitor &<br/>Alert"]
F -->|Issue Found| G["Incident<br/>Response"]
G -->|Fix Deployed| A
The goal: Reduce the time from idea → production while keeping quality high.
Key DevOps Practices¶
1. Infrastructure as Code (IaC)¶
Instead of clicking buttons in AWS/GCP, write code that describes infrastructure:
# Terraform example
resource "kubernetes" "cluster" {
name = "production"
nodes = 5
cpu = 4
memory = 8
}
Benefits:
- ✅ Version controlled (Git history)
- ✅ Reproducible (same every time)
- ✅ Testable (validate before apply)
- ✅ Documentable (code is documentation)
2. Continuous Integration (CI)¶
Definition: Every code commit is automatically:
- Built
- Tested
- Analyzed for quality
- Packaged (as a Docker image)
Developer pushes code
↓
Webhook triggers CI pipeline
↓
Compile, run tests, static analysis
↓
If all pass: build Docker image
↓
If any fail: alert developer
Benefits:
- 🔍 Catch bugs early
- 📊 Consistent build quality
- ✅ Automated quality gates
- 🚀 Always-ready artifacts
3. Continuous Delivery / Deployment (CD)¶
Continuous Delivery: Ready-to-deploy artifacts (can be manual approval)
Continuous Deployment: Automatically deploy to production (no manual approval)
CI Pipeline (Continuous Integration)
↓
Artifact Ready (Docker image)
↓
Deploy to Staging (Continuous Delivery)
↓ [Optional: Manual Approval]
↓
Deploy to Production (Continuous Deployment)
4. Monitoring & Observability¶
What to measure:
- Metrics: CPU, memory, request latency, error rate
- Logs: Application and system events
- Traces: Request path through microservices
- Alerts: Trigger actions when thresholds exceeded
Production Application
↓
Prometheus (collects metrics)
↓
Grafana (visualizes dashboards)
↓
AlertManager (triggers alerts)
↓
Slack/PagerDuty (notifies team)
5. Incident Response & Postmortems¶
When things break:
- Alert → Team gets notified
- Respond → Mitigate impact immediately
- Investigate → Find root cause
- Postmortem → Learn and improve
- Automate → Prevent recurrence
DevOps Tools Landscape¶
No tool is "DevOps"—the mindset is DevOps. Tools are enablers.
CI/CD Orchestration¶
- Jenkins (on-premises)
- GitHub Actions (GitHub-native)
- GitLab CI (GitLab-native)
- Gitea (self-hosted)
Container & Orchestration¶
- Docker (containerization)
- Kubernetes (K8s orchestration)
- Minikube / kind (local K8s)
Infrastructure as Code¶
- Terraform (cloud-agnostic)
- CloudFormation (AWS-specific)
- Ansible (configuration management)
Package Management¶
- Helm (Kubernetes packages)
- Apt/Yum (Linux packages)
GitOps (Configuration Automation)¶
- Flux CD
- ArgoCD
- Spinnaker
Observability¶
- Prometheus (metrics)
- Grafana (dashboards)
- Loki (logs)
- Jaeger (tracing)
- ELK Stack (Elasticsearch, Logstash, Kibana)
The Three Ways of DevOps¶
The First Way: Flow¶
Maximize flow from development to production. Work in small batches, optimize for speed.
Example: Pull requests → automated testing → automated deployment
The Second Way: Feedback¶
Amplify and shorten feedback loops so problems are caught early.
Example: Monitoring → alerts → dashboards → postmortems
The Third Way: Experimentation¶
Encourage risk-taking, learning, and continuous improvement.
Example: Blameless postmortems, chaos engineering, feature flags
Common DevOps Roles¶
| Role | Focus | Example Tasks |
|---|---|---|
| Platform Engineer | Build tools & infrastructure | Terraform, Kubernetes, CI/CD pipelines |
| DevOps Engineer | Ops + development | Monitoring, automation, incident response |
| SRE (Site Reliability Engineer) | Reliability & performance | Monitoring, capacity planning, postmortems |
| Cloud Engineer | Cloud infrastructure | AWS/GCP/Azure, IAM, security |
Why DevOps Matters (Business Value)¶
Speed¶
- ⚡ Before: 6-month releases, weeks to deploy a fix
- ⚡ After: Daily/hourly releases, minutes to fix
Reliability¶
- 🛡️ Before: 10 nines of uptime (frequent outages)
- 🛡️ After: Production deployments with 99.99%+ uptime
Cost¶
- 💰 Before: Over-provisioned servers, waste
- 💰 After: Auto-scaling, only pay for what you use
Morale¶
- 😊 Before: Developers blame ops, ops blame developers
- 😊 After: Shared ownership, shared success
Anti-Patterns to Avoid¶
❌ Tool-Centric DevOps¶
"We bought Jenkins, so we're DevOps!"
Fix: Focus on culture and practices first, tools second
❌ DevOps Team Silos¶
Hiring a "DevOps team" but keeping them separate from developers
Fix: Embed DevOps engineers with product teams
❌ No Monitoring¶
"If nobody's checking, there's no problem"
Fix: Instrument everything, set up meaningful alerts
❌ Blame Culture¶
Blaming individuals when things break
Fix: Blameless postmortems, focus on systems improvement
❌ Manual Runbooks¶
"Here's 50 manual steps to deploy"
Fix: Automate repetitive tasks
Interview Questions¶
Q: What's the difference between DevOps and SRE?
- DevOps is a culture/practice of shared responsibility
- SRE (Site Reliability Engineering) is a specific job role that operationalizes DevOps with focus on reliability metrics (SLOs, error budgets)
Q: Why is immutable infrastructure important?
- Immutable = once deployed, never changed (always redeploy)
- Benefits: Consistent state, easier rollbacks, less drift
Q: What's the difference between CI and CD?
- CI (Continuous Integration) = build + test every commit
- CD (Continuous Delivery) = always ready to deploy (manual approval possible)
- CD (Continuous Deployment) = automatically deployed to production
Key Takeaways¶
✅ DevOps is a mindset, not a tool or job title
✅ Automation breaks down silos between dev and ops
✅ Measure everything — you can't improve what you don't measure
✅ Blameless postmortems lead to system improvements
✅ Small batches and fast feedback reduce risk
✅ Infrastructure as code makes systems reproducible and testable
Next Steps¶
- Read: Theory 02: Containerization — Learn how Docker enables DevOps
- Do: Lab 00: Environment Setup — Get your tools ready
- Explore: Your organization's DevOps practices (Slack, builds, deployments)
References¶
- The Phoenix Project (Book) — Explains DevOps narrative
- DevOps Handbook — Practical guide to implementing DevOps
- https://devops.com/ — DevOps news and resources