Service Mesh — Deep Dive

Level: Advanced
Pre-reading: 05 · API & Communication, 09 · Deployment & Infrastructure


What is a Service Mesh?

A service mesh is a dedicated infrastructure layer for service-to-service communication. A sidecar proxy is deployed alongside each service instance; the proxies intercept all traffic and apply routing, security, and observability policies without application code changes.

```mermaid
graph TD
    subgraph Control Plane
        CP[Istiod]
    end
    subgraph Data Plane
        subgraph Pod A
            A[App A]
            EA[Envoy]
        end
        subgraph Pod B
            B[App B]
            EB[Envoy]
        end
    end
    CP -->|Config| EA
    CP -->|Config| EB
    A --> EA
    EA -->|mTLS| EB
    EB --> B
```
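In Istio, for example, the sidecar is added by a mutating admission webhook once a namespace is labeled for injection. A minimal sketch (the namespace name is hypothetical):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    istio-injection: enabled  # webhook injects the Envoy sidecar into new pods
```

Existing pods must be restarted to pick up the sidecar; only pods created after labeling are injected.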

Why Service Mesh?

| Without Mesh | With Mesh |
|---|---|
| Each service implements retries | Mesh handles retries |
| Each language needs its own circuit breaker library | Consistent across languages |
| mTLS implemented per service | Automatic mTLS |
| Distributed tracing requires code changes | Automatic trace propagation |
| Traffic management requires code | Declarative traffic rules |

Service Mesh Components

Data Plane

The data plane is the collection of sidecar proxies that intercept all network traffic.

| Aspect | Description |
|---|---|
| Sidecar proxy | Envoy, Linkerd proxy |
| Traffic interception | iptables rules redirect traffic |
| Protocol handling | HTTP/1.1, HTTP/2, gRPC, TCP |
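The interception itself is ordinary Linux networking. A simplified sketch of the kind of NAT rules Istio's init container installs inside the pod's network namespace (actual chains and options vary by version; 15001 and 15006 are Envoy's outbound and inbound listener ports in recent Istio releases):

```shell
# Redirect the pod's outbound TCP traffic to Envoy's outbound listener
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001
# Redirect inbound TCP traffic to Envoy's inbound listener
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15006
```

Because the redirection happens at the kernel level, the application is unaware of the proxy and needs no configuration changes.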

Control Plane

The control plane manages and configures the proxies.

| Component | Description |
|---|---|
| Configuration API | VirtualService, DestinationRule |
| Service discovery | Kubernetes API, Consul |
| Certificate authority | Issues mTLS certificates |
| Telemetry collection | Aggregates metrics, traces |

Istio Architecture

```mermaid
graph TD
    subgraph Control Plane
        I[Istiod]
        I --> CA[Certificate Authority]
        I --> C[Config Management]
        I --> D[Discovery]
    end
    subgraph Data Plane
        E1[Envoy]
        E2[Envoy]
        E3[Envoy]
    end
    I -->|xDS| E1
    I -->|xDS| E2
    I -->|xDS| E3
    E1 <-->|mTLS| E2
    E2 <-->|mTLS| E3
```

| Istio Component | Purpose |
|---|---|
| Istiod | Unified control plane (Pilot, Citadel, Galley) |
| Envoy | Sidecar proxy |
| Gateway | Ingress/egress traffic |
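Istiod is typically installed declaratively. A minimal sketch using the IstioOperator API (profile names such as default and demo are built into istioctl):

```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: control-plane
spec:
  profile: default  # applied with: istioctl install -f <file>
```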

Traffic Management

VirtualService

A VirtualService defines routing rules for a host. The example below sends requests carrying the header x-canary: true to subset v2, and splits the remaining traffic 90/10 between v1 and v2.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders
spec:
  hosts:
    - order-service
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: order-service
            subset: v2
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
```
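A VirtualService can also inject faults to test downstream resilience. A sketch using Istio's HTTPFaultInjection fields that delays 10% of requests by 5 seconds (the resource name is hypothetical):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders-fault
spec:
  hosts:
    - order-service
  http:
    - fault:
        delay:
          percentage:
            value: 10       # percent of requests affected
          fixedDelay: 5s    # injected latency per affected request
      route:
        - destination:
            host: order-service
            subset: v1
```

Fault injection lets you verify that callers' timeouts and circuit breakers actually fire, without touching the target service.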

DestinationRule

A DestinationRule defines policies applied to traffic after routing (connection pooling, load balancing, outlier detection) and declares the named subsets that VirtualServices route to.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
    loadBalancer:
      simple: LEAST_CONN
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```

Resilience Features

Circuit Breaking

Istio implements circuit breaking with outlier detection: hosts that return consecutive errors are ejected from the load-balancing pool for a period of time.

```yaml
trafficPolicy:
  outlierDetection:
    consecutive5xxErrors: 5    # eject after 5 consecutive 5xx responses
    interval: 10s              # how often hosts are evaluated
    baseEjectionTime: 30s      # minimum ejection duration
    maxEjectionPercent: 50     # never eject more than half the pool
```

Retries

```yaml
http:
  - route:
      - destination:
          host: order-service
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,reset,connect-failure
```

Timeouts

```yaml
http:
  - route:
      - destination:
          host: order-service
    timeout: 10s
```
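The route timeout caps the entire request, including all retry attempts, so it should exceed attempts × perTryTimeout. A combined sketch:

```yaml
http:
  - route:
      - destination:
          host: order-service
    timeout: 10s          # total budget, spanning all retries
    retries:
      attempts: 3
      perTryTimeout: 2s   # 3 × 2s fits within the 10s budget
```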

Security: mTLS

The mesh upgrades traffic between sidecars to mutual TLS automatically; applications speak plaintext only to their local sidecar.

```mermaid
sequenceDiagram
    participant A as Service A
    participant EA as Envoy A
    participant EB as Envoy B
    participant B as Service B

    A->>EA: HTTP request
    EA->>EB: mTLS encrypted
    EB->>B: HTTP request
    B->>EB: Response
    EB->>EA: mTLS encrypted
    EA->>A: Response
```

PeerAuthentication

A PeerAuthentication policy sets the required mTLS mode for a namespace or workload:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT  # Require mTLS for all services
```
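STRICT mode breaks callers that are not yet in the mesh. A common migration step is a workload-level override (a sketch; the names and selector label are hypothetical) that keeps one legacy workload in PERMISSIVE mode while the namespace default stays STRICT:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-override
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-service
  mtls:
    mode: PERMISSIVE  # accept both plaintext and mTLS during migration
```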

AuthorizationPolicy

An AuthorizationPolicy controls which identities may call a workload. This example allows only the payment-service service account to call the order-service API:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: order-service-policy
spec:
  selector:
    matchLabels:
      app: order-service
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/production/sa/payment-service"]
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/orders/*"]
```

Observability

Service mesh provides observability without code changes.

Metrics (Prometheus)

```
# Istio exports these automatically
istio_requests_total{...}
istio_request_duration_milliseconds{...}
istio_tcp_connections_opened_total{...}
```
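These metrics can be queried directly. A sketch of a 5xx error-rate query for order-service (label values depend on your cluster's service naming):

```promql
sum(rate(istio_requests_total{destination_service_name="order-service", response_code=~"5.."}[5m]))
  /
sum(rate(istio_requests_total{destination_service_name="order-service"}[5m]))
```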

Distributed Tracing

```mermaid
graph LR
    A[Service A] -->|span| B[Service B]
    B -->|span| C[Service C]
    A -->|trace| J[Jaeger]
    B -->|trace| J
    C -->|trace| J
```

Envoy understands B3 and W3C TraceContext headers and creates spans automatically, but the application must forward these headers from each inbound request to its outbound calls so the spans join the same trace.
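A minimal, framework-agnostic sketch of that header forwarding (the function name is illustrative; the header set is the B3 and W3C TraceContext names Istio supports):

```python
# Headers the application should copy from an inbound request onto
# every outbound call it makes while handling that request.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",  # W3C TraceContext
    "tracestate",
)

def propagation_headers(incoming: dict) -> dict:
    """Return the subset of inbound headers to copy onto outbound calls."""
    lowered = {k.lower(): v for k, v in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}
```

Without this forwarding, each hop starts a new trace and the call graph in Jaeger fragments into disconnected spans.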

Access Logs

```json
{
  "protocol": "HTTP/2",
  "upstream_service": "order-service",
  "response_code": 200,
  "response_flags": "-",
  "duration": 45,
  "request_id": "abc-123"
}
```

Service Mesh Options

| Mesh | Proxy | Key Features |
|---|---|---|
| Istio | Envoy | Full-featured; complex |
| Linkerd | Linkerd2-proxy | Lightweight; simpler |
| Consul Connect | Envoy | HashiCorp ecosystem |
| AWS App Mesh | Envoy | AWS-native |
| Kuma | Envoy | CNCF; multi-cluster |

Selection Criteria

| Factor | Recommendation |
|---|---|
| Simplicity | Linkerd |
| Features | Istio |
| AWS native | App Mesh |
| Multi-cloud | Istio, Kuma |
| Existing HashiCorp stack | Consul Connect |

When to Use Service Mesh

Good Fit

| Scenario | Why Mesh Helps |
|---|---|
| 10+ microservices | Consistent policies at scale |
| Polyglot services | Language-agnostic features |
| Zero-trust security | Automatic mTLS |
| Complex traffic management | Canary, A/B, fault injection |
| Observability gaps | Automatic metrics and traces |

Poor Fit

| Scenario | Why Not |
|---|---|
| < 5 services | Overhead not justified |
| Simple routing | K8s Services suffice |
| Resource constrained | Sidecar overhead |
| Team unfamiliar | Learning curve |

Mesh Overhead

| Resource | Per Pod | Notes |
|---|---|---|
| Memory | 50-100 MB | Envoy sidecar |
| CPU | 0.1-0.2 cores | Processing traffic |
| Latency | 1-3 ms | Additional hop |
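The per-pod figures compound across the fleet, which is why small clusters often cannot justify a mesh. A rough estimator using the ranges above (0.1-0.2 cores expressed as 100-200 millicores; the 200-pod cluster is hypothetical):

```python
def sidecar_overhead(pods: int, mem_mb=(50, 100), cpu_millicores=(100, 200)):
    """Estimate cluster-wide sidecar cost from per-pod (low, high) ranges."""
    return {
        "memory_mb": (pods * mem_mb[0], pods * mem_mb[1]),
        "cpu_millicores": (pods * cpu_millicores[0], pods * cpu_millicores[1]),
    }

# For a hypothetical 200-pod cluster: 10-20 GB of memory and
# 20-40 cores spent purely on sidecars.
```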

Anti-Patterns

| Anti-Pattern | Problem | Fix |
|---|---|---|
| Mesh for everything | Overhead on simple apps | Use mesh where needed |
| Ignoring sidecar health | App works, mesh doesn't | Include sidecar in health checks |
| Complex routing logic | Hard to understand | Keep routing simple |
| No gradual rollout | Breaking changes | Canary the mesh itself |

When should you use a service mesh vs implementing resilience in the application?

Use a service mesh when: (1) you have many services in different languages; (2) you need consistent mTLS; (3) you want observability without code changes. Use application libraries (e.g., Resilience4j) when: (1) you have a homogeneous stack; (2) you need fine-grained control; (3) sidecar overhead is unacceptable.

What's the difference between Istio and Linkerd?

Istio is feature-rich (traffic management, security, observability) but complex; it uses Envoy and is comparatively resource-heavy. Linkerd is lightweight and simpler, focusing on reliability; its custom Rust proxy has lower overhead. Choose Istio for features, Linkerd for simplicity.

How does mTLS work in a service mesh?

(1) The control plane acts as a certificate authority and issues short-lived certificates to each workload. (2) Certificates are rotated automatically (typically every 24 hours). (3) Sidecars intercept traffic and establish mTLS connections on the application's behalf. (4) A PeerAuthentication policy enforces the mTLS mode (STRICT, PERMISSIVE). (5) AuthorizationPolicy controls which services may communicate.