API Gateway — Deep Dive
Level: Intermediate
Pre-reading: 05 · API & Communication
What is an API Gateway?
An API Gateway is a single entry point for all client requests. It handles cross-cutting concerns such as authentication, rate limiting, and logging, then routes each request to the appropriate backend service.
graph TD
M[Mobile] --> GW[API Gateway]
W[Web] --> GW
P[Partner] --> GW
GW --> O[Order Service]
GW --> U[User Service]
GW --> PAY[Payment Service]
Core Responsibilities
| Responsibility | Description |
|---|---|
| Routing | Direct requests to appropriate backend services |
| Authentication | Validate JWT/API keys before forwarding |
| Authorization | Check permissions for routes |
| Rate Limiting | Throttle per client/IP |
| SSL Termination | Terminate TLS at the gateway; internal traffic may use plain HTTP on a trusted network, or be re-encrypted (e.g. mTLS) |
| Load Balancing | Distribute across service instances |
| Caching | Cache GET responses |
| Request/Response Transform | Modify headers, body shape |
| Aggregation | Combine multiple service responses |
| Observability | Centralized logging and metrics |
Routing Patterns
Path-Based Routing
routes:
  - path: /api/orders/**
    service: order-service
  - path: /api/users/**
    service: user-service
  - path: /api/products/**
    service: product-service
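To make the matching rule concrete, here is a minimal Python sketch of how a gateway might resolve a path against the route table above. It assumes that `/**` means "this prefix and everything below it" and that the longest matching prefix wins; real gateways differ on both points, and `ROUTES` and `match_route` are hypothetical names used only for illustration.

```python
from typing import Optional

# Route table mirroring the config above: (pattern, service).
ROUTES = [
    ("/api/orders/**", "order-service"),
    ("/api/users/**", "user-service"),
    ("/api/products/**", "product-service"),
]


def match_route(path: str, routes=ROUTES) -> Optional[str]:
    """Return the service for the longest matching prefix, or None."""
    best_len = -1
    best_service = None
    for pattern, service in routes:
        # Assumption: a trailing "/**" matches the prefix and anything below it.
        prefix = pattern[:-3] if pattern.endswith("/**") else pattern
        # Match on a path-segment boundary so "/api/ordersX" does not match.
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > best_len:
                best_len, best_service = len(prefix), service
    return best_service
```

Matching on a segment boundary rather than a raw string prefix keeps `/api/ordersX` from being routed to the order service by accident.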
Header-Based Routing
routes:
  - headers:
      X-API-Version: v2
    service: order-service-v2
  - headers:
      X-API-Version: v1
    service: order-service-v1
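The header rule above can be sketched in a few lines of Python. The fallback to v1 when the header is missing or unknown is an assumption (the config does not say what happens in that case), and `VERSION_ROUTES` and `route_by_version` are hypothetical names.

```python
from typing import Mapping

# Header value -> upstream service, mirroring the config above.
VERSION_ROUTES = {
    "v1": "order-service-v1",
    "v2": "order-service-v2",
}


def route_by_version(headers: Mapping[str, str],
                     default: str = "order-service-v1") -> str:
    """Pick an upstream from the X-API-Version request header."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    version = lowered.get("x-api-version")
    # Assumption: unknown or missing versions fall back to the default.
    return VERSION_ROUTES.get(version, default)
```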
Canary Routing
routes:
  - path: /api/orders/**
    targets:
      - service: order-service-v1
        weight: 90
      - service: order-service-v2
        weight: 10
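One way to honor the 90/10 weights is sketched below in Python. Many gateways pick a target at random per request; this sketch instead hashes a client identifier so that a given client consistently lands on the same version, which is a design choice of the example rather than something the config above specifies. `TARGETS` and `pick_target` are hypothetical names.

```python
import hashlib

# (service, weight) pairs mirroring the canary config above.
TARGETS = [("order-service-v1", 90), ("order-service-v2", 10)]


def pick_target(client_id: str, targets=TARGETS) -> str:
    """Map a client to a target so traffic splits in proportion to the weights."""
    total = sum(weight for _, weight in targets)
    if total <= 0:
        raise ValueError("total weight must be positive")
    # A stable hash keeps each client on the same version across requests.
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for service, weight in targets:
        cumulative += weight
        if bucket < cumulative:
            return service
    # Not reached when all weights are non-negative.
    return targets[-1][0]
```

Sticky assignment makes canary errors easier to attribute to a version, at the cost of a split that is only approximately 90/10 for small client populations.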
Authentication at the Gateway
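Validate tokens once at the gateway and forward the verified claims to services as headers. Services should accept those headers only from the gateway, and the gateway should strip any client-supplied copies; otherwise a client could forge its own identity.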
sequenceDiagram
participant C as Client
participant GW as API Gateway
participant AS as Auth Server
participant S as Service
C->>GW: Request + JWT
GW->>GW: Validate JWT signature
GW->>GW: Check expiry, issuer, audience
alt Invalid
GW->>C: 401 Unauthorized
else Valid
GW->>S: Request + X-User-Id header
S->>GW: Response
GW->>C: Response
end
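The validation steps in the diagram can be sketched with the Python standard library alone. This example assumes HS256 (a shared secret) purely to stay self-contained; production gateways commonly verify asymmetric signatures against keys published by the auth server, and would normally use a maintained JWT library. `sign_hs256` and `authenticate` are hypothetical helpers.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Mint a token for testing (the auth server's job in practice)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def authenticate(token, secret, issuer, audience, now=None):
    """Return (status, headers_to_forward); 401 on any validation failure."""
    now = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        # Pin the algorithm; never let the token choose its own verifier.
        if header.get("alg") != "HS256":
            return 401, {}
        expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return 401, {}
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError, AttributeError):
        return 401, {}
    if not isinstance(claims, dict):
        return 401, {}
    # Check expiry, issuer, and audience, as in the diagram.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return 401, {}
    if claims.get("iss") != issuer:
        return 401, {}
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        return 401, {}
    sub = claims.get("sub")
    if not isinstance(sub, str) or not sub:
        return 401, {}
    # Forward the verified subject to the upstream service.
    return 200, {"X-User-Id": sub}
```

The signature is checked before the payload is parsed, so unauthenticated input never reaches the claim logic.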
Gateway JWT Configuration (Kong)
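plugins:
  - name: jwt
    config:
      # Only exp is verified here; the issuer and audience checks from the
      # diagram above are not covered by this fragment.
      claims_to_verify:
        - exp
      key_claim_name: kid
      secret_is_base64: false
  - name: request-transformer
    config:
      add:
        headers:
          # Illustrative placeholder: confirm how your Kong version exposes
          # JWT claims to templates before relying on this expression.
          - X-User-Id:$(jwt.claims.sub)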
Rate Limiting
Protect backend services from overload.
Rate Limiting Strategies
| Strategy | Description |
|---|---|
| Per client | Limit by API key or user ID |
| Per IP | Limit by source IP |
| Per route | Different limits per endpoint |
| Global | Total gateway throughput |
Rate Limiting Response
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1699900000
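As a rough illustration of where these headers come from, the following Python sketch implements a per-client fixed-window counter and builds the response headers shown above. The fixed-window algorithm, the in-memory store, and the `FixedWindowLimiter` name are assumptions of the example; gateways often use sliding windows or token buckets backed by a shared store, and the `X-RateLimit-*` names are a common convention rather than a formal standard.

```python
import math
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit: int = 100, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self._state = {}  # client_key -> (window_index, requests_used)

    def check(self, client_key: str, now=None):
        """Return (status, headers) for one incoming request."""
        now = time.time() if now is None else now
        window_index = int(now // self.window)
        reset_at = (window_index + 1) * self.window
        state = self._state.get(client_key)
        # A new window discards the previous count for this client.
        used = state[1] if state and state[0] == window_index else 0
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Reset": str(reset_at),
        }
        if used >= self.limit:
            headers["X-RateLimit-Remaining"] = "0"
            headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
            return 429, headers
        self._state[client_key] = (window_index, used + 1)
        headers["X-RateLimit-Remaining"] = str(self.limit - used - 1)
        return 200, headers
```

The `client_key` can be an API key, a user ID, or a source IP, which covers the per-client and per-IP strategies with the same code.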
Configuration (Kong)
Request/Response Transformation
Modify requests before forwarding or responses before returning.
Use Cases
| Transformation | Example |
|---|---|
| Add headers | Add correlation ID, user context |
| Remove headers | Strip internal headers from response |
| Rename fields | Map external field names to internal |
| Filter response | Remove sensitive fields |
Example (Kong)
plugins:
  - name: request-transformer
    config:
      add:
        headers:
          - X-Request-ID:$(uuid)
          - X-Forwarded-Host:$(host)
      remove:
        headers:
          - X-Internal-Header
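The same transformation can be expressed as a small Python function, which may help when reasoning about ordering and edge cases. It assumes an existing `X-Request-ID` should be kept so that a trace survives multiple hops, and that `X-Forwarded-Host` is always set by the gateway; neither choice is stated in the config above, and `transform_request` is a hypothetical name.

```python
import uuid
from typing import Mapping

# Headers that must never reach upstream services (compared case-insensitively).
INTERNAL_HEADERS = {"x-internal-header"}


def transform_request(headers: Mapping[str, str], host: str) -> dict:
    """Strip internal headers, then add correlation and forwarding headers."""
    out = {
        name: value
        for name, value in headers.items()
        if name.lower() not in INTERNAL_HEADERS and name.lower() != "x-forwarded-host"
    }
    # Assumption: keep a caller-supplied request ID, otherwise mint one.
    if not any(name.lower() == "x-request-id" for name in out):
        out["X-Request-ID"] = str(uuid.uuid4())
    # Assumption: the gateway is authoritative for the forwarded host.
    out["X-Forwarded-Host"] = host
    return out
```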
Response Caching
Cache GET responses to reduce backend load.
plugins:
  - name: proxy-cache
    config:
      strategy: memory
      content_type:
        - application/json
      cache_ttl: 300
      cache_control: true
Cache Considerations
| Consideration | Recommendation |
|---|---|
| TTL | Short for dynamic data; longer for static |
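| Invalidation | Honor cache headers (e.g. Cache-Control), or purge explicitly when the underlying data changes |
| Vary headers | Include user identity or the relevant request headers in the cache key when the response varies per user |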
| Storage | Redis for distributed; memory for single instance |
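The considerations in the table can be seen together in a small in-memory sketch. It caches only GET responses, expires entries after a TTL, keys entries on the `Authorization` header so one user never receives another user's response, and supports explicit purge. `ResponseCache` is a hypothetical class for a single gateway instance; a distributed deployment would put the same logic in front of a shared store such as Redis.

```python
import time


class ResponseCache:
    """In-memory TTL cache for GET responses, keyed by path and varying headers."""

    def __init__(self, ttl_seconds: float = 300, vary_headers=("authorization",)):
        self.ttl = ttl_seconds
        self.vary_headers = tuple(h.lower() for h in vary_headers)
        self._store = {}  # (path, vary_values) -> (expires_at, body)

    def _key(self, path, headers):
        lowered = {name.lower(): value for name, value in headers.items()}
        return (path, tuple(lowered.get(h, "") for h in self.vary_headers))

    def get(self, method, path, headers, now=None):
        """Return the cached body, or None on a miss, expiry, or non-GET."""
        if method.upper() != "GET":
            return None
        now = time.monotonic() if now is None else now
        key = self._key(path, headers)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, body = entry
        if now >= expires_at:
            del self._store[key]
            return None
        return body

    def put(self, method, path, headers, body, now=None):
        """Store a response body; non-GET requests are never cached."""
        if method.upper() != "GET":
            return
        now = time.monotonic() if now is None else now
        self._store[self._key(path, headers)] = (now + self.ttl, body)

    def purge(self, path):
        """Explicit invalidation: drop every cached variant of a path."""
        for key in [k for k in self._store if k[0] == path]:
            del self._store[key]
```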
Gateway Aggregation
Combine responses from multiple services into one response.
sequenceDiagram
participant C as Client
participant GW as API Gateway
participant OS as Order Service
participant US as User Service
participant PS as Product Service
C->>GW: GET /dashboard
par Parallel
GW->>OS: GET /orders
GW->>US: GET /profile
GW->>PS: GET /recommendations
end
OS->>GW: Orders
US->>GW: Profile
PS->>GW: Recommendations
GW->>GW: Aggregate
GW->>C: Combined dashboard
Aggregation Trade-offs
| Benefit | Cost |
|---|---|
| Single client request | Gateway complexity |
| Reduced latency (parallel calls) | Partial failure handling |
| Tailored response | Business logic at gateway edge |
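The parallel fan-out and the partial-failure cost from the table can both be seen in a short `asyncio` sketch. Each upstream call gets its own timeout, and a failed or slow call is reported under `errors` instead of failing the whole dashboard. The `aggregate` function and the shape of its result are assumptions of the example; the fetchers stand in for real HTTP calls.

```python
import asyncio
from typing import Awaitable, Callable, Mapping


async def aggregate(fetchers: Mapping[str, Callable[[], Awaitable]],
                    timeout: float = 1.0) -> dict:
    """Call all upstreams concurrently and merge whatever succeeds."""

    async def guarded(name, fetch):
        try:
            # Per-call timeout so one slow upstream cannot stall the response.
            return name, await asyncio.wait_for(fetch(), timeout), None
        except Exception as exc:
            # Broad catch is deliberate: any upstream failure becomes a
            # partial result rather than a failed aggregate.
            return name, None, type(exc).__name__

    results = await asyncio.gather(
        *(guarded(name, fetch) for name, fetch in fetchers.items())
    )
    return {
        "data": {name: value for name, value, err in results if err is None},
        "errors": {name: err for name, value, err in results if err is not None},
    }
```

Whether a missing section should degrade the response or fail it outright is a product decision, which is one reason aggregation tends to pull business logic toward the gateway.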
API Gateway Tools
| Tool | Type | Key Features |
|---|---|---|
| Kong | OSS + Enterprise | Plugin ecosystem, declarative config |
| AWS API Gateway | Managed | Lambda integration, pay-per-request |
| Spring Cloud Gateway | OSS | Java-native, reactive, WebFlux |
| Envoy | OSS | High-performance, service mesh ready |
| NGINX | OSS + Commercial | Proven, lightweight |
| Traefik | OSS | K8s-native, auto-discovery |
| Apigee | Commercial | Full API management platform |
Selection Criteria
| Factor | Consideration |
|---|---|
| Managed cloud service | AWS API Gateway, GCP Cloud Endpoints |
| Kubernetes | Kong, Traefik, Ambassador (now Emissary-ingress) |
| Java ecosystem | Spring Cloud Gateway |
| Performance critical | Envoy, NGINX |
| Full API management | Kong Enterprise, Apigee |
Gateway Deployment Patterns
Single Gateway
graph TD
C[Clients] --> GW[API Gateway]
GW --> S1[Service 1]
GW --> S2[Service 2]
GW --> S3[Service 3]
Simple to operate, but a single point of failure unless it runs as multiple instances behind a load balancer.
Regional Gateways
graph TD
subgraph US Region
C1[Clients] --> GW1[Gateway US]
end
subgraph EU Region
C2[Clients] --> GW2[Gateway EU]
end
GW1 --> Services
GW2 --> Services
Lower latency for nearby clients; helps meet regional compliance requirements such as data residency.
Gateway per Domain
graph TD
C[Clients] --> R[Router/LB]
R --> GW1[Orders Gateway]
R --> GW2[Users Gateway]
R --> GW3[Products Gateway]
GW1 --> OS[Order Services]
GW2 --> US[User Services]
GW3 --> PS[Product Services]
Each domain team owns its own gateway, which avoids a single monolithic gateway.
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Business logic in gateway | Gateway becomes monolith | Keep gateway thin |
| Coupling to implementation | Changes require gateway changes | Route by contract, not implementation |
| No rate limiting | DoS vulnerability | Always configure limits |
| No timeout | Stuck connections | Set upstream timeouts |
| Monolithic gateway config | Hard to manage | Modular configs per team |
Gateway vs Service Mesh
| Aspect | API Gateway | Service Mesh |
|---|---|---|
| Traffic type | North-south (external → cluster) | East-west (service → service) |
| Deployment | Edge of cluster | Sidecar per pod |
| Scope | External API concerns | Internal communication |
| Examples | Kong, AWS API Gateway | Istio, Linkerd |
The two are often used together: a gateway at the edge for external traffic, and a mesh for internal service-to-service traffic.
When should you use an API Gateway vs direct service calls?
Use a gateway when (1) you need centralized auth, rate limiting, or logging; (2) multiple clients have different needs; or (3) you want to decouple clients from the backend topology. Direct calls are fine for internal service-to-service traffic; if that traffic needs cross-cutting controls, use a service mesh rather than routing it through the gateway.
How do you prevent the API Gateway from becoming a monolith?
(1) Keep business logic in services, not in the gateway. (2) Use the gateway only for cross-cutting concerns. (3) Consider the BFF (Backend for Frontend) pattern: one gateway per client type. (4) Keep config modular so that each team manages its own routes. (5) Avoid aggregation that requires domain knowledge.
What happens when the API Gateway fails?
A gateway deployed as a single instance is a single point of failure. Mitigations: (1) High availability: run multiple instances behind a load balancer. (2) Health checks: remove unhealthy instances from rotation. (3) Circuit breakers: fail fast when a backend is down, so stalled upstream calls do not exhaust gateway resources. (4) Regional failover: route to another region. (5) Client retries with backoff.
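Of these mitigations, the circuit breaker is the least obvious to implement, so a minimal Python sketch follows. It opens after a number of consecutive failures, rejects calls immediately while open, and lets a single trial call through once a cool-down has elapsed. The thresholds and the `CircuitBreaker` and `CircuitOpenError` names are hypothetical, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        """Run fn() unless the circuit is open; track failures and recovery."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Fail fast instead of waiting on an unhealthy upstream.
                raise CircuitOpenError("upstream marked unhealthy")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial failed.
            if self._failures >= self.failure_threshold or self._opened_at is not None:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Injecting the clock keeps the breaker testable without sleeping, and the same structure can wrap each upstream separately so one failing service does not trip calls to the others.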