Production Considerations
Production readiness is not a single feature.
It is a consistent operating model across reliability, security, cost, and governance.
The current code is useful as a baseline because it shows the main moving parts clearly, but it also reveals where production safeguards are still missing: auth is a placeholder, routing is hard-coded, memory is transient, and traces are printed.
Readiness flow
```mermaid
flowchart TB
D[Design Review] --> T[Load and Failure Tests]
T --> S[Security Controls]
S --> O[Observability Baseline]
O --> R[Release Gates]
R --> P[Post-Release Monitoring]
style D fill:#1976d2,color:#fff
style T fill:#1976d2,color:#fff
style S fill:#ff9800,color:#fff
style O fill:#ff9800,color:#fff
style R fill:#1976d2,color:#fff
style P fill:#ff9800,color:#fff
```
Production checklist
| Domain | Must-have controls | Practical baseline |
|---|---|---|
| Reliability | Timeouts, retries, fallback responses | Track P95 latency and tool error rate |
| Security | AuthN, AuthZ, secret hygiene | Separate read-only and write-capable tools |
| Cost | Token and tool usage budgeting | Add request-level cost attribution |
| Governance | Audit trail and release process | Versioned prompts and tool contracts |
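A minimal sketch of the reliability row follows: one timeout, retry, and fallback wrapper. It assumes downstream model or tool calls are async functions. The name `call_with_resilience`, the set of retryable exceptions, and the default limits are illustrative assumptions and are not part of this codebase.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Assumption: only timeouts and connection failures are worth retrying;
# any other exception (e.g. a validation error) propagates immediately.
RETRYABLE = (asyncio.TimeoutError, ConnectionError)


async def call_with_resilience(
    make_call: Callable[[], Awaitable[T]],
    *,
    fallback: T,
    timeout_s: float = 2.0,
    retries: int = 2,
    backoff_s: float = 0.1,
) -> T:
    """Await make_call() with a per-attempt timeout and bounded retries, else return fallback."""
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except RETRYABLE:
            if attempt < retries:
                # Exponential backoff: backoff_s, 2 * backoff_s, 4 * backoff_s, ...
                await asyncio.sleep(backoff_s * 2 ** attempt)
    # Every attempt failed: degrade gracefully instead of surfacing a raw error.
    return fallback


# Hypothetical stand-in tools for the usage example.
async def slow_tool() -> str:
    await asyncio.sleep(1.0)  # simulates a downstream call that hangs
    return "tool answer"


async def fast_tool() -> str:
    return "tool answer"
```

Calling `asyncio.run(call_with_resilience(slow_tool, fallback="Service is busy, please retry.", timeout_s=0.05))` returns the fallback string after every attempt times out. The wrapper takes a zero-argument factory rather than a coroutine object because a coroutine can be awaited only once, so each retry needs a fresh one.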
Production controls mapped to the repository
| Control | Applied to | Why it matters |
|---|---|---|
| Request validation | `src/api/server.py` | Prevents malformed prompts from entering the pipeline |
| Tool allowlist | `src/llm/router.py` | Keeps the router from calling unsafe tools |
| Persistent storage | `src/memory/store.py` | Preserves memory across process restarts |
| Structured telemetry | `src/observability/tracer.py` | Makes operational analysis searchable and machine-readable |
| Authentication | `src/security/auth.py` | Protects API entry points and privileged tools |
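The request-validation and tool-allowlist rows can be sketched as two small guard functions. Everything here is hypothetical: `validate_prompt`, `is_tool_permitted`, the `MAX_PROMPT_CHARS` limit, and the tool names stand in for whatever `src/api/server.py` and `src/llm/router.py` actually define. The allowlist also records whether each tool can write, which reflects the checklist's split between read-only and write-capable tools.

```python
# Hypothetical limit; align it with the model's context window and API contract.
MAX_PROMPT_CHARS = 8_000

# Hypothetical allowlist: tool name -> True if the tool can modify state.
ALLOWED_TOOLS: dict[str, bool] = {
    "search_docs": False,
    "get_weather": False,
    "create_ticket": True,
}


class RequestValidationError(ValueError):
    """Raised when a prompt fails validation at the API boundary."""


def validate_prompt(prompt: object) -> str:
    """Return a cleaned prompt, or raise before it enters the pipeline."""
    if not isinstance(prompt, str):
        raise RequestValidationError("prompt must be a string")
    cleaned = prompt.strip()
    if not cleaned:
        raise RequestValidationError("prompt must not be empty")
    if len(cleaned) > MAX_PROMPT_CHARS:
        raise RequestValidationError(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return cleaned


def is_tool_permitted(tool_name: str, *, caller_can_write: bool) -> bool:
    """Deny unknown tools; allow write-capable tools only for privileged callers."""
    if tool_name not in ALLOWED_TOOLS:
        return False  # default-deny: anything not listed is treated as unsafe
    tool_can_write = ALLOWED_TOOLS[tool_name]
    return caller_can_write or not tool_can_write
```

The allowlist is default-deny. A tool the router has never seen is rejected, and no blocklist has to keep up with new tools.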
Capacity planning formula
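A simple planning formula based on Little's law estimates the minimum number of workers needed under expected load. The product $Q \cdot L$ is the average number of requests in flight. Dividing it by the target utilization $U$ leaves headroom on each worker:

$$
W = \left\lceil \frac{Q \cdot L}{U} \right\rceil
$$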
| Symbol | Meaning |
|---|---|
| W | Minimum worker count |
| Q | Target requests per second |
| L | Average processing latency in seconds |
| U | Desired utilization per worker |
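Example: for $Q = 20$, $L = 0.3$, and $U = 0.6$, the calculation is $W = \lceil (20 \times 0.3) / 0.6 \rceil = \lceil 6 / 0.6 \rceil = 10$. You need at least 10 workers to stay within target utilization.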
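The same planning arithmetic, $W = \lceil Q \cdot L / U \rceil$, fits in a small helper. This sketch assumes latency is measured in seconds and utilization is a fraction in (0, 1]. The name `min_workers` is hypothetical.

```python
import math


def min_workers(q_rps: float, latency_s: float, utilization: float) -> int:
    """Minimum worker count W = ceil(Q * L / U)."""
    if q_rps < 0 or latency_s < 0:
        raise ValueError("Q and L must be non-negative")
    if not 0 < utilization <= 1:
        raise ValueError("U must be in the range (0, 1]")
    # Q * L is the average number of requests in flight (Little's law);
    # dividing by U adds headroom so workers average at or below the target.
    required = q_rps * latency_s / utilization
    # Round first so float noise such as 10.000000000000002 does not become 11.
    return math.ceil(round(required, 9))
```

`min_workers(20, 0.3, 0.6)` returns 10, which matches the worked example. Lowering the target utilization to 0.5 raises the result to 12.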
What is the most common production mistake?
Launching without route-level and tool-level telemetry. Without it, a failing tool or a misrouted request surfaces only as a generic model error, and the root cause is hard to isolate.
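Route-level and tool-level telemetry can be sketched as a context manager that emits one JSON record per span. This sketch assumes JSON log lines are an acceptable transport. `traced_span`, the `emit` hook, and the field names are hypothetical and do not describe the interface of `src/observability/tracer.py`.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

logger = logging.getLogger("observability")  # hypothetical logger name


@contextmanager
def traced_span(
    route: str,
    tool: Optional[str] = None,
    request_id: Optional[str] = None,
    emit: Callable[[str], None] = logger.info,
) -> Iterator[dict]:
    """Emit one JSON record per span, tagged with route, tool, latency, and outcome."""
    record = {
        "request_id": request_id or uuid.uuid4().hex,
        "route": route,
        "tool": tool,
        "status": "ok",
        "error_type": None,
    }
    start = time.perf_counter()
    try:
        yield record  # callers may attach extra fields, e.g. token counts
    except Exception as exc:
        # Record which route and tool failed, and how, then re-raise.
        record["status"] = "error"
        record["error_type"] = type(exc).__name__
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        emit(json.dumps(record))


# Usage: collect records in memory here; in production, ship them to a log pipeline.
captured: list[str] = []
with traced_span("weather", tool="get_weather", emit=captured.append) as span:
    span["tokens_out"] = 42
try:
    with traced_span("search", tool="search_docs", emit=captured.append):
        raise ConnectionError("search backend unreachable")
except ConnectionError:
    pass
```

The second record carries `"tool": "search_docs"` and `"error_type": "ConnectionError"`. A failure is therefore attributed to a specific route and tool rather than logged as an undifferentiated model error.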
How do we roll out safely?
Use staged releases with shadow traffic and automated rollback triggers. Promote a release to the next stage only after its SLO and error-budget checks have stayed stable.
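Automated rollback triggers and promotion checks can be expressed as a gate over the metrics observed for a release stage. This sketch rests on assumed thresholds: a 99.5% success SLO, a 1.5 s P95 limit, at least 500 requests per window, and three consecutive healthy windows before promotion. `WindowMetrics` and `evaluate_stage` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Decision(Enum):
    PROMOTE = "promote"
    HOLD = "hold"
    ROLLBACK = "rollback"


@dataclass(frozen=True)
class WindowMetrics:
    """Metrics observed for the new release during one evaluation window."""
    requests: int
    errors: int
    p95_latency_ms: float


def evaluate_stage(
    windows: Sequence[WindowMetrics],
    *,
    slo_success: float = 0.995,    # assumed SLO: 99.5% of requests succeed
    p95_limit_ms: float = 1500.0,  # assumed latency objective
    min_requests: int = 500,       # ignore windows too small to be meaningful
    required_windows: int = 3,     # consecutive healthy windows before promotion
) -> Decision:
    error_budget = 1.0 - slo_success  # allowed fraction of failed requests

    def breaches(w: WindowMetrics) -> bool:
        return w.errors / w.requests > error_budget or w.p95_latency_ms > p95_limit_ms

    # Rollback trigger: any sufficiently large window that breaks the SLO.
    if any(breaches(w) for w in windows if w.requests >= min_requests):
        return Decision.ROLLBACK
    # Promote only once the most recent windows are all large enough (and, from above, healthy).
    recent = windows[-required_windows:]
    if len(recent) == required_windows and all(w.requests >= min_requests for w in recent):
        return Decision.PROMOTE
    return Decision.HOLD
```

Small windows are ignored rather than treated as failures. This avoids rolling back on one unlucky error early in a stage, but it also means a low-traffic canary is never rolled back by this rule alone.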
What should be productionized first in this codebase?
Add structured observability and real authentication before expanding routing logic. Those two changes reduce operational and security risk immediately.
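The following sketch shows one way to replace the authentication placeholder in `src/security/auth.py`. It assumes bearer API keys with per-client scopes. The registry, demo key values, client IDs, and scope names are hypothetical. A real deployment would load key hashes from a secret store and might use OAuth or mTLS instead.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    client_id: str
    scopes: frozenset[str]


def hash_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


# Hypothetical registry holding only key hashes, never raw keys.
# Demo values only: in production, load these from a secret store.
_KEY_REGISTRY: dict[str, Principal] = {
    hash_key("demo-read-key"): Principal("reporting-dashboard", frozenset({"tools:read"})),
    hash_key("demo-write-key"): Principal("ops-automation", frozenset({"tools:read", "tools:write"})),
}


def authenticate(authorization_header: Optional[str]) -> Optional[Principal]:
    """Return the caller for a 'Bearer <key>' header, or None if the key is unknown."""
    if not authorization_header or not authorization_header.startswith("Bearer "):
        return None
    presented = hash_key(authorization_header[len("Bearer "):].strip())
    for stored_hash, principal in _KEY_REGISTRY.items():
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(presented, stored_hash):
            return principal
    return None


def authorize(principal: Optional[Principal], required_scope: str) -> bool:
    """Allow the action only if an authenticated caller holds the required scope."""
    return principal is not None and required_scope in principal.scopes
```

Separating the `tools:read` and `tools:write` scopes lets the API layer protect privileged tools with the same check it uses for entry points.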
--8<-- "_abbreviations.md"