Production Considerations

Production readiness is not a single feature. It is a consistent operating model across reliability, security, cost, and governance.

The current code is a useful baseline because it shows the main moving parts clearly. It also reveals where production safeguards are still missing: authentication is a placeholder, routing is hard-coded, memory is lost when the process restarts, and traces are printed rather than emitted as structured telemetry.

Readiness flow

flowchart TB
    D[Design Review] --> T[Load and Failure Tests]
    T --> S[Security Controls]
    S --> O[Observability Baseline]
    O --> R[Release Gates]
    R --> P[Post-Release Monitoring]

    style D fill:#1976d2,color:#fff
    style T fill:#1976d2,color:#fff
    style S fill:#ff9800,color:#fff
    style O fill:#ff9800,color:#fff
    style R fill:#1976d2,color:#fff
    style P fill:#ff9800,color:#fff

Production checklist

| Domain | Must-have controls | Practical baseline |
| --- | --- | --- |
| Reliability | Timeouts, retries, fallback responses | Track P95 latency and tool error rate |
| Security | AuthN, AuthZ, secret hygiene | Separate read-only and write-capable tools |
| Cost | Token and tool usage budgeting | Add request-level cost attribution |
| Governance | Audit trail and release process | Versioned prompts and tool contracts |
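
The reliability row can be illustrated with a minimal sketch of retries plus a fallback response. The helper name `call_with_retries` and its parameters are hypothetical and not part of the repository. The sketch assumes the timeout itself is enforced inside `fn`, for example by the HTTP client used to reach the model or tool.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    fallback: T,
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to max_attempts times, then return the fallback value.

    Hypothetical helper: fn is assumed to raise (for example on its own
    timeout) when the underlying call fails.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Back off exponentially between attempts, but not after the last one.
            if attempt < max_attempts - 1:
                sleep(base_delay_s * (2 ** attempt))
    # Every attempt failed: degrade gracefully instead of propagating the error.
    return fallback
```

Passing `sleep` as a parameter keeps the helper testable without real delays. In a real service the `except` clause would likely be narrowed to the transient error types worth retrying.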

Production controls mapped to the repository

| Control | Applied to | Why it matters |
| --- | --- | --- |
| Request validation | `src/api/server.py` | Prevents malformed prompts from entering the pipeline |
| Tool allowlist | `src/llm/router.py` | Keeps the router from calling unsafe tools |
| Persistent storage | `src/memory/store.py` | Preserves memory across process restarts |
| Structured telemetry | `src/observability/tracer.py` | Makes operational analysis searchable and machine-readable |
| Authentication | `src/security/auth.py` | Protects API entry points and privileged tools |
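
As one possible shape for the tool allowlist, the sketch below checks a requested tool against a registry and separates read-only from write-capable tools. `ToolPolicy`, `ALLOWED_TOOLS`, `authorize_tool`, and the tool names are all hypothetical; the actual interface of `src/llm/router.py` is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    name: str
    writes: bool  # True if the tool can modify external state


# Hypothetical registry; real tool names would come from the router.
ALLOWED_TOOLS = {
    "search_docs": ToolPolicy("search_docs", writes=False),
    "update_ticket": ToolPolicy("update_ticket", writes=True),
}


def authorize_tool(tool_name: str, caller_can_write: bool) -> ToolPolicy:
    """Return the policy for an allowed tool or raise PermissionError."""
    policy = ALLOWED_TOOLS.get(tool_name)
    if policy is None:
        # Anything not explicitly listed is rejected (deny by default).
        raise PermissionError(f"tool not on allowlist: {tool_name}")
    if policy.writes and not caller_can_write:
        raise PermissionError(f"write access required for: {tool_name}")
    return policy
```

Denying by default means a tool added to the router but not to the registry fails closed rather than becoming callable by accident.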

Capacity planning formula

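A simple planning formula estimates the workers required under expected load, assuming each worker processes one request at a time. The product Q · L is the average number of requests in flight (Little's law); dividing by the target utilization U adds headroom, and rounding up yields a whole number of workers.

\[ W = \left\lceil \frac{Q \cdot L}{U} \right\rceil \]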

| Symbol | Meaning |
| --- | --- |
| `W` | Minimum worker count |
| `Q` | Target requests per second |
| `L` | Average processing latency in seconds |
| `U` | Desired utilization per worker, as a fraction between 0 and 1 |

Example: for Q=20, L=0.3, and U=0.6, W = (20 × 0.3) / 0.6 = 10, so you need at least 10 workers to stay within target utilization.
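
The same calculation can be sketched as a small function. The name `required_workers` and its parameter names are illustrative, and the result is rounded up on the assumption that workers come in whole units.

```python
import math


def required_workers(q_rps: float, latency_s: float, utilization: float) -> int:
    """Estimate the minimum worker count W = ceil(Q * L / U)."""
    if q_rps < 0 or latency_s < 0:
        raise ValueError("request rate and latency must be non-negative")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    raw = q_rps * latency_s / utilization
    # Subtract a tiny tolerance so floating-point noise just above a whole
    # number does not add a spurious extra worker.
    return math.ceil(raw - 1e-9)


# The worked example above: Q=20, L=0.3, U=0.6.
print(required_workers(20, 0.3, 0.6))
```

Because the formula uses average latency, it gives a lower bound; bursty traffic or long-tail latency would call for more headroom than this estimate.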

What is the most common production mistake?

Teams launch without route-level and tool-level telemetry. Without it, routing and tool failures surface as generic model errors and are hard to isolate.
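
A minimal sketch of what route-level and tool-level telemetry could look like: each unit of work emits one JSON line that records the route, the tool, the outcome, and the duration. The `traced_span` helper, its field names, and the route and tool names in the usage note are assumptions, not the existing API of `src/observability/tracer.py`.

```python
import json
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def traced_span(
    route: str,
    tool: Optional[str] = None,
    sink: Callable[[str], None] = print,
) -> Iterator[None]:
    """Emit one structured JSON event per span, on success or failure."""
    start = time.perf_counter()
    status = "ok"
    error_type = None
    try:
        yield
    except Exception as exc:
        # Record which kind of failure occurred, then let it propagate.
        status = "error"
        error_type = type(exc).__name__
        raise
    finally:
        sink(json.dumps({
            "route": route,
            "tool": tool,
            "status": status,
            "error_type": error_type,
            "duration_ms": round((time.perf_counter() - start) * 1000, 3),
        }))
```

Wrapping a tool call as `with traced_span("chat", tool="search_docs"): ...` would attribute a failure to that route and tool instead of leaving it as an undifferentiated model error. The default `sink` is `print` only to keep the sketch self-contained; a log shipper or tracing backend would replace it.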

How do we roll out safely?

Use staged releases with shadow traffic and automated rollback triggers. Promote a stage only after its SLO and error-budget checks have stayed stable for the whole observation window.
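
One way an automated gate might decide between promoting, holding, and rolling back a stage is sketched below. The `StageMetrics` type, the `release_decision` function, and every threshold value are hypothetical placeholders; real values would come from the service's own SLOs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageMetrics:
    requests: int         # requests observed in this stage
    errors: int           # failed requests among them
    p95_latency_s: float  # observed P95 latency in seconds


def release_decision(
    metrics: StageMetrics,
    slo_error_rate: float = 0.01,    # placeholder threshold
    slo_p95_latency_s: float = 2.0,  # placeholder threshold
    min_requests: int = 500,         # placeholder sample size
) -> str:
    """Return "promote", "hold", or "rollback" for one release stage."""
    # Too little traffic to judge: keep the stage where it is.
    if metrics.requests < max(min_requests, 1):
        return "hold"
    error_rate = metrics.errors / metrics.requests
    if error_rate > slo_error_rate or metrics.p95_latency_s > slo_p95_latency_s:
        return "rollback"
    return "promote"
```

The explicit "hold" outcome keeps a quiet stage from being promoted on the strength of a handful of requests.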

What should be productionized first in this codebase?

Add structured observability and real authentication before expanding routing logic. Those two changes reduce operational and security risk immediately.

--8<-- "_abbreviations.md"