09 · Architecture Patterns Reference

Quick reference for the architectural patterns that underpin production AI systems.


System Architecture Patterns

| Pattern | Description | When to Use |
|---|---|---|
| Agentic Loop | LLM + tools in a perception-plan-act cycle | Any multi-step autonomous task |
| RAG Pipeline | Retrieval-augmented generation with vector search | Knowledge that changes frequently |
| Human-in-the-Loop (HITL) | Interrupt workflow for human approval | Irreversible actions |
| Plan-and-Execute | Separate planning from execution agents | Long, structured tasks |
| Reflection | Agent critiques and revises its own output | High-stakes generation |
| Supervisor-Worker | Orchestrator delegates to specialist agents | Multi-domain pipelines |
| Event-Driven Agent | Agent triggered by webhooks (CI, JIRA) | Automation pipelines |
| Offline RAG Indexing | Background process keeps vector index fresh | Large, changing codebases |
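
The Agentic Loop row can be illustrated with a minimal sketch of the perception-plan-act cycle. This is an assumption-laden toy, not a framework API: `TOOLS`, `scripted_policy`, and `agentic_loop` are hypothetical names, and the scripted policy stands in for the LLM call that would normally choose the next action.

```python
from typing import Callable

# Hypothetical tool registry: plain functions standing in for real tools.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_code": lambda query: f"found 2 files matching '{query}'",
    "run_tests": lambda target: f"tests passed for {target}",
}


def scripted_policy(history: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM planning step: returns (action, argument)."""
    if len(history) == 0:
        return ("search_code", "PaymentService")
    if len(history) == 1:
        return ("run_tests", "PaymentService")
    return ("finish", "")


def agentic_loop(policy: Callable[[list[str]], tuple[str, str]],
                 max_steps: int = 5) -> list[str]:
    """Run plan -> act -> perceive until the policy finishes or steps run out."""
    history: list[str] = []
    for _ in range(max_steps):
        action, argument = policy(history)      # plan: choose the next action
        if action == "finish":
            break
        observation = TOOLS[action](argument)   # act: call the chosen tool
        history.append(observation)             # perceive: record the result
    return history


if __name__ == "__main__":
    for step in agentic_loop(scripted_policy):
        print(step)
```

The `max_steps` cap is the important design choice here: an autonomous loop needs a hard bound so that a policy that never returns `finish` cannot run indefinitely.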

Component Selection Matrix

| Need | Recommended Component |
|---|---|
| LLM for code generation | Claude 3.5 Sonnet or GPT-4o |
| LLM for fast/cheap tasks | GPT-4o-mini or Claude Haiku |
| Self-hosted LLM | LLaMA 3.3 70B via Ollama or vLLM |
| Orchestration / graph | LangGraph |
| Java-native AI integration | Spring AI |
| Vector DB (simple) | pgvector (Postgres extension) |
| Vector DB (scale) | Weaviate or Qdrant |
| Code embedding | nomic-embed-code |
| Text embedding | text-embedding-3-large (OpenAI) |
| Reranking | Cohere Rerank-3 |
| JIRA integration | JIRA MCP Server |
| GitHub integration | GitHub MCP Server |
| Tracing/observability | LangSmith or Langfuse |
| Guardrails | Guardrails AI |
| PII detection | Microsoft Presidio |
| Secret detection | gitleaks or TruffleHog |
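
One way to apply the LLM rows of the matrix is a small routing table that maps a task kind to a model. The sketch below is illustrative only: `TaskKind`, `MODEL_BY_TASK`, and `pick_model` are hypothetical names, and the strings are the display names from the matrix, not provider API model identifiers.

```python
from enum import Enum


class TaskKind(Enum):
    CODE_GENERATION = "code_generation"
    FAST_CHEAP = "fast_cheap"
    SELF_HOSTED = "self_hosted"


# Display names taken from the matrix above; these are NOT API model IDs.
# Each row of the matrix lists two options; one is picked here arbitrarily.
MODEL_BY_TASK: dict[TaskKind, str] = {
    TaskKind.CODE_GENERATION: "Claude 3.5 Sonnet",
    TaskKind.FAST_CHEAP: "GPT-4o-mini",
    TaskKind.SELF_HOSTED: "LLaMA 3.3 70B",
}


def pick_model(kind: TaskKind, require_self_hosted: bool = False) -> str:
    """Return the model for a task; a self-hosting requirement overrides the task kind."""
    if require_self_hosted:
        return MODEL_BY_TASK[TaskKind.SELF_HOSTED]
    return MODEL_BY_TASK[kind]


if __name__ == "__main__":
    print(pick_model(TaskKind.CODE_GENERATION))
    print(pick_model(TaskKind.FAST_CHEAP, require_self_hosted=True))
```

Keeping the mapping in one table means a model swap is a one-line change rather than an edit to every call site.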

End-to-End Architecture: JIRA → PR

graph TD
    A[JIRA Webhook] --> B[Agent Service · FastAPI]
    B --> C[LangGraph Workflow]

    subgraph Orchestration
        C --> D[Ticket Reader Node]
        D --> E[Service Identifier Node]
        E --> F[Code Retrieval Node · RAG]
        F --> G[Analysis Node · ReAct]
        G --> H[Code Generator Node]
        H --> I[Test Writer Node]
        I --> J[Validator Node · build + lint]
        J --> K[INTERRUPT: Human Review]
        K --> L[PR Creator Node]
    end

    subgraph Tools
        M[JIRA MCP Server]
        N[GitHub MCP Server]
        O[Vector DB · pgvector]
        P[Build Sandbox · Docker]
    end

    D --> M
    F --> O
    J --> P
    L --> N
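
The node sequence in the diagram, including the pause at human review, can be sketched without any framework. This is a plain-Python stand-in, not LangGraph code: `RunState`, `NODES`, and `run` are hypothetical names, each node is reduced to a name, and the `completed` list plays the role that a persisted checkpoint would play in a real deployment.

```python
from dataclasses import dataclass, field

# Node order copied from the diagram above.
NODES = [
    "ticket_reader",
    "service_identifier",
    "code_retrieval",
    "analysis",
    "code_generator",
    "test_writer",
    "validator",
    "human_review",
    "pr_creator",
]


@dataclass
class RunState:
    ticket_id: str
    completed: list[str] = field(default_factory=list)  # acts as the checkpoint
    approved: bool = False                               # set by the human reviewer


def run(state: RunState) -> str:
    """Advance the workflow from its checkpoint; pause at review until approved."""
    for node in NODES:
        if node in state.completed:
            continue  # already done in an earlier invocation
        if node == "human_review" and not state.approved:
            return "paused_for_review"
        state.completed.append(node)
    return "done"


if __name__ == "__main__":
    state = RunState(ticket_id="EXAMPLE-1")  # hypothetical ticket key
    print(run(state))       # pauses before the PR is created
    state.approved = True   # reviewer signs off
    print(run(state))       # resumes and finishes
```

Because `run` skips nodes already recorded in `completed`, resuming after approval does not repeat code generation or the build, which matters when a single run takes minutes.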

Data Flow Diagram

graph LR
    subgraph External
        A[JIRA API]
        B[GitHub API]
        C[LLM API · Anthropic]
    end

    subgraph Agent Platform
        D[Input Guardrail]
        E[LangGraph State Machine]
        F[Output Guardrail]
        G[PostgreSQL · state + checkpoints]
        H[pgvector · code embeddings]
    end

    A --> D --> E
    E --> C
    C --> E
    E --> H
    E --> G
    E --> F --> B
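
The guardrail placement in this flow (check input before the state machine, check output before anything reaches GitHub) can be sketched as two wrapper functions. This is a simplified assumption, not the Guardrails AI or gitleaks API: the function names and `MAX_TICKET_CHARS` are hypothetical, and a single regular expression for the common `AKIA...` AWS access key ID format stands in for a real secret scanner.

```python
import re
from typing import Callable

# One well-known secret shape (AWS access key ID); a real scanner covers many more.
AWS_ACCESS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")
MAX_TICKET_CHARS = 20_000  # hypothetical limit, tune per deployment


def input_guardrail(ticket_text: str) -> str:
    """Reject empty or oversized ticket text before it reaches the workflow."""
    if not ticket_text.strip():
        raise ValueError("empty ticket text")
    if len(ticket_text) > MAX_TICKET_CHARS:
        raise ValueError("ticket text exceeds size limit")
    return ticket_text


def output_guardrail(diff: str) -> str:
    """Block generated output that appears to contain a credential."""
    if AWS_ACCESS_KEY_ID.search(diff):
        raise ValueError("possible secret in generated diff")
    return diff


def handle_ticket(ticket_text: str, generate: Callable[[str], str]) -> str:
    """Input guardrail -> generation (workflow + LLM) -> output guardrail."""
    clean = input_guardrail(ticket_text)
    diff = generate(clean)
    return output_guardrail(diff)


if __name__ == "__main__":
    print(handle_ticket("Fix null check in OrderService", lambda t: "+ if (order != null) {"))
```

Here `generate` is a placeholder for the whole state machine plus LLM call; raising on a guardrail failure keeps a flagged diff from ever reaching the GitHub step.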

Capacity Planning

| Component | Key Metric | Typical Values |
|---|---|---|
| LLM API | Tokens per run | 20K–100K tokens per JIRA ticket |
| Vector DB | Chunks indexed | 100K–10M chunks for large codebases |
| RAG retrieval | Tokens per retrieved chunk | 500–2,000 tokens per chunk |
| Build validation | Docker execution time | 1–5 minutes per service |
| Full run time | End-to-end latency | 3–15 minutes per JIRA ticket |
| Cost per run | API costs | $0.05–$2.00 per ticket (model-dependent) |
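
The cost-per-run range follows from token volume multiplied by per-token price. The sketch below shows the arithmetic under loudly stated assumptions: the prices ($3 per million input tokens, $15 per million output tokens) and the 20% output share are illustrative placeholders, not figures from this document, so check your provider's current price list before relying on them.

```python
def cost_per_run(total_tokens: int,
                 output_share: float,
                 input_price_per_mtok: float,
                 output_price_per_mtok: float) -> float:
    """Estimate the API cost in dollars for one run.

    total_tokens: input + output tokens for the whole run
    output_share: fraction of total_tokens that are generated (0.0 to 1.0)
    *_price_per_mtok: dollars per one million tokens
    """
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    dollars = (input_tokens * input_price_per_mtok
               + output_tokens * output_price_per_mtok) / 1_000_000
    return dollars


if __name__ == "__main__":
    # Assumed prices and output share; both ends of the 20K-100K token range.
    for tokens in (20_000, 100_000):
        estimate = cost_per_run(tokens, output_share=0.2,
                                input_price_per_mtok=3.0,
                                output_price_per_mtok=15.0)
        print(f"{tokens} tokens: ${estimate:.2f}")
```

Under these assumptions the two ends of the token range come out at roughly $0.11 and $0.54 per ticket, inside the range in the table; cheaper or more expensive models shift the result proportionally.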