Day 14 — Week 2 Consolidation and Graph Design Review

Pre-reading: Week 2 Overview · Learning Plan

🎯 Purpose of Today

Week 2 covered six interconnected topics — agentic patterns, memory and planning, LangGraph, multi-agent coordination, human-in-the-loop, and reliability hardening. Today you pull everything together: revise every concept through tables and a unified diagram, work through a full end-to-end design drill, and track the misconceptions most likely to trip you up in interviews.

🗺️ Week 2 Concept Map

mindmap
  root((Week 2))
    Agentic Patterns
      Chain vs Agent
      ReAct Loop
      Tool Calling
      Tool Schema Design
      Workflow vs Autonomous
    Planning and Memory
      4 Memory Types
      State Management
      Plan-and-Execute
      Context Window Management
      ConversationSummaryMemory
    LangGraph
      StateGraph
      Nodes and Edges
      Conditional Edges
      TypedDict State Schema
      CheckpointSaver
      interrupt and resume
    Multi-Agent
      Orchestrator-Subagent
      Peer-to-Peer
      Hierarchical
      Handoff Protocol
      Failure Propagation
    HITL
      Action Classification
      Approval Gates
      Audit Trail
      interrupt()
      Command(resume)
    Reliability
      7 Threats
      Retry and Backoff
      Loop Detection
      Input Validation
      Output Validation
      Graceful Degradation
      Observability

📐 Architecture Recap — Unified View

graph TD
    U["User Request"] --> IV["Input Validation · sanitise · classify"]
    IV --> AG["Agentic Pattern · ReAct · Plan-and-Execute"]
    AG --> MEM["Memory Layer · in-context · external · episodic · semantic"]
    MEM --> LG["LangGraph StateGraph · nodes · edges · conditional routing"]
    LG -->|"single agent"| TC["Tool Calling · validated schema · retry · backoff"]
    LG -->|"multi-agent"| MA["Multi-Agent · Orchestrator + Subagents · handoff"]
    MA --> TC
    TC --> HITL{"Irreversible Action?"}
    HITL -->|"yes"| GATE["Human Approval Gate · interrupt()"]
    HITL -->|"no"| EXEC["Auto-Execute"]
    GATE -->|"approved"| EXEC
    GATE -->|"rejected"| ABORT["Abort · safe response"]
    EXEC --> OV["Output Validation · Pydantic schema"]
    OV --> LOG["Audit Log · append-only"]
    LOG --> RESP["Response to User"]
    OV -->|"failure"| GD["Graceful Degradation · partial result · circuit breaker"]
    GD --> RESP
    style U fill:#1976d2,color:#fff
    style IV fill:#ff9800,color:#fff
    style AG fill:#1976d2,color:#fff
    style MEM fill:#1976d2,color:#fff
    style LG fill:#1976d2,color:#fff
    style TC fill:#ff9800,color:#fff
    style MA fill:#ff9800,color:#fff
    style HITL fill:#c62828,color:#fff
    style GATE fill:#c62828,color:#fff
    style EXEC fill:#2e7d32,color:#fff
    style ABORT fill:#c62828,color:#fff
    style OV fill:#ff9800,color:#fff
    style LOG fill:#1976d2,color:#fff
    style RESP fill:#2e7d32,color:#fff
    style GD fill:#c62828,color:#fff

⚡ Core Concepts Quick Revision — Master Table

Concept	Definition	When It Matters	Common Pitfall
Agent	LLM that decides its next action at each step	Any task where steps depend on runtime discovery	Defaulting to agent when a chain would suffice
Chain	Fixed sequence of LLM calls defined at design time	Structured, predictable tasks	Making it too rigid to handle edge cases
ReAct	Thought → Action → Observation loop until Final Answer	Standard agent reasoning pattern	Not setting a max-step limit
Tool calling	Model emits structured JSON request; host executes it	Every agentic application	Forgetting that the model never executes code directly
Tool schema	Name + description + parameter types for a tool	Determines how often the model calls the tool correctly	Vague descriptions causing wrong tool selection
Workflow agent	Agent with mostly predetermined steps	Lower-risk, auditable pipelines	Over-engineering with full autonomy when workflow suffices
In-context memory	Active context window content	Current conversation and immediate tool results	Token overflow from accumulating too much history
External memory	Database or store persisting across sessions	Long-term personalisation and state	Forgetting to handle read/write failures gracefully
Episodic memory	Summaries of past agent runs	Learning from previous task outcomes	Storing too much; retrieving too little
Semantic memory	Vector store queried on demand	Large knowledge bases too big for context	Stale embeddings returning outdated information
State drift	Agent state becoming inconsistent over a long run	Long multi-step tasks	Using implicit state (prompt string) instead of explicit TypedDict
Plan-and-Execute	Separate planner + executor agents	Multi-step tasks with independent sub-steps	Using a weak model for planning (planner should be strongest)
StateGraph	LangGraph class holding nodes, edges, and state schema	Every LangGraph application	Forgetting to call `compile()` before `invoke()`
Conditional edge	Edge whose target is chosen by a routing function	Agent loops, branching workflows	Missing the default/fallback key in the routing map
CheckpointSaver	Saves state after every node for resume/fault tolerance	Long-running and interruptible workflows	Using `MemorySaver` in production (state lost on restart)
Orchestrator/Subagent	Central agent delegates to specialist subagents	Most production multi-agent systems	Orchestrator passing its full context to every subagent (expensive)
Handoff protocol	Structured task + context transfer between agents	Every multi-agent transition	Passing the entire conversation history instead of trimmed context
HITL	Human approval gate before irreversible actions	Deletes, deployments, mass sends, compliance-sensitive actions	Putting HITL on every action (kills usability)
interrupt()	LangGraph function that pauses graph execution	HITL implementation in LangGraph	Forgetting to configure a CheckpointSaver before using interrupt
Prompt injection	Adversarial instructions embedded in retrieved content	Any agent that processes external content	Assuming retrieved content is safe to trust unconditionally
Exponential backoff	Retry with increasing wait + random jitter	Any external API call that may fail transiently	Retrying auth failures (waste) or not adding jitter (thundering herd)
Graceful degradation	Return partial results or safe fallback on failure	All production agent systems	Letting one tool failure crash the entire run
Loop detection	Detecting repeated tool calls with same args	ReAct agents on ambiguous tasks	Relying only on max-steps (misses alternating 2-tool loops)
Audit trail	Immutable append-only log of all agent actions and decisions	Any system with HITL or regulatory requirements	Letting the agent modify its own audit records

🔬 End-to-End Drill — Design an Automated Code Review Agent

Scenario: Your team wants to build an agent that automatically reviews GitHub pull requests. The agent should: (1) fetch the PR diff, (2) run static analysis tools, (3) consult internal coding standards docs, (4) write review comments, (5) optionally request changes or approve. The system must be reliable, auditable, and have a human gate before posting any public comments.

Model Walkthrough

Step 1 — Choose the agentic pattern

This is a multi-step task where steps are known upfront (fetch → analyse → consult docs → draft → gate → post). A plan-and-execute pattern fits well. The planner generates the step list; executor agents handle each step. Alternatively, a single ReAct agent with 5 tools works for simpler cases — but a planner adds auditability.
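The planner/executor split can be illustrated without any framework. In this minimal sketch the planner is a stub that returns a fixed step list (in a real system it would be an LLM call), each executor is a plain function over a shared context dict, and all names (`plan`, `EXECUTORS`, `run_plan`) are hypothetical. The per-step log is where the auditability mentioned above comes from.

```python
from typing import Callable


def plan(pr_id: str) -> list[str]:
    # Stub planner: a real system would ask the (strongest) LLM to produce this list
    return ["fetch_diff", "lint", "retrieve_standards", "draft"]


# Hypothetical executors: each reads the shared context and returns an update to merge
EXECUTORS: dict[str, Callable[[dict], dict]] = {
    "fetch_diff": lambda ctx: {"diff": f"diff for {ctx['pr_id']}"},
    "lint": lambda ctx: {"lint_issues": ["unused import"] if "diff" in ctx else []},
    "retrieve_standards": lambda ctx: {"standards": ["Prefer explicit imports"]},
    "draft": lambda ctx: {"draft": f"{len(ctx['lint_issues'])} issue(s) found"},
}


def run_plan(pr_id: str) -> tuple[dict, list[str]]:
    """Execute each planned step in order and keep a step-by-step log."""
    ctx: dict = {"pr_id": pr_id}
    log: list[str] = []
    for step in plan(pr_id):
        executor = EXECUTORS.get(step)
        if executor is None:
            log.append(f"{step}: skipped (no executor)")
            continue
        ctx.update(executor(ctx))
        log.append(f"{step}: done")
    return ctx, log
```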

Step 2 — Design the tool set

Tool	Description	Reversibility
`fetch_pr_diff(pr_id)`	Fetches the diff text for a PR	Reversible (read-only)
`run_linter(code: str)`	Runs ESLint/Pylint on a code snippet	Reversible (read-only)
`search_coding_standards(query)`	Semantic search over internal docs	Reversible (read-only)
`draft_review_comment(diff, issues)`	LLM call to draft a review	Reversible (draft, not posted)
`post_review_comment(pr_id, comment, decision)`	Posts comment to GitHub; requests changes or approves	Irreversible
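Classifying each tool by reversibility up front lets a single check decide where the approval gate goes. A minimal sketch, assuming the tool names from the table above; the `Reversibility` enum and `needs_approval` helper are hypothetical. Unknown tools default to requiring approval, so a newly added tool stays gated until someone classifies it.

```python
from enum import Enum


class Reversibility(Enum):
    READ_ONLY = "read_only"
    DRAFT = "draft"  # output exists but is not yet visible to anyone else
    IRREVERSIBLE = "irreversible"


TOOL_CLASSES = {
    "fetch_pr_diff": Reversibility.READ_ONLY,
    "run_linter": Reversibility.READ_ONLY,
    "search_coding_standards": Reversibility.READ_ONLY,
    "draft_review_comment": Reversibility.DRAFT,
    "post_review_comment": Reversibility.IRREVERSIBLE,
}


def needs_approval(tool_name: str) -> bool:
    """Return True if a human must approve this tool call before it runs."""
    # Fail closed: an unclassified tool is treated as irreversible
    cls = TOOL_CLASSES.get(tool_name, Reversibility.IRREVERSIBLE)
    return cls is Reversibility.IRREVERSIBLE
```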

Step 3 — Design the LangGraph StateGraph

START → fetch_diff_node → linter_node → docs_retrieval_node → draft_node → hitl_gate_node → post_node → END (if the reviewer rejects the draft, hitl_gate_node routes directly to END and nothing is posted)
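In LangGraph, choosing what follows hitl_gate_node is the job of a conditional edge: a routing function reads the state and returns a key, and a path map turns that key into the next node. The sketch below imitates only that routing step in plain Python so it runs without LangGraph installed; `route_after_gate`, `PATH_MAP`, and the `END` stand-in are hypothetical, and the explicit fallback key guards against the missing-key pitfall listed in the Master Table.

```python
END = "__end__"  # stand-in for LangGraph's END sentinel in this plain-Python sketch

PATH_MAP = {
    "approve_post": "post_node",
    "edit": "post_node",  # post_node would use edited_comment instead of draft_comment
    "reject": END,
    "fallback": END,  # any unexpected value ends the run without posting
}


def route_after_gate(state: dict) -> str:
    """Routing function: map the reviewer's decision to a path-map key."""
    decision = state.get("human_decision")
    return decision if decision in PATH_MAP else "fallback"


def next_node(state: dict) -> str:
    """Resolve the routing key to the node that would run next."""
    return PATH_MAP[route_after_gate(state)]
```

With LangGraph itself, the same function and map would be registered with something like `builder.add_conditional_edges("hitl_gate_node", route_after_gate, PATH_MAP)`, where `builder` is the StateGraph under construction and `END` is imported from `langgraph.graph`.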

State schema:

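import operator
from typing import Annotated, TypedDict


class PRReviewState(TypedDict):
    pr_id: str
    diff: str
    lint_issues: list[str]
    relevant_standards: list[str]
    draft_comment: str
    human_decision: str | None  # "approve_post", "reject", "edit"
    edited_comment: str | None  # reviewer's text when human_decision == "edit"
    result: str
    # operator.add reducer: entries returned by each node are appended, not overwritten
    audit_log: Annotated[list[dict], operator.add]
    step_count: int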
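LangGraph merges each node's return value into the state as a partial update: plain keys are overwritten, while keys annotated with a reducer (for example `Annotated[list, operator.add]`) are combined with the existing value. That distinction matters for audit_log, which every node appends to. The helper below is a hypothetical plain-Python imitation of that merge rule, not LangGraph's own code.

```python
import operator
from typing import Any, Callable

# Keys that combine instead of overwrite, and how (mirrors an Annotated reducer)
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"audit_log": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into a copy of the state."""
    new_state = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in new_state:
            new_state[key] = REDUCERS[key](new_state[key], value)  # e.g. list + list
        else:
            new_state[key] = value  # no reducer: last write wins
    return new_state
```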

Step 4 — Memory architecture

In-context: the diff, lint results, retrieved standards, and draft (about 8k tokens in total for a typical PR, which fits comfortably in the context window).
Semantic memory: vector store of coding standards docs, queried in docs_retrieval_node.
External memory: store past reviews per author/repo so the agent can flag recurring issues.
Episodic: summary of the last 5 reviews — lets the agent note "this author frequently misuses async/await".
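The episodic layer only needs a bounded, per-author history. A minimal in-memory sketch, assuming review summaries are short strings; `EpisodicMemory` is hypothetical, and a production version would persist to the external store rather than a dict. `deque(maxlen=5)` drops the oldest summary automatically once a sixth arrives.

```python
from collections import defaultdict, deque


class EpisodicMemory:
    """Keep the most recent review summaries per author (hypothetical helper)."""

    def __init__(self, window: int = 5):
        # Each author gets a bounded deque; appending past `window` evicts the oldest
        self._by_author: defaultdict[str, deque[str]] = defaultdict(lambda: deque(maxlen=window))

    def record(self, author: str, summary: str) -> None:
        self._by_author[author].append(summary)

    def context_for(self, author: str) -> str:
        """Render recent summaries as a short block to place in the prompt."""
        recent = self._by_author.get(author)  # .get avoids creating an empty entry
        if not recent:
            return "No previous reviews for this author."
        return "\n".join(f"- {s}" for s in recent)
```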

Step 5 — HITL gate

post_review_comment is irreversible (public comment on GitHub). Gate it with interrupt():

The hitl_gate_node calls interrupt(), exposing the draft comment and a summary of issues.
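The reviewer approves, rejects, or edits the draft via a Slack bot or web UI, which resumes the graph by passing the decision back with Command(resume=...) on the same thread_id.
On approval, the graph resumes and calls post_node; on an edit, post_node posts edited_comment instead of the draft; on rejection, the graph ends without posting.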

Step 6 — Reliability hardening

fetch_pr_diff: retry 3× with backoff; if the GitHub API is still down after the last retry, return a safe fallback (a "review unavailable" message) instead of failing the run.
run_linter: 5-second timeout; on timeout, skip linting and add a "linting unavailable" note to the draft.
search_coding_standards: if the vector store returns 0 results, continue with general principles — do not fail.
Max steps: 20 (generous for a six-node pipeline); in LangGraph this cap can be enforced through the recursion_limit config value.
Audit log: every node appends an entry; post_node writes the final decision and GitHub comment URL.
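The fetch_pr_diff retry policy can be expressed as a small wrapper. A minimal standard-library sketch, assuming "full jitter" backoff (sleep a random time between zero and an exponentially growing cap) and a hypothetical `NonRetryableError` raised for auth-style failures; `retry_with_backoff` is illustrative rather than a library function, `attempts` counts the first call, and timeouts are left to the underlying client. The `sleep` parameter is injectable so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NonRetryableError(Exception):
    """Hypothetical marker for failures (e.g. 401/403) that a retry cannot fix."""


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times (first call included), backing off between failures."""
    for attempt in range(attempts):
        try:
            return fn()
        except NonRetryableError:
            raise  # retrying an auth failure only wastes calls
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: the caller decides how to degrade
            # Exponential cap (0.5s, 1s, 2s, ...) with full jitter to avoid a thundering herd
            cap = min(max_delay, base_delay * 2**attempt)
            sleep(random.uniform(0, cap))
    raise ValueError("attempts must be at least 1")
```

A caller can catch the final exception and substitute its own fallback, for example the "linting unavailable" note for run_linter, instead of failing the run.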

Step 7 — Observability

Wrap in LangSmith tracing. Track: total run time per PR, lint tool latency, retrieval hit rate, HITL approval rate (what % of drafts are approved as-is vs edited). A high edit rate signals the draft node needs a better prompt.
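The HITL approval metric is simple arithmetic over the recorded reviewer decisions. A minimal sketch, assuming each run's audit record stores the decision as one of the three values used in the state schema; `hitl_rates` is hypothetical and the decisions in the test are illustrative only.

```python
from collections import Counter


def hitl_rates(decisions: list[str]) -> dict[str, float]:
    """Share of drafts approved as-is, edited, or rejected."""
    if not decisions:
        return {"approve_post": 0.0, "edit": 0.0, "reject": 0.0}
    counts = Counter(decisions)  # missing keys count as 0
    total = len(decisions)
    return {k: counts[k] / total for k in ("approve_post", "edit", "reject")}
```

Tracked over time, a rising `edit` share is the signal that the draft node's prompt needs work.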

💬 Interview Q&A

Describe end-to-end how you would design a reliable multi-step agent for a high-stakes task.

Start with tool design: enumerate every action, classify it by reversibility, and design a typed schema for each. Next, choose a coordination pattern: orchestrator/subagent if steps require different expertise, plan-and-execute if steps are known upfront. Model the workflow as a LangGraph StateGraph with an explicit TypedDict state schema. Add a HITL gate (interrupt()) before any irreversible tool, backed by a CheckpointSaver so the graph can resume. Wrap every tool call in retry + backoff + timeout. Validate all tool inputs and outputs with Pydantic. Cap the agent at a max step limit and implement loop detection. Instrument with LangSmith for tracing. Design an append-only audit trail covering every proposed and executed action.

What is the difference between LCEL, LangGraph, and raw Python for agent orchestration?

LCEL (LangChain Expression Language) uses `|` pipes to compose linear, acyclic chains; it is best for simple prompt → LLM → output pipelines with no branching or state. LangGraph adds a typed state object, cycles (loops), conditional branching, checkpointing, and interrupt()/resume, which makes it the right choice for agent loops, multi-agent routing, and HITL. Raw Python offers full control with no framework overhead; it is best when you need custom parallelism or non-standard state management, or want to avoid the LangChain dependency entirely. In practice, a common production setup uses LangGraph for orchestration and LCEL inside individual nodes.

How do token costs compound in multi-agent systems and how do you control them?

Each agent call consumes tokens proportional to its context. In a 3-subagent system, if the orchestrator passes its full state (1k tokens) to each subagent, that is 3k tokens per orchestration round spent on handoffs alone, before any actual work. Over 10 orchestration rounds, that is 30k tokens of overhead. Controls: (1) context trimming (pass only the fields each subagent needs); (2) model tiering (GPT-4o for orchestration, GPT-4o-mini for simpler subagents); (3) caching (reuse retrieval and search results for identical queries); (4) parallelism (run independent subagents concurrently with asyncio.gather, which reduces wall-clock time without increasing token cost).
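The handoff arithmetic and the effect of context trimming can be checked in a few lines. The 1k-token, 3-subagent, 10-round figures are the illustrative numbers from the answer above; the ~200-token trimmed handoff and the field names passed to `trim_for_subagent` are hypothetical.

```python
def handoff_overhead(subagents: int, tokens_per_handoff: int, rounds: int) -> int:
    """Tokens spent only on passing context to subagents."""
    return subagents * tokens_per_handoff * rounds


# Full state passed every time: 3 subagents x 1k tokens x 10 rounds
full = handoff_overhead(3, 1_000, 10)
# Trimmed handoff (hypothetical ~200 tokens of task-relevant fields)
trimmed = handoff_overhead(3, 200, 10)


def trim_for_subagent(state: dict, needed: set[str]) -> dict:
    """Context trimming: forward only the fields a subagent actually uses."""
    return {k: v for k, v in state.items() if k in needed}
```

Here `full` is 30,000 tokens and `trimmed` is 6,000, so trimming cuts handoff overhead in proportion to how much of the state each subagent can skip.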

When is a plan-and-execute agent better than a ReAct agent?

Plan-and-execute is better when: (1) the task decomposes into ordered, mostly independent steps that can be enumerated upfront; (2) you want the user to review and approve the plan before execution starts; (3) you can use a cheaper model for execution steps (the planner is the only expensive call); or (4) you need a structured, auditable step-by-step log. ReAct is better when each step's content depends on the result of the previous one, so a full plan cannot be written upfront — for example, open-ended research where the next tool to call depends on what the last search returned.

What are the mandatory elements of a production-grade agent audit trail?

Five mandatory elements: (1) append-only storage (INSERT only; no UPDATE or DELETE on audit records); (2) an action timestamp at proposal time (when the agent decided, not just when it executed); (3) full action parameters (the tool name and all arguments, not just a summary); (4) human identity on every HITL decision (who approved or rejected, not just the outcome); (5) outcome and error details (what happened after the decision, including any rollback or failure). Optional but strongly recommended: a reversibility class per action, and a causal chain linking each action to the original user request.

🐛 Weak Spots Tracker

Common Misconception	Incorrect Mental Model	Correct Mental Model
"Agents are just long chains"	The extra steps are the only difference	The difference is who decides the next step — developer for chains, LLM for agents
"Tool calling means the model runs code"	The LLM executes the function itself	The model emits a structured request; the host executes it
"ReAct always works if you give enough steps"	More iterations = better results	Without a max-step limit, more iterations = higher cost + infinite loop risk
"In-context memory is fine for long sessions"	The context window is large enough	Context windows are finite; without summarisation, old context gets truncated
"Plan-and-execute is always better than ReAct"	Planning upfront is always more efficient	ReAct is better for exploratory tasks where each step depends on the previous result
"HITL is only needed for dangerous tools"	Safety is the only reason	HITL is also needed for compliance, trust-building, and early-deployment validation
"Shared state in LangGraph is the same as message passing"	They are both ways to share data	Shared state is synchronous and coupled; message passing is async and decoupled
"LangGraph interrupt() works without a CheckpointSaver"	The graph just pauses automatically	Without a CheckpointSaver, interrupt raises an error — there is nowhere to save state
"Prompt injection only affects untrusted user inputs"	Only user messages can inject	Retrieved documents, web pages, and tool results can all contain injections
"Graceful degradation means catching all exceptions"	A try/except is enough	Graceful degradation requires returning useful partial results + circuit breakers, not just suppressing errors

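A circuit breaker is the part of graceful degradation that a bare try/except lacks: after repeated failures it stops calling the failing dependency for a cool-down period and returns the fallback immediately. A minimal single-threaded sketch; `CircuitBreaker`, its thresholds, and the injectable clock are hypothetical choices for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow a trial call after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn: Callable[[], T], fallback: T) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback  # open: skip the failing dependency entirely
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # (re)open; a failed trial reopens at once
            return fallback  # degrade instead of crashing the run
        self.failures = 0  # success closes the circuit
        return result
```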
✅ End-of-Week Checklist

Item	Status
Can draw the unified Week 2 architecture diagram from memory	☐
Can define all 24 terms in the Master Table without notes	☐
Completed the code review agent design drill	☐
Reviewed all 5 interview Q&A blocks and can answer each in ≤90 seconds	☐
Identified top 3 weak spots from the tracker	☐
Scheduled 30-min revision session for identified weak spots	☐
All 6 Day 8–13 end-of-day checklists are fully ticked	☐