LangGraph in Production: Building Reliable Multi-Step AI Workflows
A recent delivery in which we moved from prompt-chained scripts to LangGraph-based orchestration, adding durable state, human approvals, and deterministic routing for more reliable enterprise automation.
We recently replaced a brittle prompt-chain automation flow with a LangGraph workflow for an operations team that handled change requests, incident triage, and customer status updates.
The old setup "worked" in demos, but in production it had problems:
- Step order drifted when prompts changed
- Failures required restarting from the beginning
- Human approval checkpoints were hard to enforce
- Debugging was painful because state lived across logs and prompt text
LangGraph solved this by treating the workflow as a graph with explicit state and controlled transitions.
The Problem: Prompt Chains Break Under Real Load
Prompt chaining is fine for simple linear tasks. This project was not linear.
We needed logic like:
- Classify request type
- Gather context from ticketing and monitoring systems
- Decide whether risk score requires human approval
- Run different execution paths by request class
- Generate customer-safe summary and internal handoff notes
In the previous architecture, this lived in ad hoc orchestration code plus prompt conditionals. Small changes in one step caused unexpected behavior later in the chain.
New Architecture: Stateful Graph with Checkpoints
We modelled the flow as a graph:
ingest_request
-> classify_intent
-> gather_context
-> risk_assessment
-> [high risk] human_approval
-> [low risk] execute_plan
-> generate_updates
-> persist_outcome
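The flow above can be sketched framework-free as plain Python: each node is a function from state dict to state dict, and each edge is either a fixed next node or a routing function. The node bodies and default values below are illustrative stubs, not the production logic; in the real system LangGraph's StateGraph plays this role and adds checkpointing.

```python
from typing import Callable, Dict, Optional

# Illustrative stubs: each node reads the state dict and returns an
# updated copy. LLM calls and API integrations are elided.
def ingest_request(state):   return {**state, "request_id": state.get("request_id", "REQ-1")}
def classify_intent(state):  return {**state, "intent": "change_request"}      # LLM call in production
def gather_context(state):   return {**state, "context_bundle": {"tickets": [], "alerts": []}}
def risk_assessment(state):  return {**state, "risk_score": state.get("risk_score", 0.2)}
def human_approval(state):   return {**state, "approval_status": "approved"}   # pauses for a reviewer in production
def execute_plan(state):     return {**state, "execution_actions": ["apply_change"]}
def generate_updates(state): return {**state, "customer_summary": "Change applied."}
def persist_outcome(state):  return {**state, "persisted": True}

def route_after_risk(state) -> str:
    # Deterministic policy: high risk must pass through approval first.
    # The 0.7 threshold is an illustrative assumption.
    return "human_approval" if state["risk_score"] >= 0.7 else "execute_plan"

NODES: Dict[str, Callable[[dict], dict]] = {
    f.__name__: f for f in [ingest_request, classify_intent, gather_context,
                            risk_assessment, human_approval, execute_plan,
                            generate_updates, persist_outcome]
}

# Edges are either a fixed successor, a routing function, or None (terminal).
EDGES = {
    "ingest_request": "classify_intent",
    "classify_intent": "gather_context",
    "gather_context": "risk_assessment",
    "risk_assessment": route_after_risk,   # conditional edge
    "human_approval": "execute_plan",
    "execute_plan": "generate_updates",
    "generate_updates": "persist_outcome",
    "persist_outcome": None,
}

def run(state: dict, entry: str = "ingest_request") -> dict:
    node: Optional[str] = entry
    while node is not None:
        state = NODES[node](state)
        nxt = EDGES[node]
        node = nxt(state) if callable(nxt) else nxt
    return state
```

The point of the sketch is the shape, not the stubs: transitions live in one table, so changing the flow means editing edges, not rewiring prompts.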
Why This Worked Better
- Explicit state object passed through every node
- Deterministic branching with typed routing conditions
- Checkpointing between critical steps for recovery
- Resumability after transient API failures
- Human-in-the-loop nodes for approval gates
Instead of "hope the chain behaves", we got a controllable execution model.
State Design: Typed and Minimal
The biggest win came from disciplined state design. We used a compact state schema:
- request_id
- intent
- context_bundle
- risk_score
- approval_status
- execution_actions
- customer_summary
- internal_notes
Keeping state minimal prevented accidental coupling between nodes and made replay/debugging straightforward.
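A minimal sketch of that state contract as a TypedDict; the field names come from the list above, while the types and example values are illustrative assumptions:

```python
from typing import Any, Dict, List, Optional, TypedDict

class WorkflowState(TypedDict, total=False):
    request_id: str
    intent: str                      # e.g. "change_request", "incident", "status_update"
    context_bundle: Dict[str, Any]   # data pulled from ticketing/monitoring systems
    risk_score: float
    approval_status: Optional[str]   # None until a reviewer decides
    execution_actions: List[str]
    customer_summary: str
    internal_notes: str

# Nodes fill the state in incrementally; total=False allows partial states.
state: WorkflowState = {"request_id": "REQ-42", "intent": "incident"}
```

A typed contract like this is what makes replay and debugging straightforward: every node's input and output is inspectable as one small, named structure.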
Routing Strategy: Model for Reasoning, Code for Policy
We deliberately separated concerns:
- LLM nodes for interpretation and summarization
- Deterministic code nodes for policy and safety checks
Example:
- LLM proposes risk factors
- Code node computes final risk score and policy outcome
- Only the code node can route to execute_plan without approval
This pattern reduced "clever but unsafe" model behavior.
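A sketch of that split: the LLM proposes risk factors as structured data, and a deterministic code node turns them into a score and a routing decision. The weights and threshold below are illustrative, not the real policy.

```python
# Policy lives in code: fixed weights and threshold, reviewable in diffs.
RISK_WEIGHTS = {"prod_change": 0.4, "customer_facing": 0.3, "no_rollback": 0.3}
APPROVAL_THRESHOLD = 0.7

def score_risk(proposed_factors: list) -> float:
    # Factors the policy does not recognize contribute nothing, so a
    # "creative" model output cannot inflate or deflate the score.
    return sum(RISK_WEIGHTS.get(f, 0.0) for f in proposed_factors)

def route(proposed_factors: list) -> str:
    return "human_approval" if score_risk(proposed_factors) >= APPROVAL_THRESHOLD else "execute_plan"
```

Because routing never depends on free-text model output, the approval gate cannot be talked around by a persuasive completion.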
Human Approval as a First-Class Node
In prior versions, approval was bolted on with external conditionals and was easy to bypass accidentally.
In the graph version, human_approval is an explicit node that:
- Pauses execution with persisted state
- Publishes a review payload to the ops dashboard
- Resumes from the same point after decision
- Captures reviewer identity and timestamp for audit
This gave compliance and operations teams confidence that controls were real, not implied.
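The pause/resume mechanics can be sketched as follows: on reaching human_approval the workflow persists its state and stops, and a separate call resumes it with the reviewer's decision attached for audit. File-based storage and these function names are illustrative; LangGraph provides this via checkpointers and interrupts.

```python
import json
import time
from pathlib import Path

CHECKPOINT_DIR = Path("checkpoints")  # a database in production; files here for illustration

def pause_for_approval(state: dict) -> dict:
    """Persist state and stop; the payload is also what reviewers see."""
    CHECKPOINT_DIR.mkdir(exist_ok=True)
    payload = {**state, "status": "awaiting_approval"}
    (CHECKPOINT_DIR / f"{state['request_id']}.json").write_text(json.dumps(payload))
    return payload

def resume_after_decision(request_id: str, reviewer: str, decision: str) -> dict:
    """Reload the checkpoint and continue from the same point."""
    path = CHECKPOINT_DIR / f"{request_id}.json"
    state = json.loads(path.read_text())
    state.update({
        "approval_status": decision,
        "approved_by": reviewer,     # reviewer identity for audit
        "approved_at": time.time(),  # decision timestamp for audit
        "status": "resumed",
    })
    path.write_text(json.dumps(state))
    return state
```

Because the decision, reviewer, and timestamp land in the same persisted state the graph resumes from, the audit trail and the execution path cannot drift apart.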
Reliability and Observability
We instrumented every node with:
- Start/end timestamps
- Input and output schema validation
- Retry counters
- Error classification
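One way to get all four of these per node is a wrapper: it records start/end times, retries transient errors, and classifies failures. The retry budget and the transient/deterministic split below are illustrative assumptions, not the production implementation.

```python
import functools
import time

class TransientError(Exception):
    """Raised for retryable failures, e.g. an upstream API timeout."""

def instrumented(max_retries: int = 2):
    def decorate(node):
        @functools.wraps(node)
        def wrapper(state: dict) -> dict:
            meta = {"node": node.__name__, "started_at": time.time(), "retries": 0}
            while True:
                try:
                    out = node(state)
                    meta["ended_at"] = time.time()
                    out.setdefault("trace", []).append(meta)  # per-node audit record
                    return out
                except TransientError:
                    meta["retries"] += 1
                    if meta["retries"] > max_retries:
                        meta["error_class"] = "transient_exhausted"
                        raise
                except Exception:
                    meta["error_class"] = "deterministic"  # retrying will not help
                    raise
        return wrapper
    return decorate

@instrumented(max_retries=2)
def flaky_node(state: dict) -> dict:
    # Illustrative node that fails transiently once, then succeeds.
    state["calls"] = state.get("calls", 0) + 1
    if state["calls"] < 2:
        raise TransientError("upstream API timeout")
    return state
```

With the trace attached to state itself, the failure questions below become lookups rather than log archaeology.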
Because each node had clear boundaries, we could answer failure questions quickly:
- Did classification fail, or did a downstream API fail?
- Was the failure transient or deterministic?
- Can we safely resume from checkpoint N?
Production Outcomes
After migration to LangGraph orchestration:
- Workflow success rate: 82% -> 96%
- Manual rework rate: reduced by ~50%
- Mean recovery time after failure: reduced by ~60%
- Approval-policy violations: dropped to zero in audited period
The team also moved faster because changes became graph edits and node tests, not fragile prompt-chain rewiring.
Lessons Learned
Start with State, Not Nodes
If state design is weak, graph design becomes messy. Define the state contract first.
Treat Prompts as Components
We versioned prompts per node, tested them against fixtures, and tracked prompt revisions in release notes.
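A minimal sketch of what "prompts as components" meant in practice: each node's prompt carries an explicit version, and fixtures pin the inputs it must handle. The prompt text, version tag, and fixture below are illustrative.

```python
# One prompt component per node, versioned alongside the code.
CLASSIFY_PROMPT = {
    "node": "classify_intent",
    "version": "v3",  # bumped and noted in release notes on every change
    "template": "Classify the request into one of: {labels}.\n\nRequest:\n{text}",
}

def render(prompt: dict, **variables) -> str:
    return prompt["template"].format(**variables)

# Fixtures give each prompt revision a regression surface.
FIXTURES = [
    {"text": "Please update our DNS record", "expected_label": "change_request"},
]

rendered = render(
    CLASSIFY_PROMPT,
    labels="change_request, incident, status_update",
    text=FIXTURES[0]["text"],
)
```

In a test suite, the rendered prompt is sent to the model and the response is checked against `expected_label`, so a prompt edit that breaks classification fails a fixture instead of failing in production.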
Keep Deterministic Logic Out of LLM Nodes
Any policy, permission, or safety-critical rule should be implemented as code, not prompt instructions.
Evaluating agent frameworks for production workloads? We help teams implement LangGraph-based orchestration with checkpointing, approvals, and observable execution that holds up under real operational load.