
LangGraph in Production: Building Reliable Multi-Step AI Workflows

Notes from a recent delivery in which we moved from prompt-chained scripts to LangGraph-based orchestration, adding durable state, human approvals, and deterministic routing for more reliable enterprise automation.

Automation Services Team


We recently replaced a brittle prompt-chain automation flow with a LangGraph workflow for an operations team that handled change requests, incident triage, and customer status updates.

The old setup "worked" in demos, but in production it had problems:

  • Step order drifted when prompts changed
  • Failures required restarting from the beginning
  • Human approval checkpoints were hard to enforce
  • Debugging was painful because state lived across logs and prompt text

LangGraph solved this by treating the workflow as a graph with explicit state and controlled transitions.

The Problem: Prompt Chains Break Under Real Load

Prompt chaining is fine for simple linear tasks. This project was not linear.

We needed logic like:

  1. Classify request type
  2. Gather context from ticketing and monitoring systems
  3. Decide whether risk score requires human approval
  4. Run different execution paths by request class
  5. Generate customer-safe summary and internal handoff notes

In the previous architecture, this lived in ad hoc orchestration code plus prompt conditionals. Small changes in one step caused unexpected behavior later in the chain.

New Architecture: Stateful Graph with Checkpoints

We modelled the flow as a graph:

ingest_request
  -> classify_intent
  -> gather_context
  -> risk_assessment
      -> [high risk] human_approval
      -> [low risk] execute_plan
  -> generate_updates
  -> persist_outcome

Why This Worked Better

  • Explicit state object passed through every node
  • Deterministic branching with typed routing conditions
  • Checkpointing between critical steps for recovery
  • Resumability after transient API failures
  • Human-in-the-loop nodes for approval gates

Instead of "hope the chain behaves", we got a controllable execution model.

State Design: Typed and Minimal

The biggest win came from disciplined state design. We used a compact state schema:

  • request_id
  • intent
  • context_bundle
  • risk_score
  • approval_status
  • execution_actions
  • customer_summary
  • internal_notes

Keeping state minimal prevented accidental coupling between nodes and made replay/debugging straightforward.
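As a sketch, the schema above maps naturally onto a `TypedDict` state contract. The field names come from the list above; the types are our assumptions.

```python
from typing import List, TypedDict


class WorkflowState(TypedDict, total=False):
    """The full state contract -- nodes read and write only these keys."""

    request_id: str
    intent: str
    context_bundle: dict
    risk_score: int
    approval_status: str
    execution_actions: List[str]
    customer_summary: str
    internal_notes: str


# Nodes emit partial updates; the graph merges them into the shared state.
update: WorkflowState = {"intent": "incident", "risk_score": 40}
```

Because the contract is explicit, a node that writes an unexpected key shows up in review and schema validation rather than as a surprise three steps downstream.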

Routing Strategy: Model for Reasoning, Code for Policy

We deliberately separated concerns:

  • LLM nodes for interpretation and summarization
  • Deterministic code nodes for policy and safety checks

Example:

  • LLM proposes risk factors
  • Code node computes final risk score and policy outcome
  • Only code node can route to execute_plan without approval

This pattern reduced "clever but unsafe" model behavior.
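A minimal sketch of the split, assuming a hypothetical weighted-factor policy (the factor names, weights, and threshold here are illustrative; the real policy table lives in reviewed configuration, not prompts):

```python
# Hypothetical policy table -- illustrative factors and weights (0-100 scale).
FACTOR_WEIGHTS = {
    "production_database": 50,
    "customer_facing": 30,
    "after_hours": 20,
}

APPROVAL_THRESHOLD = 70  # assumed score at which a human must sign off


def score_risk(proposed_factors: list) -> int:
    """Deterministic scoring over the factors the LLM proposed.

    Unknown factors score zero, so the model cannot invent weight.
    """
    return min(100, sum(FACTOR_WEIGHTS.get(f, 0) for f in proposed_factors))


def route_on_risk(state: dict) -> str:
    """Only this code path can send a request past the approval gate."""
    if state["risk_score"] >= APPROVAL_THRESHOLD:
        return "human_approval"
    return "execute_plan"
```

The LLM's output is treated as a proposal; the score and the routing decision are reproducible functions that can be unit-tested without a model in the loop.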

Human Approval as a First-Class Node

In prior versions, approval was bolted on through external conditionals and was easy to bypass accidentally.

In the graph version, human_approval is an explicit node that:

  • Pauses execution with persisted state
  • Publishes a review payload to the ops dashboard
  • Resumes from the same point after decision
  • Captures reviewer identity and timestamp for audit

This gave compliance and operations teams confidence that controls were real, not implied.

Reliability and Observability

We instrumented every node with:

  • Start/end timestamps
  • Input and output schema validation
  • Retry counters
  • Error classification

Because each node had clear boundaries, we could answer failure questions quickly:

  • Did classification fail, or did a downstream API fail?
  • Was the failure transient or deterministic?
  • Can we safely resume from checkpoint N?
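One way to attach this instrumentation is a wrapper around each node. This is a simplified sketch: the in-memory `METRICS` dict and the transient/deterministic split on `TimeoutError` are our assumptions, standing in for real telemetry.

```python
import functools
import time

METRICS: dict = {}  # node name -> metrics from the most recent run


def instrumented(node, max_retries=2):
    """Wrap a graph node with timing, retry counting, and error classification."""

    @functools.wraps(node)
    def wrapper(state):
        m = {"start": time.time(), "retries": 0, "error_class": None}
        METRICS[node.__name__] = m
        last_exc = None
        for _ in range(max_retries + 1):
            try:
                result = node(state)
                m["end"] = time.time()
                return result
            except TimeoutError as exc:  # transient: retry in place
                m["retries"] += 1
                m["error_class"] = "transient"
                last_exc = exc
            except Exception:  # deterministic: fail fast, no retry
                m["error_class"] = "deterministic"
                m["end"] = time.time()
                raise
        m["end"] = time.time()
        raise last_exc

    return wrapper


calls = []


@instrumented
def gather_context(state):
    calls.append(1)
    if len(calls) == 1:
        raise TimeoutError("monitoring API timed out")
    return {"context_bundle": {"tickets": 3}}


out = gather_context({})
```

With every node wrapped the same way, "which step failed, and was it transient?" becomes a metrics query instead of a log archaeology session.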

Production Outcomes

After migration to LangGraph orchestration:

  • Workflow success rate: 82% -> 96%
  • Manual rework rate: reduced by ~50%
  • Mean recovery time after failure: reduced by ~60%
  • Approval-policy violations: dropped to zero in audited period

The team also moved faster because changes became graph edits and node tests, not fragile prompt-chain rewiring.

Lessons Learned

Start with State, Not Nodes

If state design is weak, graph design becomes messy. Define the state contract first.

Treat Prompts as Components

We versioned prompts per node, tested them against fixtures, and tracked prompt revisions in release notes.
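As an illustration of "prompts as components" (the registry shape, version tags, and fixture text here are our assumptions, not the production setup):

```python
# Hypothetical per-node prompt registry; real revisions are tracked in
# version control and referenced from release notes.
PROMPTS = {
    "classify_intent": {
        "version": "v3",
        "template": (
            "Classify the request into one of: {intents}.\n"
            "Respond with the label only.\n\nRequest:\n{body}"
        ),
    },
}


def render_prompt(node: str, **kwargs) -> str:
    """Render a node's current prompt; a missing placeholder fails loudly in tests."""
    return PROMPTS[node]["template"].format(**kwargs)


# Fixture-style check: a pinned input must keep producing a well-formed prompt.
rendered = render_prompt(
    "classify_intent",
    intents="change_request, incident, status_update",
    body="Please rotate the database credentials tonight.",
)
```

Pinning fixtures like this means a prompt edit that silently drops a placeholder or a label breaks a test before it breaks a workflow.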

Keep Deterministic Logic Out of LLM Nodes

Any policy, permission, or safety-critical rule should be implemented as code, not prompt instructions.


Evaluating agent frameworks for production workloads? We help teams implement LangGraph-based orchestration with checkpointing, approvals, and observable execution that holds up under real operational load.