Production Deployment & Observability

Moving agents from prototype to production requires tracing, durable execution, monitoring, and cost controls. LangChain's ecosystem provides LangSmith for observability, Agent Server for durable execution, and OpenTelemetry integration for fitting into existing infrastructure.

LangSmith Tracing

LangSmith captures every LLM call, tool invocation, and agent step as a trace. Enable it with two environment variables:

import os
 
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-langsmith-api-key"

Once enabled, every agent invocation is automatically traced — no code changes required:

from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
 
model = init_chat_model("gpt-4o-mini", model_provider="openai")
 
agent = create_react_agent(
    model=model,
    tools=[],  # no tools; a bare agent is enough to demonstrate tracing
    prompt="You are a helpful assistant.",
)
 
result = agent.invoke({
    "messages": [HumanMessage(content="What is LangSmith?")]
})
print(result["messages"][-1].content)

View traces in the LangSmith dashboard to see the full execution flow, token counts, latency, and errors.
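Traces can also be queried programmatically with the langsmith SDK. A minimal sketch, assuming LANGSMITH_API_KEY is set and your traces land in the "default" project (substitute your own project name):

from langsmith import Client
 
client = Client()
 
# List the most recent runs in a project and inspect status and token usage.
for run in client.list_runs(project_name="default", limit=5):
    print(run.name, run.status, run.total_tokens)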

Trace Metadata

Add metadata to traces for filtering and grouping in the dashboard:

result = agent.invoke(
    {"messages": [HumanMessage(content="Explain RAG")]},
    config={
        "metadata": {
            "user_id": "user-123",
            "session_id": "session-abc",
            "environment": "production",
        },
        "run_name": "rag-explanation",
    },
)

Agent Server for Durable Execution

LangGraph Agent Server provides durable, stateful agent execution with built-in persistence, fault tolerance, and horizontal scaling. Rather than a Python entry point, the server is driven by a langgraph.json config file that points at your compiled graph. A minimal example, assuming the agent above lives in my_agent.py:

{
  "dependencies": ["."],
  "graphs": {
    "agent": "./my_agent.py:agent"
  },
  "env": ".env"
}

Start a local development server with langgraph dev (port 2024 by default), or use langgraph up to run the production server in Docker.

Agent Server features:

Feature            | Description
Durable execution  | Agent state survives process restarts
Checkpointing      | Automatic state snapshots at each step
Horizontal scaling | Run multiple agent instances behind a load balancer
Streaming          | Stream agent responses over HTTP
Thread management  | Maintain conversation threads with unique IDs
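
Once the server is running, clients talk to it over HTTP. A minimal sketch using the langgraph_sdk client, assuming the dev server's default URL and the graph name "agent" registered in langgraph.json above:

from langgraph_sdk import get_sync_client
 
client = get_sync_client(url="http://localhost:2024")
 
# Create a thread to hold conversation state, then stream a run on it.
thread = client.threads.create()
for chunk in client.runs.stream(
    thread["thread_id"],
    "agent",  # graph name from langgraph.json
    input={"messages": [{"role": "user", "content": "What is LangSmith?"}]},
    stream_mode="updates",
):
    print(chunk.event, chunk.data)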

Deployment Options

Option                     | Best For                             | Scaling
Agent Server (self-hosted) | Full control, custom infrastructure  | Manual / Kubernetes
LangGraph Cloud            | Managed hosting, zero-ops            | Automatic
Docker containers          | Containerized deployments            | Kubernetes / ECS
Serverless functions       | Low-traffic, event-driven agents     | Auto-scaling
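
As a sketch of the serverless option, an AWS Lambda-style handler can wrap the agent directly. The handler name and event shape below are illustrative, and it assumes the agent is built at module import time so warm invocations reuse it:

import json
 
from langchain_core.messages import HumanMessage
 
def lambda_handler(event, context):
    # Assumes `agent` was constructed at import time, as in the example above,
    # and that the request body carries a "message" field.
    body = json.loads(event["body"])
    result = agent.invoke({"messages": [HumanMessage(content=body["message"])]})
    return {
        "statusCode": 200,
        "body": json.dumps({"reply": result["messages"][-1].content}),
    }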

Monitoring Patterns

Track key metrics for production agents:

import time
 
class MonitoringMiddleware:
    """Collects request counts and cumulative latency across agent runs."""
 
    def __init__(self):
        self.metrics = {
            "total_requests": 0,
            "total_errors": 0,
            "total_latency_ms": 0,
            "total_tokens": 0,
        }
 
    def before_agent(self, state):
        # Stamp the start time into state so after_agent can compute latency.
        state["_request_start"] = time.time()
        self.metrics["total_requests"] += 1
        return state
 
    def after_agent(self, state):
        elapsed_ms = (time.time() - state.get("_request_start", time.time())) * 1000
        self.metrics["total_latency_ms"] += elapsed_ms
        return state

Key metrics to track:

Metric                  | Why It Matters
Latency (p50, p95, p99) | User experience and SLA compliance
Token usage             | Cost management and budget alerts
Error rate              | Reliability and degradation detection
Tool call frequency     | Understanding agent behavior patterns
Trace success rate      | End-to-end completion tracking
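
The middleware above only accumulates totals; percentiles need the individual samples. A minimal sketch using the standard library, assuming you also append each request's latency to a list as it completes:

import statistics
 
latencies_ms = [120.5, 98.2, 310.7, 145.0, 2050.3]  # per-request samples (illustrative)
 
# quantiles(n=100) returns the 99 cut points p1..p99.
cuts = statistics.quantiles(latencies_ms, n=100)
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")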

OpenTelemetry Integration

Export LangChain traces to any OpenTelemetry-compatible backend (Datadog, Honeycomb, Jaeger):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
 
# Export spans in batches to an OTLP collector (Datadog, Honeycomb, Jaeger, ...).
provider = TracerProvider()
exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

With a tracer provider configured (and, for LangSmith's integration, LANGSMITH_OTEL_ENABLED=true set), LangChain traces flow as OpenTelemetry spans into your existing observability stack.
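
You can also open spans manually to correlate agent calls with the rest of a request's trace. A minimal sketch, reusing the provider configured above and the agent and HumanMessage import from earlier:

tracer = trace.get_tracer(__name__)
 
# Wrap an agent invocation in a span so it nests under the caller's trace.
with tracer.start_as_current_span("agent-invocation") as span:
    result = agent.invoke({"messages": [HumanMessage(content="Explain RAG")]})
    span.set_attribute("agent.response_chars", len(result["messages"][-1].content))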

Cost Tracking

Monitor LLM costs by tracking token usage per request:

class CostTrackingMiddleware:
    # Example rates for gpt-4o-mini in USD per 1K tokens; adjust for your model.
    COST_PER_1K_INPUT = 0.00015
    COST_PER_1K_OUTPUT = 0.0006
 
    def __init__(self):
        self.total_cost = 0.0
        self.request_costs = []
 
    def after_model(self, state):
        # Assumes token counts were stashed in state under "_token_usage".
        usage = state.get("_token_usage", {})
        input_cost = (usage.get("input_tokens", 0) / 1000) * self.COST_PER_1K_INPUT
        output_cost = (usage.get("output_tokens", 0) / 1000) * self.COST_PER_1K_OUTPUT
        request_cost = input_cost + output_cost
        self.total_cost += request_cost
        self.request_costs.append(request_cost)
        return state
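
In practice the token counts come from the model response itself: LangChain chat models attach a usage_metadata dict to each AIMessage. A sketch of feeding that into the state key the middleware expects:

tracker = CostTrackingMiddleware()
 
result = agent.invoke({"messages": [HumanMessage(content="Explain RAG")]})
ai_message = result["messages"][-1]
 
# usage_metadata carries "input_tokens", "output_tokens", and "total_tokens";
# it can be None for providers that don't report usage, hence the `or {}`.
state = {"_token_usage": ai_message.usage_metadata or {}}
tracker.after_model(state)
print(f"Total spend so far: ${tracker.total_cost:.6f}")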

A2A Protocol Overview

The Agent-to-Agent (A2A) protocol enables agents built with different frameworks to communicate over a standard HTTP interface:

Concept    | Description
Agent Card | JSON metadata describing an agent's capabilities and endpoint
Task       | A unit of work sent from one agent to another
Message    | Communication within a task (text, files, structured data)
Artifact   | Output produced by an agent (reports, code, data)

A2A lets you compose systems where a LangChain agent delegates to an AutoGen agent or a CrewAI agent, all communicating over HTTP without framework lock-in.

agent_card = {
    "name": "research-agent",
    "description": "Researches topics and produces summaries",
    "url": "https://my-agent.example.com",
    "capabilities": ["research", "summarization"],
    "protocol": "a2a/v1",
}
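
Delegation then becomes a plain HTTP call. A schematic sketch with requests; the "/tasks" path and payload shape are illustrative placeholders, not the normative A2A wire format:

import requests
 
# Send a task to the research agent described by the card above.
response = requests.post(
    f"{agent_card['url']}/tasks",
    json={
        "task": "Summarize recent work on agent observability",
        "protocol": agent_card["protocol"],
    },
    timeout=60,
)
print(response.json())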

Key Takeaways

  • Enable LangSmith tracing by setting LANGSMITH_TRACING=true and LANGSMITH_API_KEY; no code changes required
  • Add metadata to traces for filtering by user, session, and environment
  • Agent Server provides durable execution with checkpointing and horizontal scaling
  • Track latency, token usage, error rates, and tool call frequency in production
  • OpenTelemetry integration connects LangChain traces to Datadog, Honeycomb, Jaeger, and other backends
  • Cost tracking middleware monitors per-request and cumulative LLM spending
  • A2A protocol enables cross-framework agent communication over standard HTTP