Custom Middleware

Middleware in LangChain agents intercepts the processing pipeline at defined hook points, letting you modify inputs, outputs, and behavior without changing agent or model code. You can log requests, enforce policies, short-circuit execution, and stack multiple middleware layers for defense in depth.

AgentMiddleware Class

All custom middleware extends AgentMiddleware. Override its hook methods to inject logic at specific points in the agent lifecycle:

from langgraph.prebuilt.middleware import AgentMiddleware

class LoggingMiddleware(AgentMiddleware):
    def before_agent(self, state):
        # Log the incoming user message before the agent runs
        print(f"[INPUT] {state['messages'][-1].content}")
        return state

    def after_agent(self, state):
        # Log the agent's final response
        print(f"[OUTPUT] {state['messages'][-1].content}")
        return state
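
Because hooks are ordinary methods that take and return the agent state, you can exercise them directly. A minimal smoke test, assuming the state is a plain dict holding a list of langchain_core messages, as the examples on this page use:

from langchain_core.messages import AIMessage, HumanMessage

mw = LoggingMiddleware()
state = {"messages": [HumanMessage(content="What is 2 + 2?")]}

mw.before_agent(state)  # prints "[INPUT] What is 2 + 2?"
state["messages"].append(AIMessage(content="4"))  # simulate the agent's reply
mw.after_agent(state)  # prints "[OUTPUT] 4"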

Hook Points

AgentMiddleware provides four hook points that fire at different stages of processing:

| Hook | When It Fires | Common Uses |
| --- | --- | --- |
| before_agent | Before the agent processes the message | Input validation, logging, content filtering |
| before_model | Before the LLM call | Prompt injection, token counting, rate limiting |
| after_model | After the LLM returns | Response caching, cost tracking, quality checks |
| after_agent | After the agent produces a response | Output filtering, response formatting, auditing |
The numbered prints below trace the order in which the hooks fire on a single run:

class FullLifecycleMiddleware(AgentMiddleware):
    def before_agent(self, state):
        print("1. Before agent processes input")
        return state

    def before_model(self, state):
        print("2. Before LLM call")
        return state

    def after_model(self, state):
        print("3. After LLM returns")
        return state

    def after_agent(self, state):
        print("4. After agent produces response")
        return state

@hook_config Decorator

Use @hook_config to configure hook behavior — for example, specifying which message types a hook should process:

from langgraph.prebuilt.middleware import AgentMiddleware, hook_config
 
class SelectiveMiddleware(AgentMiddleware):
    @hook_config(message_types=["human"])
    def before_agent(self, state):
        print("Only runs for human messages")
        return state
 
    @hook_config(message_types=["ai"])
    def after_agent(self, state):
        print("Only runs for AI responses")
        return state
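
Without the decorator, you would filter inside the hook body yourself. A sketch of the equivalent manual check (the ManualSelectiveMiddleware name is illustrative), assuming the messages are langchain_core message objects whose type attribute is "human" or "ai":

class ManualSelectiveMiddleware(AgentMiddleware):
    def before_agent(self, state):
        # Hand-rolled equivalent of @hook_config(message_types=["human"]):
        # pass the state through untouched unless the latest message is human.
        if state["messages"][-1].type != "human":
            return state
        print("Only runs for human messages")
        return state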

can_jump_to for Short-Circuiting

The can_jump_to method lets middleware short-circuit the agent pipeline. If a condition is met, the middleware can skip agent processing entirely and return a response directly:

class CacheMiddleware(AgentMiddleware):
    def __init__(self):
        super().__init__()
        self.cache = {}

    def before_agent(self, state):
        user_msg = state["messages"][-1].content
        if user_msg in self.cache:
            # Cache hit: skip the agent entirely and return the stored response
            return self.can_jump_to(
                state,
                response=self.cache[user_msg],
            )
        return state

    def after_agent(self, state):
        # On a cache miss the agent ran; store the new request/response pair.
        # Assumes the user message is second-to-last and the response is last.
        user_msg = state["messages"][-2].content
        response = state["messages"][-1].content
        self.cache[user_msg] = response
        return state
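
The same short-circuit pattern covers the policy enforcement mentioned in the introduction. A sketch of a hypothetical BlocklistMiddleware (the class name and banned-terms list are illustrative) that refuses disallowed input using the same can_jump_to signature as the cache example:

class BlocklistMiddleware(AgentMiddleware):
    BANNED_TERMS = ("credit card number", "password dump")  # illustrative list

    def before_agent(self, state):
        user_msg = state["messages"][-1].content.lower()
        if any(term in user_msg for term in self.BANNED_TERMS):
            # Policy violation: bypass the agent and return a canned refusal
            return self.can_jump_to(
                state,
                response="Sorry, I can't help with that request.",
            )
        return state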

Logging Middleware Example

A complete logging middleware that records request/response pairs with timestamps:

import time
from langgraph.prebuilt.middleware import AgentMiddleware

class TimedLoggingMiddleware(AgentMiddleware):
    def __init__(self):
        super().__init__()
        self.logs = []

    def before_agent(self, state):
        # Stash the start time in state so after_agent can compute latency
        state["_start_time"] = time.time()
        self.logs.append({
            "type": "input",
            "content": state["messages"][-1].content,
            "timestamp": time.time(),
        })
        return state

    def after_agent(self, state):
        elapsed = time.time() - state.get("_start_time", time.time())
        self.logs.append({
            "type": "output",
            "content": state["messages"][-1].content,
            "timestamp": time.time(),
            "elapsed_seconds": round(elapsed, 2),
        })
        return state

Stacking Multiple Middleware

Pass a list of middleware to create_react_agent. The before_* hooks run in list order and the after_* hooks run in reverse, so the stack wraps the agent like layers of an onion: the first middleware in the list sees the raw input first and the final output last:

from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent
 
model = init_chat_model("gpt-4o-mini", model_provider="openai")
 
logger = TimedLoggingMiddleware()
 
class UppercaseMiddleware(AgentMiddleware):
    def after_agent(self, state):
        state["messages"][-1].content = state["messages"][-1].content.upper()
        return state
 
agent = create_react_agent(
    model=model,
    tools=[],
    prompt="You are a helpful assistant.",
    middleware=[logger, UppercaseMiddleware()],
)

The logging middleware records the request, the agent processes it, then the uppercase middleware transforms the output, and finally the logger records the transformed response.
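
A minimal sketch that makes the wrapping order visible, assuming the before-in-list-order, after-in-reverse-order behavior described above (the OuterMiddleware and InnerMiddleware names are illustrative):

class OuterMiddleware(AgentMiddleware):
    def before_agent(self, state):
        print("outer: before")  # fires first
        return state

    def after_agent(self, state):
        print("outer: after")  # fires last
        return state

class InnerMiddleware(AgentMiddleware):
    def before_agent(self, state):
        print("inner: before")  # fires second
        return state

    def after_agent(self, state):
        print("inner: after")  # fires third
        return state

# middleware=[OuterMiddleware(), InnerMiddleware()] would print:
#   outer: before
#   inner: before
#   inner: after
#   outer: after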

Using Middleware with the Agent

from langchain_core.messages import HumanMessage
 
result = agent.invoke({
    "messages": [HumanMessage(content="What is the capital of France?")]
})
print(result["messages"][-1].content)
 
print("\nLogs:")
for log in logger.logs:
    print(f"  [{log['type']}] {log['content'][:50]}...")

Key Takeaways

  • AgentMiddleware provides four hook points: before_agent, before_model, after_model, after_agent
  • @hook_config configures which message types a hook processes
  • can_jump_to short-circuits the pipeline to return cached or pre-computed responses
  • Middleware stacks wrap the agent: before_* hooks run in list order, after_* hooks in reverse
  • Use before_agent for input validation and filtering; after_agent for output transformation
  • before_model/after_model hooks target the LLM call specifically for prompt injection or cost tracking