Input & Output Guardrails
Guardrails let you validate, filter, or block agent inputs and outputs before they're processed or returned. The OpenAI Agents SDK provides `@input_guardrail` and `@output_guardrail` decorators that run checks in parallel with (or before) the agent, tripping a tripwire when the content is unsafe or invalid.
Why Guardrails
Agents can receive malicious prompts and produce harmful outputs. Guardrails act as safety gates — checking inputs before the agent processes them and validating outputs before they reach the user.
```
Input → [Input Guardrail] → Agent processes → [Output Guardrail] → Response
              ↓ tripwire                          ↓ tripwire
      Block & raise exception              Block & raise exception
```
Input Guardrails
An input guardrail runs when the agent receives input. It can inspect the user's message and decide whether to allow or block it:
```python
from agents import Agent, Runner, input_guardrail, GuardrailFunctionOutput

@input_guardrail
async def block_profanity(ctx, agent, input):
    """Block messages containing inappropriate language."""
    bad_words = ["spam", "scam", "hack"]
    contains_bad = any(word in input.lower() for word in bad_words)
    return GuardrailFunctionOutput(
        output_info={"flagged": contains_bad},
        tripwire_triggered=contains_bad,
    )

agent = Agent(
    name="Safe Agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[block_profanity],
)

result = Runner.run_sync(agent, "Hello, how are you?")
print(result.final_output)
```

Output Guardrails
An output guardrail runs after the agent produces its response. It can validate the output and block it if needed:
```python
from agents import Agent, Runner, output_guardrail, GuardrailFunctionOutput

@output_guardrail
async def block_pii(ctx, agent, output):
    """Block responses that might contain personal information."""
    pii_patterns = ["SSN", "social security", "credit card"]
    contains_pii = any(pattern in output.lower() for pattern in pii_patterns)
    return GuardrailFunctionOutput(
        output_info={"contains_pii": contains_pii},
        tripwire_triggered=contains_pii,
    )

agent = Agent(
    name="Careful Agent",
    instructions="You are a helpful assistant. Never share personal information.",
    output_guardrails=[block_pii],
)

result = Runner.run_sync(agent, "Tell me about data privacy.")
print(result.final_output)
```

GuardrailFunctionOutput
Every guardrail function returns a `GuardrailFunctionOutput`:
| Field | Type | Description |
|---|---|---|
| `output_info` | `Any` (often a `dict`) | Metadata about the check (for logging/debugging) |
| `tripwire_triggered` | `bool` | `True` to block, `False` to allow |
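When no tripwire fires, the guardrail results travel with the run result, so you can still log the `output_info` metadata. A minimal sketch, assuming the `input_guardrail_results` attribute exposed on run results in recent SDK versions:

```python
# Sketch: inspect guardrail metadata after a successful run.
# Assumes RunResult exposes input_guardrail_results (recent SDK versions);
# each entry's .output is the GuardrailFunctionOutput your function returned.
result = Runner.run_sync(agent, "Hello, how are you?")
for guardrail_result in result.input_guardrail_results:
    print(guardrail_result.output.output_info)  # e.g. {"flagged": False}
```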
Handling Tripwire Exceptions
When a guardrail trips, the SDK raises an exception. Catch it to handle blocked requests gracefully:
```python
from agents.exceptions import InputGuardrailTripwireTriggered, OutputGuardrailTripwireTriggered

try:
    result = Runner.run_sync(agent, "How do I hack a website?")
    print(result.final_output)
except InputGuardrailTripwireTriggered:
    print("Input blocked: Your message was flagged by our safety system.")
except OutputGuardrailTripwireTriggered:
    print("Output blocked: The response was flagged by our safety system.")
```
LLM-Based Guardrails
For more sophisticated checks, use a secondary agent as a guardrail:
```python
from agents import Agent, Runner, input_guardrail, GuardrailFunctionOutput

guardrail_agent = Agent(
    name="Content Classifier",
    instructions=(
        "Classify the user's message as 'safe' or 'unsafe'. "
        "Respond with exactly one word: safe or unsafe."
    ),
)

@input_guardrail
async def llm_safety_check(ctx, agent, input):
    """Use an LLM to classify input safety."""
    # Pass the run context through to the classifier's run
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    is_unsafe = result.final_output.strip().lower() == "unsafe"
    return GuardrailFunctionOutput(
        output_info={"classification": result.final_output},
        tripwire_triggered=is_unsafe,
    )

main_agent = Agent(
    name="Main Agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[llm_safety_check],
)
```
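Matching on the classifier's raw text is brittle. A variation worth considering is giving the guardrail agent a structured `output_type` so the verdict is parsed for you; this sketch assumes a hypothetical `SafetyVerdict` Pydantic model:

```python
from pydantic import BaseModel

from agents import Agent, Runner, input_guardrail, GuardrailFunctionOutput

class SafetyVerdict(BaseModel):
    # Hypothetical schema for the classifier's verdict.
    is_unsafe: bool
    reasoning: str

verdict_agent = Agent(
    name="Content Classifier",
    instructions="Decide whether the user's message is unsafe and explain why.",
    output_type=SafetyVerdict,  # the SDK parses the reply into this model
)

@input_guardrail
async def structured_safety_check(ctx, agent, input):
    """Classify input safety with a structured verdict."""
    result = await Runner.run(verdict_agent, input, context=ctx.context)
    verdict = result.final_output_as(SafetyVerdict)
    return GuardrailFunctionOutput(
        output_info=verdict,
        tripwire_triggered=verdict.is_unsafe,
    )
```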
Multiple Guardrails
You can stack multiple guardrails. They run in parallel by default, and if any one of them trips, the run stops and the corresponding exception is raised:
```python
agent = Agent(
    name="Protected Agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[block_profanity, llm_safety_check],
    output_guardrails=[block_pii],
)
```

Guardrails on First and Last Agent
In multi-agent systems with handoffs, guardrails only run on specific agents:
- Input guardrails run only on the first agent (the one that receives the user's input)
- Output guardrails run only on the last agent (the one that produces the final response)
```python
specialist = Agent(
    name="Specialist",
    instructions="You handle technical questions.",
)

triage = Agent(
    name="Triage",
    instructions="Route questions to the right specialist.",
    handoffs=[specialist],
    input_guardrails=[block_profanity],  # Runs — this is the first agent
    output_guardrails=[block_pii],  # Does NOT run if specialist produces the output
)
```

To protect the final output, place output guardrails on the agent that actually generates the response.
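For example, you might attach the PII check to the specialist itself, since it produces the final response after the handoff. A minimal sketch reusing `block_pii` from earlier:

```python
# Sketch: put the output guardrail on the agent that generates the final answer.
specialist = Agent(
    name="Specialist",
    instructions="You handle technical questions.",
    output_guardrails=[block_pii],  # runs when this agent produces the final output
)
```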
Key Takeaways
- `@input_guardrail` validates user messages before the agent processes them
- `@output_guardrail` validates agent responses before they reach the user
- Return `GuardrailFunctionOutput(tripwire_triggered=True)` to block content
- Catch `InputGuardrailTripwireTriggered` and `OutputGuardrailTripwireTriggered` exceptions
- Use LLM-based guardrails for nuanced content classification
- In multi-agent flows, input guardrails run on the first agent and output guardrails on the last agent