
Input & Output Guardrails

Guardrails let you validate, filter, or block agent inputs and outputs before they're processed or returned. The OpenAI Agents SDK provides @input_guardrail and @output_guardrail decorators that run checks in parallel with (or before) the agent, triggering a tripwire when the content is unsafe or invalid.

Why Guardrails

Agents can receive malicious prompts and produce harmful outputs. Guardrails act as safety gates — checking inputs before the agent processes them and validating outputs before they reach the user.

Input → [Input Guardrail] → Agent processes → [Output Guardrail] → Response
         ↓ tripwire                              ↓ tripwire
         Block & raise exception                 Block & raise exception

Input Guardrails

An input guardrail runs when the agent receives input. It can inspect the user's message and decide whether to allow or block it:

from agents import Agent, Runner, input_guardrail, GuardrailFunctionOutput
 
@input_guardrail
async def block_profanity(ctx, agent, input):
    """Block messages containing inappropriate language."""
    # `input` may be a plain string or a list of input items; normalize to text.
    text = input if isinstance(input, str) else str(input)
    bad_words = ["spam", "scam", "hack"]
    contains_bad = any(word in text.lower() for word in bad_words)
    return GuardrailFunctionOutput(
        output_info={"flagged": contains_bad},
        tripwire_triggered=contains_bad,
    )
 
agent = Agent(
    name="Safe Agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[block_profanity],
)
 
result = Runner.run_sync(agent, "Hello, how are you?")
print(result.final_output)
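
A flagged message never reaches the agent; the run raises an exception instead (covered under Handling Tripwire Exceptions below). A quick sketch of the tripped path:

from agents.exceptions import InputGuardrailTripwireTriggered

try:
    Runner.run_sync(agent, "Is this email a scam?")  # "scam" trips the wire
except InputGuardrailTripwireTriggered:
    print("Blocked before the agent ever ran.")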

Output Guardrails

An output guardrail runs after the agent produces its response. It can validate the output and block it if needed:

from agents import Agent, Runner, output_guardrail, GuardrailFunctionOutput
 
@output_guardrail
async def block_pii(ctx, agent, output):
    """Block responses that might contain personal information."""
    # `output` is the agent's final output, a plain string here since no output_type is set.
    # Patterns must be lowercase because we match against output.lower().
    pii_patterns = ["ssn", "social security", "credit card"]
    contains_pii = any(pattern in output.lower() for pattern in pii_patterns)
    return GuardrailFunctionOutput(
        output_info={"contains_pii": contains_pii},
        tripwire_triggered=contains_pii,
    )
 
agent = Agent(
    name="Careful Agent",
    instructions="You are a helpful assistant. Never share personal information.",
    output_guardrails=[block_pii],
)
 
result = Runner.run_sync(agent, "Tell me about data privacy.")
print(result.final_output)

GuardrailFunctionOutput

Every guardrail function returns a GuardrailFunctionOutput:

Field                Type   Description
output_info          dict   Metadata about the check (for logging/debugging)
tripwire_triggered   bool   True to block, False to allow
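
The output_info field is free-form; a common pattern is to record what the check saw so it can be logged later. For instance (the keys here are illustrative):

from agents import GuardrailFunctionOutput

result = GuardrailFunctionOutput(
    output_info={"rule": "bad_words", "matched": ["scam"]},  # arbitrary metadata
    tripwire_triggered=True,  # block the request
)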

Handling Tripwire Exceptions

When a guardrail trips, the SDK raises an exception. Catch it to handle blocked requests gracefully:

from agents.exceptions import InputGuardrailTripwireTriggered, OutputGuardrailTripwireTriggered
 
try:
    result = Runner.run_sync(agent, "How do I hack a website?")
    print(result.final_output)
except InputGuardrailTripwireTriggered:
    print("Input blocked: Your message was flagged by our safety system.")
except OutputGuardrailTripwireTriggered:
    print("Output blocked: The response was flagged by our safety system.")

LLM-Based Guardrails

For more sophisticated checks, use a secondary agent as a guardrail:

from agents import Agent, Runner, input_guardrail, GuardrailFunctionOutput
 
guardrail_agent = Agent(
    name="Content Classifier",
    instructions=(
        "Classify the user's message as 'safe' or 'unsafe'. "
        "Respond with exactly one word: safe or unsafe."
    ),
)
 
@input_guardrail
async def llm_safety_check(ctx, agent, input):
    """Use an LLM to classify input safety."""
    # Run the classifier on the raw input, forwarding the run context.
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    is_unsafe = result.final_output.strip().lower() == "unsafe"
    return GuardrailFunctionOutput(
        output_info={"classification": result.final_output},
        tripwire_triggered=is_unsafe,
    )
 
main_agent = Agent(
    name="Main Agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[llm_safety_check],
)
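
Matching on a free-form string reply is brittle. A sturdier variant (a sketch following the SDK's structured-output pattern; the SafetyVerdict model here is illustrative) gives the classifier an output_type so the verdict comes back as typed data:

from pydantic import BaseModel
from agents import Agent, Runner, input_guardrail, GuardrailFunctionOutput

class SafetyVerdict(BaseModel):
    is_unsafe: bool
    reasoning: str

guardrail_agent = Agent(
    name="Content Classifier",
    instructions="Decide whether the user's message is unsafe and explain why.",
    output_type=SafetyVerdict,  # forces structured output
)

@input_guardrail
async def llm_safety_check(ctx, agent, input):
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    verdict = result.final_output  # a SafetyVerdict instance, not a raw string
    return GuardrailFunctionOutput(
        output_info=verdict,
        tripwire_triggered=verdict.is_unsafe,
    )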

Multiple Guardrails

You can stack multiple guardrails. They run in parallel by default, and any single tripwire is enough to block the run:

agent = Agent(
    name="Protected Agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[block_profanity, llm_safety_check],
    output_guardrails=[block_pii],
)
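
Output guardrails stack the same way. For instance, a simple length check (a sketch; the threshold is arbitrary) can sit alongside block_pii:

@output_guardrail
async def limit_length(ctx, agent, output):
    """Trip if the response is suspiciously long."""
    too_long = len(output) > 2000  # arbitrary limit, for illustration only
    return GuardrailFunctionOutput(
        output_info={"length": len(output)},
        tripwire_triggered=too_long,
    )

agent = Agent(
    name="Protected Agent",
    instructions="You are a helpful assistant.",
    output_guardrails=[block_pii, limit_length],
)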

Guardrails on First and Last Agent

In multi-agent systems with handoffs, guardrails only run on specific agents:

  • Input guardrails run only on the first agent (the one that receives the user's input)
  • Output guardrails run only on the last agent (the one that produces the final response)

specialist = Agent(
    name="Specialist",
    instructions="You handle technical questions.",
)
 
triage = Agent(
    name="Triage",
    instructions="Route questions to the right specialist.",
    handoffs=[specialist],
    input_guardrails=[block_profanity],   # Runs — this is the first agent
    output_guardrails=[block_pii],        # Does NOT run if specialist produces the output
)

To protect the final output, place output guardrails on the agent that actually generates the response.
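
Concretely, continuing the example above, attach the output guardrail to the specialist as well:

specialist = Agent(
    name="Specialist",
    instructions="You handle technical questions.",
    output_guardrails=[block_pii],  # runs here when the specialist answers last
)

triage = Agent(
    name="Triage",
    instructions="Route questions to the right specialist.",
    handoffs=[specialist],
    input_guardrails=[block_profanity],  # still runs: triage is the first agent
)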

Key Takeaways

  • @input_guardrail validates user messages before the agent processes them
  • @output_guardrail validates agent responses before they reach the user
  • Return GuardrailFunctionOutput(tripwire_triggered=True) to block content
  • Catch InputGuardrailTripwireTriggered and OutputGuardrailTripwireTriggered exceptions
  • Use LLM-based guardrails for nuanced content classification
  • In multi-agent flows, input guardrails run on the first agent and output guardrails on the last agent