Agent Foundry
All Problems

#68. Sensitive Action Gatekeeper

MediumGuardrailsTool Calling

The Problem

Your file management agent has tools to read, delete, and send files. Currently, all tools execute immediately when the agent decides to call them — including destructive ones like delete_file and send_email. There's no confirmation step for sensitive actions, so a misunderstood request or a hallucinated tool call can permanently delete files or send unintended emails. Your job is to add a gatekeeper layer that requires user confirmation before any sensitive action executes, while letting safe actions (like reading files) proceed immediately.

Examples

Example 1

User input: Delete the file report.pdf

Current (bad) output: Deleted report.pdf successfully — the file is gone, no confirmation asked.

Expected (good) output: "I'm about to delete report.pdf. Are you sure? (yes/no)" → User confirms → Deleted report.pdf successfully. If user declines → "Action cancelled. report.pdf was not deleted."

Example 2

User input: Send an email to bob@example.com with the Q4 report

Current (bad) output: Email sent to bob@example.com: Q4 Report — sent immediately without confirmation.

Expected (good) output: "I'm about to send an email to bob@example.com with subject 'Q4 Report'. Proceed? (yes/no)" → Confirmation required before sending.

Example 3

User input: Read the file notes.txt

Current (bad) output: (This is fine — reading is a safe action.)

Expected (good) output: Contents of notes.txt: [file data] — no confirmation needed for read operations.

Your Task

Add a sensitive action gatekeeper so the agent:

  • Requires user confirmation before executing destructive tools (delete, send).
  • Lets non-destructive tools (read, search) execute immediately.
  • Shows a clear description of the pending action in the confirmation prompt.
  • Cancels the action if the user declines.

Evaluation

Submissions are checked for the following:

  • Sensitive actions require confirmation: Delete and send tools prompt before executing.
  • Safe tools execute immediately: Read and search tools run without confirmation.
  • Confirmation describes the action: The prompt clearly states what will happen.
  • Declined actions are not executed: Saying "no" prevents the action from running.

Constraints

  • Sensitive tools (delete, send) must require user confirmation before execution
  • Non-sensitive tools (read, search) should execute immediately without confirmation
  • The confirmation step must clearly describe what action will be taken
  • If the user declines, the action must not execute
Starter Code
from agents import Agent, Runner
from agents.tool import function_tool

@function_tool
def read_file(filename: str) -> str:
    """Read a file from the filesystem."""
    return f"Contents of {filename}: [file data]"

@function_tool
def delete_file(filename: str) -> str:
    """Delete a file from the filesystem."""
    # BUG: This destructive action executes immediately with no confirmation
    return f"Deleted {filename} successfully"

@function_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a recipient."""
    # BUG: This sensitive action executes immediately with no confirmation
    return f"Email sent to {to}: {subject}"

agent = Agent(
    name="File Manager",
    instructions="You are a file management assistant. Help users read, delete, and send files.",
    tools=[read_file, delete_file, send_email],
)

# Test: The agent deletes a file without asking for confirmation
result = Runner.run_sync(agent, "Delete the file report.pdf")
print(result.final_output)
Open in Google Colab
Evaluation Criteria0/4