Agent Foundry
All Problems

#2. Right Tool for the Job

EasyTool Calling

The Problem

You have an assistant with four tools: weather lookup, web search, a calculator, and a calendar check. In testing, the model often picks the wrong tool—weather questions trigger web_search, percentage math goes to calendar_check, and scheduling questions bounce between tools. The implementations are correct; the bug is that tool docstrings are vague and the system prompt does not route traffic. Under the problem constraints you may not remove tools or change implementations—you may only improve tool descriptions and the system prompt so each query type maps to the right tool.

Examples

Weather query

User input: What's the weather in San Francisco?

Wrong behavior: Agent calls web_search (or calendar_check) instead of weather_lookup.

Expected: Agent calls weather_lookup with a location string and answers from that tool's output.

Math query

User input: What is 15% of 230?

Wrong behavior: Agent calls calendar_check or web_search instead of evaluating the expression.

Expected: Agent calls calculator with an appropriate arithmetic expression (e.g. 230 * 0.15) and returns the numeric result.

Calendar query

User input: Do I have any meetings on Friday?

Wrong behavior: Agent uses web_search or weather_lookup.

Expected: Agent calls calendar_check with the relevant date and answers from the returned events.

Your Task

  1. Rewrite each tool's description so it states what the tool is for, what input it expects, and what output the model should expect—without overlapping another tool's job.
  2. Update the system prompt with explicit routing rules (e.g. weather → weather_lookup, general knowledge / broad search → web_search, arithmetic → calculator, schedule → calendar_check).
  3. Keep all four tools and their bodies unchanged.

Evaluation

Submissions are judged on:

  • Correct tool selection: The right tool is chosen for each kind of question.
  • No unnecessary tool calls: The agent does not chain extra tools when one suffices.
  • Clear tool descriptions: Each description is specific about purpose, inputs, and outputs.
  • All test queries pass: The provided weather, calculator, and calendar test prompts route to the correct tools.

Constraints

  • You may not remove any existing tools
  • You may not modify the tool implementations
  • You may only change the tool descriptions and the system prompt
Starter Code
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

@tool
def weather_lookup(location: str) -> str:
    """Look up information."""  # BUG: Vague description
    return f"Weather in {location}: 72°F, sunny"

@tool
def web_search(query: str) -> str:
    """Search for things."""  # BUG: Vague description
    return f"Results for {query}: ..."

@tool
def calculator(expression: str) -> str:
    """Compute stuff."""  # BUG: Vague description
    return str(eval(expression))

@tool
def calendar_check(date: str) -> str:
    """Check things."""  # BUG: Vague description
    return f"Events on {date}: Team standup at 10am"

llm = ChatOpenAI(model="gpt-4o-mini")
tools = [weather_lookup, web_search, calculator, calendar_check]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

# Tests
print(executor.invoke({"input": "What's the weather in San Francisco?"}))
print(executor.invoke({"input": "What is 15% of 230?"}))
print(executor.invoke({"input": "Do I have any meetings on Friday?"}))
Open in Google Colab
Evaluation Criteria0/4