Agent Foundry

#24. Tool Output Parser

Medium | Tool Calling

The Problem

Your shopping assistant agent has a scrape_product tool that fetches product pages, but the tool returns the raw HTML straight to the agent. The LLM wastes tokens processing HTML tags, sometimes treats ad copy or sidebar content as product info, and occasionally misreads prices or ratings. Your job is to parse the HTML inside the tool and return clean, structured data so the agent gets exactly the fields it needs.

Examples

Example 1

User input: Get me the details for the headphones at https://example.com/headphones

Current (bad) output: The tool returns a full HTML blob including <div class="sidebar">, ad copy, and related product links. The LLM tries to parse it inline and sometimes reports the original price ($349.99) instead of the current price ($249.99).

Expected (good) output: The tool returns structured data like: Name: Wireless Noise-Cancelling Headphones | Price: $249.99 (was $349.99) | Rating: 4.5/5 (2,847 reviews) | In Stock | Description: Premium wireless headphones with active noise cancellation... The agent presents this accurately.
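
Once the fields are parsed, the Example 1 layout takes only a small formatter. A minimal sketch, assuming a hypothetical ProductInfo dataclass and format_product helper (neither is part of the starter code). Labeling the current and original price explicitly is what stops the agent from reporting the wrong one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductInfo:
    """Hypothetical container for the fields the tool hands to the agent."""
    name: str
    price: str
    original_price: Optional[str]
    rating: float
    review_count: int
    in_stock: bool
    description: str


def format_product(p: ProductInfo) -> str:
    """Render the fields as one tag-free line in the Example 1 layout."""
    price = p.price if p.original_price is None else f"{p.price} (was {p.original_price})"
    parts = [
        f"Name: {p.name}",
        f"Price: {price}",
        f"Rating: {p.rating}/5 ({p.review_count:,} reviews)",  # 2847 -> "2,847"
        "In Stock" if p.in_stock else "Out of Stock",
        f"Description: {p.description}",
    ]
    return " | ".join(parts)


info = ProductInfo(
    name="Wireless Noise-Cancelling Headphones",
    price="$249.99",
    original_price="$349.99",
    rating=4.5,
    review_count=2847,
    in_stock=True,
    description="Premium wireless headphones with active noise cancellation.",
)
print(format_product(info))
```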

Example 2

User input: Is this product well-reviewed?

Current (bad) output: The LLM finds the star rating buried in the HTML but misreports the review count because it picks up a number from the ad sidebar.

Expected (good) output: The tool's structured output clearly separates rating (4.5) and review count (2,847), so the agent correctly says: Yes, it has 4.5 out of 5 stars from 2,847 reviews.
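
Example 2 depends on keeping the rating and the review count apart. A minimal sketch, assuming the rating block's visible text follows the starter's "4.5 out of 5 (2,847 reviews)" pattern; parse_rating and RATING_RE are hypothetical names. Because the regex runs only on that one element's text, a number from the ad sidebar cannot leak in.

```python
import re

# Hypothetical pattern for text like "4.5 out of 5 (2,847 reviews)".
RATING_RE = re.compile(r"([\d.]+)\s+out of\s+5\s*\(([\d,]+)\s+reviews?\)")


def parse_rating(text: str) -> tuple[float, int]:
    """Return (rating, review_count) from the rating block's text only."""
    match = RATING_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognized rating text: {text!r}")
    rating = float(match.group(1))
    review_count = int(match.group(2).replace(",", ""))  # "2,847" -> 2847
    return rating, review_count


rating, count = parse_rating("4.5 out of 5 (2,847 reviews)")
print(f"Yes, it has {rating} out of 5 stars from {count:,} reviews.")
```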

Your Task

  • Modify the scrape_product tool to parse the raw HTML and extract structured fields.
  • Return at minimum: product name, current price, rating, review count, availability, and description.
  • Strip out irrelevant content (ads, sidebars, related products).
  • The parsing logic must live inside the tool function, not in the prompt.
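
A minimal sketch of the parsing step using only the standard library's html.parser.HTMLParser. It assumes the page uses the class names from the starter HTML (product-title, price, stars, and so on); FIELD_CLASSES, SKIP_CLASSES, ProductParser, and parse_product_html are hypothetical names. Any text inside a sidebar, ad, or related element is dropped, so a price in the sidebar cannot overwrite the real one.

```python
from html.parser import HTMLParser

# Hypothetical class-to-field mapping, based on the starter HTML's class names.
FIELD_CLASSES = {
    "product-title": "name",
    "price": "price",
    "original-price": "original_price",
    "stars": "rating",
    "count": "review_count",
    "availability": "availability",
    "description": "description",
}
# Text under any element carrying one of these classes is discarded.
SKIP_CLASSES = {"sidebar", "ad", "related"}
# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ProductParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.fields = {}
        # One (tag, field, skipping) entry per currently open element.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        _, parent_field, parent_skip = self._stack[-1] if self._stack else (None, None, False)
        skip = parent_skip or bool(classes & SKIP_CLASSES)
        field = next((FIELD_CLASSES[c] for c in classes if c in FIELD_CLASSES), parent_field)
        self._stack.append((tag, field, skip))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> carry no text

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._stack:
            return
        _, field, skip = self._stack[-1]
        if field and not skip:
            self.fields[field] = f"{self.fields.get(field, '')} {text}".strip()


def parse_product_html(html: str) -> dict:
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.fields


sample = """
<div class="product-page">
  <h1 class="product-title">Wireless Noise-Cancelling Headphones</h1>
  <span class="price">$249.99</span>
  <span class="original-price">$349.99</span>
  <div class="rating"><span class="stars">4.5</span> out of 5 (<span class="count">2,847</span> reviews)</div>
  <div class="availability in-stock">In Stock</div>
  <p class="description">Premium wireless headphones.</p>
  <div class="sidebar"><div class="ad">Buy now and save!</div><span class="price">$9.99</span></div>
</div>
"""
print(parse_product_html(sample))
```

Keying on class names is brittle against real sites whose markup changes; the same idea works with a fuller HTML library and CSS selectors, but the principle stays the same: select the fields you want and discard everything else before the text reaches the model.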

Evaluation

Submissions are checked for the following:

  • No raw HTML in tool output: The tool returns clean structured data, not HTML tags.
  • Key fields are extracted: The parsed output includes at least product name, price, and rating.
  • Agent response is accurate: The agent presents correct product details without HTML artifacts.
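
The first two criteria can be approximated locally before submitting. A rough sketch: check_tool_output, TAG_RE, and REQUIRED_FIELDS are hypothetical names, not the grader's actual logic, and the regex is a heuristic that flags anything shaped like an opening or closing tag.

```python
import re

# Heuristic: "<" or "</" followed by a letter, up to the next ">".
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")
REQUIRED_FIELDS = ("name", "price", "rating")


def check_tool_output(text: str, fields: dict) -> list:
    """Return a list of problems; an empty list means the output passes."""
    problems = []
    if TAG_RE.search(text):
        problems.append("raw HTML tag found in output")
    for key in REQUIRED_FIELDS:
        if not fields.get(key):
            problems.append(f"missing field: {key}")
    return problems


good = "Name: Headphones | Price: $249.99 | Rating: 4.5/5"
bad = '<span class="price">$249.99</span>'
print(check_tool_output(good, {"name": "Headphones", "price": "$249.99", "rating": "4.5"}))  # []
print(check_tool_output(bad, {"price": "$249.99"}))
```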

Constraints

  • The raw HTML must be parsed into structured data before reaching the LLM.
  • The parsed output must include product name, price, and rating at minimum.
  • The parsing logic must live inside the tool, not in the system prompt.
  • The agent must not receive raw HTML tags in the tool output.
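
One way to hand the agent unambiguous, already-typed values is to serialize the parsed fields as JSON inside the tool. A minimal sketch, assuming string fields from an earlier parsing step (keys such as price and review_count are illustrative) and US-dollar prices with a leading "$"; to_tool_output is a hypothetical helper.

```python
import json


def to_tool_output(fields: dict) -> str:
    """Normalize parsed strings into typed values and serialize them as JSON."""

    def money(s):
        # Assumption: prices look like "$1,249.99" (USD, "$" prefix).
        return float(s.replace("$", "").replace(",", "")) if s else None

    count = fields.get("review_count")
    record = {
        "name": fields.get("name"),
        "price_usd": money(fields.get("price")),
        "original_price_usd": money(fields.get("original_price")),
        "rating": float(fields["rating"]) if fields.get("rating") else None,
        "review_count": int(count.replace(",", "")) if count else None,
        "in_stock": fields.get("availability", "").strip().lower() == "in stock",
        "description": fields.get("description"),
    }
    return json.dumps(record, ensure_ascii=False)


fields = {
    "name": "Wireless Noise-Cancelling Headphones",
    "price": "$249.99",
    "original_price": "$349.99",
    "rating": "4.5",
    "review_count": "2,847",
    "availability": "In Stock",
    "description": "Premium wireless headphones.",
}
print(to_tool_output(fields))
```

Converting "2,847" to 2847 and "In Stock" to true inside the tool means the model never has to interpret page formatting. Whether JSON or labeled plain text works better for a given model is a judgment call; both satisfy the constraints as long as no markup survives.
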

Starter Code
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

llm = ChatOpenAI(model="gpt-4o-mini")

@tool
def scrape_product(url: str) -> str:
    """Scrape product information from a URL."""
    # BUG: Returns raw HTML — the LLM wastes tokens parsing tags and often extracts wrong fields
    # TODO: Parse the HTML and return clean structured data
    raw_html = """
    <div class="product-page">
        <h1 class="product-title">Wireless Noise-Cancelling Headphones</h1>
        <span class="price">$249.99</span>
        <span class="original-price">$349.99</span>
        <div class="rating"><span class="stars">4.5</span> out of 5 (<span class="count">2,847</span> reviews)</div>
        <div class="availability in-stock">In Stock</div>
        <p class="description">Premium wireless headphones with active noise cancellation, 30-hour battery life, and comfortable over-ear design.</p>
        <div class="specs"><ul><li>Bluetooth 5.2</li><li>Weight: 250g</li><li>Driver: 40mm</li></ul></div>
        <div class="sidebar"><div class="ad">Buy now and save!</div><div class="related">Related products...</div></div>
    </div>
    """
    return raw_html

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a shopping assistant. Help users find product information."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, [scrape_product], prompt)
executor = AgentExecutor(agent=agent, tools=[scrape_product])

result = executor.invoke({"input": "Get me the details for the headphones at https://example.com/headphones"})
print(result["output"])
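
A sketch of how the finished tool body might look. It swaps in the standard library's xml.etree.ElementTree, which works here only because the hardcoded snippet happens to be well-formed XHTML; pages fetched from real sites generally are not, and would need an HTML parser such as html.parser. The @tool decorator is left off so the sketch runs without LangChain; keep it on the real function, whose docstring LangChain uses as the tool description. SKIP_CLASSES, _classes, text_of, and the output labels are hypothetical choices, and url is unused, as in the starter.

```python
import xml.etree.ElementTree as ET

SKIP_CLASSES = {"sidebar", "ad", "related"}


def _classes(el: ET.Element) -> set:
    return set((el.get("class") or "").split())


def scrape_product(url: str) -> str:
    """Scrape product information from a URL."""
    raw_html = """
    <div class="product-page">
        <h1 class="product-title">Wireless Noise-Cancelling Headphones</h1>
        <span class="price">$249.99</span>
        <span class="original-price">$349.99</span>
        <div class="rating"><span class="stars">4.5</span> out of 5 (<span class="count">2,847</span> reviews)</div>
        <div class="availability in-stock">In Stock</div>
        <p class="description">Premium wireless headphones with active noise cancellation, 30-hour battery life, and comfortable over-ear design.</p>
        <div class="specs"><ul><li>Bluetooth 5.2</li><li>Weight: 250g</li><li>Driver: 40mm</li></ul></div>
        <div class="sidebar"><div class="ad">Buy now and save!</div><div class="related">Related products...</div></div>
    </div>
    """
    # Assumption: this hardcoded snippet is well-formed XHTML, so the standard
    # library's XML parser accepts it. Real pages usually need an HTML parser.
    root = ET.fromstring(raw_html.strip())

    # Remove ads, sidebars, and related-product blocks before extracting fields.
    doomed = [(parent, child) for parent in root.iter() for child in parent
              if _classes(child) & SKIP_CLASSES]
    for parent, child in doomed:
        parent.remove(child)

    def text_of(cls: str) -> str:
        # Whitespace-normalized text of the first element carrying this class.
        for el in root.iter():
            if cls in _classes(el):
                return " ".join("".join(el.itertext()).split())
        return ""

    stars = text_of("stars")
    fields = {
        "Name": text_of("product-title"),
        "Price": text_of("price"),
        "Original price": text_of("original-price"),
        "Rating": f"{stars}/5" if stars else "",
        "Reviews": text_of("count"),
        "Availability": text_of("availability"),
        "Description": text_of("description"),
        "Specs": ", ".join(li.text.strip() for li in root.iter("li") if li.text),
    }
    # One "Label: value" line per field that was found; no markup survives.
    return "\n".join(f"{label}: {value}" for label, value in fields.items() if value)


print(scrape_product("https://example.com/headphones"))
```

Because the current and original prices arrive on separately labeled lines, the agent no longer has to guess which number is which.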