Tool Output Parser - Problems

#24. Tool Output Parser

MediumTool Calling

The Problem

Your shopping assistant agent has a scrape_product tool that fetches product pages. The problem is that it returns raw HTML to the agent. The LLM wastes tokens processing HTML tags, sometimes picks up ad copy or sidebar content as product info, and occasionally misreads prices or ratings. Your job is to parse the HTML inside the tool and return clean, structured data so the agent gets exactly the fields it needs.

Examples

Example 1

User input: Get me the details for the headphones at https://example.com/headphones

Current (bad) output: The tool returns a full HTML blob including <div class="sidebar">, ad copy, and related product links. The LLM tries to parse it inline and sometimes reports the original price ($349.99) instead of the current price ($249.99).

Expected (good) output: The tool returns structured data like: Name: Wireless Noise-Cancelling Headphones | Price: $249.99 (was $349.99) | Rating: 4.5/5 (2,847 reviews) | In Stock | Description: Premium wireless headphones with active noise cancellation... The agent presents this accurately.

Example 2

User input: Is this product well-reviewed?

Current (bad) output: The LLM finds a star rating buried in the HTML but misinterprets the review count because it picked up a number from the ad sidebar.

Expected (good) output: The tool's structured output clearly separates rating (4.5) and review count (2,847), so the agent correctly says: Yes, it has 4.5 out of 5 stars from 2,847 reviews.

Your Task

Modify the scrape_product tool to parse the raw HTML and extract structured fields.
Return at minimum: product name, current price, rating, review count, availability, and description.
Strip out irrelevant content (ads, sidebars, related products).
The parsing logic must live inside the tool function, not in the prompt.

Evaluation

Submissions are checked for the following:

No raw HTML in tool output: The tool returns clean structured data, not HTML tags.
Key fields are extracted: The parsed output includes at least product name, price, and rating.
Agent response is accurate: The agent presents correct product details without HTML artifacts.

Constraints

The raw HTML must be parsed into structured data before reaching the LLM
The parsed output must include product name, price, and rating at minimum
The parsing logic must live inside the tool, not in the system prompt
The agent must not receive raw HTML tags in the tool output

from langchain_openai import ChatOpenAI from langchain.agents import AgentExecutor, create_tool_calling_agent from langchain_core.prompts import ChatPromptTemplate from langchain_core.tools import tool llm = ChatOpenAI(model="gpt-4o-mini") @tool def scrape_product(url: str) -> str: """Scrape product information from a URL.""" # BUG: Returns raw HTML — the LLM wastes tokens parsing tags and often extracts wrong fields # TODO: Parse the HTML and return clean structured data raw_html = """ <div class="product-page"> <h1 class="product-title">Wireless Noise-Cancelling Headphones</h1> <span class="price">$249.99</span> <span class="original-price">$349.99</span> <div class="rating"><span class="stars">4.5</span> out of 5 (<span class="count">2,847</span> reviews)</div> <div class="availability in-stock">In Stock</div> <p class="description">Premium wireless headphones with active noise cancellation, 30-hour battery life, and comfortable over-ear design.</p> <div class="specs"><ul><li>Bluetooth 5.2</li><li>Weight: 250g</li><li>Driver: 40mm</li></ul></div> <div class="sidebar"><div class="ad">Buy now and save!</div><div class="related">Related products...</div></div> </div> """ return raw_html prompt = ChatPromptTemplate.from_messages([ ("system", "You are a shopping assistant. Help users find product information."), ("human", "{input}"), ("placeholder", "{agent_scratchpad}"), ]) agent = create_tool_calling_agent(llm, [scrape_product], prompt) executor = AgentExecutor(agent=agent, tools=[scrape_product]) result = executor.invoke({"input": "Get me the details for the headphones at https://example.com/headphones"}) print(result["output"])