Agent Foundry

#88. Multi-Agent Debugging

Hard · Multi-Agent · Evaluation

The Problem

You have a four-agent pipeline (collector → cleaner → analyzer → reporter) that processes data and generates executive reports. The final reports sometimes contain incorrect conclusions, but with no observability, you cannot tell which agent introduced the error. Did the collector gather wrong data? Did the cleaner corrupt something? Did the analyzer draw wrong conclusions? Or did the reporter misrepresent the analysis? Your task is to add tracing and logging to each agent so you can inspect every agent's input and output and pinpoint where errors originate.

Examples

Example 1

User input: Topic: global renewable energy adoption trends

Current (bad) output: The final report claims solar energy adoption is declining (incorrect), but there is no way to tell where this error originated in the four-agent chain.

Expected (good) output: With tracing enabled, the logs show:

  • Collector gathered correct data (solar growing at 25% YoY).
  • Cleaner correctly normalized the data.
  • Analyzer incorrectly interpreted the growth as decline.
  • Reporter faithfully reported the analyzer's wrong conclusion.

The trace pinpoints the analyzer as the faulty agent.

Example 2

User input: Topic: remote work adoption in tech companies

Current (bad) output: The report contains statistics that don't match publicly available data, with no debugging trail.

Expected (good) output: The trace reveals the collector introduced fabricated statistics that propagated through the rest of the pipeline. Each subsequent agent processed the bad data correctly — the root cause was the first stage.
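Both examples follow one rule. In a sequential pipeline, the faulty agent is the first stage whose output is wrong: every earlier stage passed correct data downstream, so that stage received a correct input, and every later stage only propagates its error. The sketch below shows that rule. It assumes each trace entry can be judged by some correctness check. The `TraceEntry` class, the `find_faulty_agent` helper, the trace strings, and the `is_correct` predicate are all hypothetical stand-ins, for example for a human review or a comparison against reference data.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class TraceEntry:
    """One pipeline step: which agent ran, what it received, what it produced."""
    agent: str
    input: str
    output: str


def find_faulty_agent(
    trace: Sequence[TraceEntry],
    is_correct: Callable[[TraceEntry], bool],
) -> Optional[str]:
    """Return the first agent whose output fails `is_correct`, or None.

    In a sequential pipeline every earlier stage passed the check, so the
    first failing stage received correct input and introduced the error.
    """
    for entry in trace:
        if not is_correct(entry):
            return entry.agent
    return None


# Hypothetical trace mirroring Example 1 (strings are illustrative).
example_1 = [
    TraceEntry("Data Collector", "global renewable energy adoption trends",
               "solar: +25% YoY"),
    TraceEntry("Data Cleaner", "solar: +25% YoY", "solar_growth_yoy=0.25"),
    TraceEntry("Data Analyzer", "solar_growth_yoy=0.25",
               "solar adoption is declining"),
    TraceEntry("Report Generator", "solar adoption is declining",
               "Executive summary: solar is in decline"),
]

# Stand-in check: any claim of decline contradicts the collected data.
print(find_faulty_agent(example_1, lambda e: "declin" not in e.output))
# → Data Analyzer
```

Applied to a trace of Example 2, the same rule would stop at the collector, because its output is the first one to fail the check.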

Your Task

Add observability to the starter code so that:

  • Every agent's input and output are logged with the agent's name.
  • The trace makes it easy to identify which agent introduced an error.
  • Each trace entry includes the agent name, input received, and output produced.
  • The agents' core logic and prompts remain unchanged — only add tracing.
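The last requirement suggests wrapping each stage instead of editing it. A tracing wrapper records the agent name, input, and output, then returns the output untouched. The sketch below shows the idea with Python's standard `logging` and `functools` modules. The `traced` decorator, the `TRACE` list, and the two stage functions are hypothetical plain-Python stand-ins, not CrewAI agents. CrewAI runs its agents internally, so there the equivalent hook is a callback rather than a decorator.

```python
import functools
import logging
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO, format="%(name)s | %(message)s")
logger = logging.getLogger("trace")

# One dict per stage call, in execution order.
TRACE: List[Dict[str, str]] = []


def traced(agent_name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator: record agent name, input, and output around a stage.

    The wrapped function runs unchanged and its result is returned as-is,
    so only observability is added.
    """
    def decorator(step: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(step)
        def wrapper(data: str) -> str:
            output = step(data)
            TRACE.append({"agent": agent_name, "input": data, "output": output})
            logger.info("%s\n  input:  %r\n  output: %r", agent_name, data, output)
            return output
        return wrapper
    return decorator


# Hypothetical stand-in stages used only to demonstrate the wrapper.
@traced("Data Collector")
def collect(topic: str) -> str:
    return f"raw facts about {topic}"


@traced("Data Cleaner")
def clean(raw: str) -> str:
    return raw.strip().lower()


report = clean(collect("Remote work adoption in tech companies"))
```

The second entry's `input` equals the first entry's `output`. Because of that chaining, a trace like this can be read stage by stage to find where a value first goes wrong.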

Evaluation

Submissions are checked for the following:

  • Tracing added to each agent: Each agent's input and output are logged or traced.
  • Faulty agent identifiable: The trace output makes it possible to identify which specific agent produced incorrect output.
  • Trace includes key information: Each trace entry includes the agent name, input received, and output produced.
  • Core logic unchanged: The agents' core prompts and logic are not modified — only observability is added.
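Two of these criteria are mechanical enough to self-check before submitting: every agent appears in the trace, and every entry has a non-empty name, input, and output. The sketch below performs that check. It assumes the trace is a list of dicts with `agent`, `input`, and `output` keys. That shape and the `check_trace` helper are hypothetical, not something the grader specifies.

```python
from typing import Dict, List, Sequence

REQUIRED_FIELDS = ("agent", "input", "output")


def check_trace(trace: Sequence[Dict[str, str]], expected_agents: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the trace looks complete."""
    problems = []
    # Every entry must carry all three fields with non-blank values.
    for i, entry in enumerate(trace, start=1):
        for field in REQUIRED_FIELDS:
            if not str(entry.get(field, "")).strip():
                problems.append(f"entry {i}: missing or empty '{field}'")
    # Every agent in the pipeline must have produced at least one entry.
    traced = {entry.get("agent") for entry in trace}
    for agent in expected_agents:
        if agent not in traced:
            problems.append(f"no trace entry for agent '{agent}'")
    return problems


agents = ["Data Collector", "Data Cleaner", "Data Analyzer", "Report Generator"]
partial = [
    {"agent": "Data Collector", "input": "remote work adoption in tech companies",
     "output": "raw data points"},
    {"agent": "Data Cleaner", "input": "raw data points", "output": ""},
]
for problem in check_trace(partial, agents):
    print(problem)
# → entry 2: missing or empty 'output'
# → no trace entry for agent 'Data Analyzer'
# → no trace entry for agent 'Report Generator'
```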

Constraints

  • You must add tracing or logging to each agent's input and output.
  • The tracing must make it possible to identify which agent produced incorrect output.
  • You may not change the agents' core logic to fix the bug; only add observability.
  • The trace output must include agent name, input received, and output produced.

Starter Code
from crewai import Agent, Task, Crew, Process
from crewai import LLM

llm = LLM(model="gpt-4o-mini")

collector = Agent(
    role="Data Collector",
    goal="Collect raw data points about a topic",
    backstory="You gather raw information from various sources.",
    llm=llm,
)

cleaner = Agent(
    role="Data Cleaner",
    goal="Clean and normalize the collected data",
    backstory="You remove duplicates, fix formatting, and standardize data.",
    llm=llm,
)

analyzer = Agent(
    role="Data Analyzer",
    goal="Analyze cleaned data and extract insights",
    backstory="You find patterns and trends in structured data.",
    llm=llm,
)

reporter = Agent(
    role="Report Generator",
    goal="Generate a final report from the analysis",
    backstory="You create clear, executive-ready reports.",
    llm=llm,
)

collect_task = Task(
    description="Collect key data points about {topic}",
    expected_output="A list of raw data points",
    agent=collector,
)

clean_task = Task(
    description="Clean and normalize the collected data",
    expected_output="Cleaned, standardized data",
    agent=cleaner,
)

analyze_task = Task(
    description="Analyze the cleaned data for trends and insights",
    expected_output="Key insights and trend analysis",
    agent=analyzer,
)

report_task = Task(
    description="Generate a final executive report",
    expected_output="A polished executive report",
    agent=reporter,
)

# BUG: The pipeline produces a wrong final report but it's unclear which agent erred
# TODO: Add tracing/logging to each agent to identify the faulty one
crew = Crew(
    agents=[collector, cleaner, analyzer, reporter],
    tasks=[collect_task, clean_task, analyze_task, report_task],
    process=Process.sequential,
)

result = crew.kickoff(inputs={"topic": "global renewable energy adoption trends"})
print(result)
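One way to attach tracing without touching any role, goal, or task description is the `callback` parameter of CrewAI's `Task`. CrewAI calls it with the finished task's `TaskOutput`, whose `agent` field holds the agent's role and whose `raw` field holds its text output. Check this against the installed CrewAI version. The sketch below keeps the tracing logic independent of CrewAI so it runs on its own. `make_trace_callback` is a hypothetical helper that relies only on the object passed in having `agent`, `description`, and `raw` attributes. The `SimpleNamespace` objects at the bottom simulate what CrewAI would pass, and their values are illustrative.

```python
import logging
from types import SimpleNamespace
from typing import Callable, Dict, List

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("crew.trace")


def make_trace_callback(topic: str, trace: List[Dict[str, str]]) -> Callable[[object], None]:
    """Build a task callback that appends one trace entry per finished task.

    Assumes tasks run sequentially (Process.sequential), so each task's input
    is its own description plus the previous task's raw output; the first
    task receives the topic instead.
    """

    def on_task_done(task_output: object) -> None:
        previous = trace[-1]["output"] if trace else topic
        entry = {
            "agent": str(getattr(task_output, "agent", "unknown")),
            "input": f"{getattr(task_output, 'description', '')}\n---\n{previous}",
            "output": str(getattr(task_output, "raw", "")),
        }
        trace.append(entry)
        log.info("agent=%s\n  input=%r\n  output=%r",
                 entry["agent"], entry["input"], entry["output"])

    return on_task_done


trace: List[Dict[str, str]] = []
callback = make_trace_callback("global renewable energy adoption trends", trace)

# Stand-ins for the TaskOutput objects CrewAI would pass to the callback.
callback(SimpleNamespace(
    agent="Data Collector",
    description="Collect key data points about global renewable energy adoption trends",
    raw="solar: +25% YoY"))
callback(SimpleNamespace(
    agent="Data Cleaner",
    description="Clean and normalize the collected data",
    raw="solar_growth_yoy=0.25"))
```

With CrewAI installed, the same `callback` would be passed as `callback=callback` to each of `collect_task`, `clean_task`, `analyze_task`, and `report_task`. This adds a keyword argument but leaves the prompts and logic unchanged. After `crew.kickoff(...)`, `trace` would hold one entry per agent. Setting `verbose=True` on the `Crew` additionally prints CrewAI's own step-by-step logs.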