Agent Foundry

#96. Token Budget Manager

Medium · Cost Optimization

The Problem

Your research agent breaks a topic into 10 aspects, calls the LLM once per aspect, and then synthesizes a final report: 11 LLM calls per run with zero cost awareness. In production, a single run can easily exceed your per-request budget, and nothing guards against runaway loops. Your job is to add a token budget manager that tracks cumulative cost across all LLM calls in a run, enforces a $0.05 limit, and stops gracefully when the budget is exhausted.

Examples

Example 1

User input: The future of renewable energy

Current (bad) output: All 10 research calls plus the synthesis call execute regardless of cost. Total: ~$0.12 with no tracking or limit.

Expected (good) output: The agent researches aspects 1–6, tracks cost after each call, and at aspect 7 detects the budget is nearly exhausted. It stops, synthesizes what it has, and returns: "[Budget: $0.048 / $0.050 used — 6 of 10 aspects completed] Summary of findings on solar, wind, hydro, geothermal, nuclear fusion, and grid storage…"

Example 2

User input: Explain quantum computing

Current (bad) output: 11 LLM calls with no cost tracking. The run could cost $0.15 on a complex topic.

Expected (good) output: The agent completes as many research calls as the budget allows, logs cumulative cost after each call, and returns a partial report when the limit is reached. Each call's cost is visible: "Call 1: $0.006 (cumulative: $0.006)… Call 5: $0.008 (cumulative: $0.041)… Budget limit reached."
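
The per-call log shown in this example could be produced by a small formatter like the one below. This is only a sketch: `format_call_log` is a hypothetical helper, and the three costs are made-up values standing in for costs computed from real API responses.

```python
def format_call_log(call_number: int, cost: float, cumulative: float) -> str:
    """Render one log line in the shape used by Example 2."""
    return f"Call {call_number}: ${cost:.3f} (cumulative: ${cumulative:.3f})"


# Illustrative costs only; a real run derives these from reported token counts.
costs = [0.006, 0.009, 0.010]
running = 0.0
lines = []
for n, cost in enumerate(costs, start=1):
    running += cost
    lines.append(format_call_log(n, cost, running))

print("\n".join(lines))
```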

Your Task

Modify the starter code so that:

  • Each LLM call's token usage is tracked using the actual API response metadata.
  • Cumulative cost is calculated across the entire run using per-token pricing.
  • The agent stops making new calls once the $0.05 budget is reached.
  • When stopped, the agent returns a partial result summarizing what it gathered so far.
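
One way to approach the tracking and pricing requirements above is a small accumulator object. The sketch below is not the reference solution: `BudgetTracker` is a hypothetical name, and the two price constants are assumptions (USD per one million tokens) that should be checked against the provider's current pricing for the model in use.

```python
from dataclasses import dataclass, field

# Assumed prices in USD per 1M tokens; verify against the provider's pricing page.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


@dataclass
class BudgetTracker:
    """Hypothetical helper: accumulates token usage and cost for one run."""
    limit_usd: float = 0.05
    input_tokens: int = 0
    output_tokens: int = 0
    spent_usd: float = 0.0
    call_costs: list = field(default_factory=list)

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's token counts and return that call's cost in USD."""
        cost = (input_tokens * INPUT_PRICE_PER_M
                + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.spent_usd += cost
        self.call_costs.append(cost)
        return cost

    @property
    def exhausted(self) -> bool:
        """True once cumulative spend has reached the limit."""
        return self.spent_usd >= self.limit_usd


tracker = BudgetTracker()
# 1,000 input tokens and 2,000 output tokens at the assumed prices
cost = tracker.record(input_tokens=1_000, output_tokens=2_000)
print(f"call cost: ${cost:.5f}, exhausted: {tracker.exhausted}")
```

Keeping per-call costs in a list alongside the running totals makes it straightforward to report both views the task asks for.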

Evaluation

Submissions are checked for the following:

  • Enforces budget limit: The agent stops making LLM calls once the $0.05 budget is reached.
  • Tracks token usage: Token usage is tracked per call and cumulatively across the entire run.
  • Graceful stop at limit: When budget is exhausted, the agent returns a partial result rather than crashing.
  • Uses actual token counts: Cost calculations use real token counts from API responses, not estimates.
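
To satisfy the "actual token counts" criterion, the counts have to come from the response object rather than from a character-based guess. The sketch below assumes a LangChain-style message that exposes a `usage_metadata` dict with `input_tokens` and `output_tokens` keys; `extract_usage` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in so the example runs without an API key.

```python
from types import SimpleNamespace


def extract_usage(response) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) reported for a response.

    Raises if the counts are missing, so the caller never silently
    falls back to an estimate.
    """
    usage = getattr(response, "usage_metadata", None)
    if not usage:
        raise ValueError("response carries no token usage metadata")
    return usage["input_tokens"], usage["output_tokens"]


# Stand-in for a real model response, used only to make the sketch runnable.
fake_response = SimpleNamespace(
    content="findings...",
    usage_metadata={"input_tokens": 42, "output_tokens": 310, "total_tokens": 352},
)
print(extract_usage(fake_response))
```

Failing loudly when metadata is absent is a design choice here; a submission could instead treat a missing count as a reason to stop the run.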

Constraints

  • Each agent run must enforce a $0.05 budget limit
  • Token usage must be tracked per call and cumulatively across the run
  • The agent must stop gracefully when the budget is exhausted, returning a partial result
  • The budget tracker must use actual token counts from API responses, not estimates
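
The constraints above can be combined into a budget-aware loop. The following is a minimal sketch under stated assumptions, not the expected solution: `run_with_budget`, `call_llm`, and `cost_of` are hypothetical names, the fake LLM returns a fixed cost per call, and the findings are simply joined rather than sent to a synthesis call (a real submission would also need to leave room in the budget for that call). The guard stops when the spend so far plus the largest call cost seen would exceed the limit, which is one possible way to avoid overshooting by a full call.

```python
LIMIT_USD = 0.05


def run_with_budget(topic, call_llm, cost_of, aspects=10):
    """Research up to `aspects` aspects, stopping before the budget is exceeded.

    `call_llm(prompt)` and `cost_of(response)` are hypothetical callables;
    `cost_of` should price a response from its reported token counts.
    """
    spent, largest, findings = 0.0, 0.0, []
    for i in range(aspects):
        # Stop if another call of the largest size seen would break the limit.
        if spent + largest > LIMIT_USD:
            break
        response = call_llm(f"Research aspect {i + 1} of: {topic}")
        cost = cost_of(response)
        spent += cost
        largest = max(largest, cost)
        findings.append(response["text"])
    header = (f"[Budget: ${spent:.3f} / ${LIMIT_USD:.3f} used, "
              f"{len(findings)} of {aspects} aspects completed]")
    return header + " " + " ".join(findings)


prompts_seen = []


def fake_llm(prompt):
    """Stand-in for a real model call; every call costs a fixed $0.008."""
    prompts_seen.append(prompt)
    return {"text": f"finding {len(prompts_seen)}", "cost": 0.008}


report = run_with_budget("The future of renewable energy", fake_llm,
                         lambda r: r["cost"])
print(report)
```

Because the function always returns a string built from whatever findings exist, hitting the limit produces a partial report instead of an exception.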

Starter Code

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini")

# BUG: No cost tracking — the agent can make unlimited LLM calls with no budget awareness
# TODO: Add token budget tracking with a $0.05 per-run limit
def research_and_summarize(topic: str) -> str:
    sources = []
    for i in range(10):
        response = llm.invoke([
            SystemMessage(content=f"Research aspect {i+1} of this topic. Provide detailed findings."),
            HumanMessage(content=topic),
        ])
        sources.append(response.content)

    summary = llm.invoke([
        SystemMessage(content="Combine these research findings into a comprehensive summary."),
        HumanMessage(content="\n\n".join(sources)),
    ])
    return summary.content

result = research_and_summarize("The future of renewable energy")
print(result)