Agent Foundry

#41. Memory Budget Manager

Hard | Memory | Cost Optimization

The Problem

Your assistant stores every user message and every response in a memory list. After 30 turns, the memory holds thousands of tokens and keeps growing. There is no budget, no eviction, and no prioritization, so the prompt eventually exceeds the context window and the agent fails. Truncating from the front (a sliding window) keeps the size down but can discard important information. You need a budget manager that tracks token usage, scores items by relevance and recency, and evicts the least valuable items when the budget is exceeded.
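
Tracking token usage does not necessarily require a real tokenizer. The sketch below assumes roughly 4 characters per token, which is a common rule of thumb for English text rather than an exact count. The function name `estimate_tokens` is hypothetical and not part of the starter code.

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough estimate. Assumption: about 4 characters per token (heuristic only)."""
    if not text:
        return 0
    return math.ceil(len(text) / 4)

# Two sample memory items in the same shape the starter code produces.
memory_items = [
    "User: Tell me an interesting fact about country number 1 in the world.",
    "Assistant: Here is a fact.",
]

# Total estimated tokens currently held in memory.
total = sum(estimate_tokens(item) for item in memory_items)
print(f"Estimated tokens in memory: {total}")
```

If exact counts matter, a model-specific tokenizer could replace the heuristic without changing the rest of the design.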

Examples

Example 1

Scenario: 30-turn conversation about world countries. Memory ends up with about 60 items (one user message and one response per turn).

Current (bad) behavior: Every stored item goes into the prompt. By turn 25 the token count exceeds the model's context window and the API call fails.

Expected (good) behavior: Memory stays under the token budget (e.g., 2000 tokens). Older, less relevant items are evicted. Recent and frequently referenced items stay.

Example 2

Scenario: User pins an important note: "My API key expires on March 1st." Then 20 turns of unrelated chat follow.

Current (bad) behavior: The API key note is buried among 40+ items and may be truncated if a sliding window is used.

Expected (good) behavior: The API key note is marked as pinned/high-priority and survives eviction. Less important items are evicted instead.
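
To make the contrast in this example concrete, here is a minimal sketch comparing a plain sliding window with eviction that skips pinned items. The `MemoryItem` class, both function names, and the fixed 10-token sizes are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MemoryItem:
    text: str
    tokens: int
    pinned: bool = False

def sliding_window(items: List[MemoryItem], budget: int) -> List[MemoryItem]:
    """Keep only the newest items that fit. Ignores pins (the behavior to avoid)."""
    kept, used = [], 0
    for item in reversed(items):
        if used + item.tokens > budget:
            break
        kept.append(item)
        used += item.tokens
    return list(reversed(kept))

def evict_oldest_unpinned(items: List[MemoryItem], budget: int) -> List[MemoryItem]:
    """Drop the oldest unpinned items until the total fits the budget.

    Note: if pinned items alone exceed the budget, the result is still over budget.
    """
    total = sum(item.tokens for item in items)
    dropped = set()
    for idx, item in enumerate(items):  # oldest first
        if total <= budget:
            break
        if not item.pinned:
            dropped.add(idx)
            total -= item.tokens
    return [item for idx, item in enumerate(items) if idx not in dropped]

# One pinned note followed by 40 unrelated items of 10 tokens each (assumed sizes).
note = MemoryItem("My API key expires on March 1st.", tokens=10, pinned=True)
chatter = [MemoryItem(f"unrelated message {n}", tokens=10) for n in range(40)]
items = [note] + chatter

window = sliding_window(items, budget=100)
protected = evict_oldest_unpinned(items, budget=100)
print("note in sliding window:", note in window)
print("note after pinned-aware eviction:", note in protected)
```

The sliding window loses the note because it is the oldest item, while the pinned-aware version evicts unpinned items around it.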

Example 3

Scenario: Token budget is set to 1500 tokens. The agent has 2000 tokens of memory.

Current (bad) behavior: No enforcement — the full 2000 tokens go into the prompt.

Expected (good) behavior: The budget manager evicts ~500 tokens worth of low-scoring items before the LLM call.
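
The arithmetic in this example can be sketched as a greedy loop: sort by score, then evict from the bottom until the total fits. The item sizes (20 items of 100 tokens) and the scores are made up for illustration, and `evict_to_budget` is a hypothetical name.

```python
from typing import List, Tuple

Entry = Tuple[str, int, float]  # (name, tokens, score)

def evict_to_budget(items: List[Entry], budget: int) -> Tuple[List[Entry], List[Entry]]:
    """Evict lowest-scoring entries until the total is within budget.

    Returns (kept, evicted). Kept entries stay in their original order.
    """
    total = sum(tokens for _, tokens, _ in items)
    evicted = []
    for entry in sorted(items, key=lambda e: e[2]):  # lowest score first
        if total <= budget:
            break
        evicted.append(entry)
        total -= entry[1]
    evicted_names = {name for name, _, _ in evicted}
    kept = [e for e in items if e[0] not in evicted_names]
    return kept, evicted

# Assumed data: 20 items x 100 tokens = 2000 tokens; higher index means higher score.
items = [(f"item-{n}", 100, n / 20) for n in range(20)]
kept, evicted = evict_to_budget(items, budget=1500)

print("evicted tokens:", sum(tokens for _, tokens, _ in evicted))
print("remaining tokens:", sum(tokens for _, tokens, _ in kept))
```

With uneven item sizes the loop may evict somewhat more than the overshoot, since it removes whole items.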

Your Task

Implement a memory budget manager:

  • Track the approximate token count of each memory item.
  • Set a configurable token budget (e.g., 2000 tokens).
  • Before each LLM call, check if total memory exceeds the budget.
  • If over budget, evict items using a scoring function based on recency and relevance.
  • Support pinning important items so they are never evicted.
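
One possible shape for such a manager is sketched below. Everything here is an assumption rather than a reference solution: the class and method names, the 4-characters-per-token estimate, the keyword-overlap relevance measure, and the `recency_weight` blend are illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import List

def estimate_tokens(text: str) -> int:
    # Assumption: about 4 characters per token; a real tokenizer could replace this.
    return max(1, math.ceil(len(text) / 4))

@dataclass
class MemoryItem:
    text: str
    turn: int
    tokens: int
    pinned: bool = False

class MemoryBudgetManager:
    def __init__(self, budget: int = 2000, recency_weight: float = 0.5):
        self.budget = budget
        self.recency_weight = recency_weight
        self.items: List[MemoryItem] = []
        self.turn = 0

    def add(self, text: str, pinned: bool = False) -> None:
        self.turn += 1
        self.items.append(MemoryItem(text, self.turn, estimate_tokens(text), pinned))

    def total_tokens(self) -> int:
        return sum(item.tokens for item in self.items)

    def score(self, item: MemoryItem, query: str) -> float:
        # Recency: 1.0 for the newest item, smaller for older ones.
        recency = item.turn / self.turn if self.turn else 0.0
        # Relevance: fraction of query words that also appear in the item.
        query_words = set(query.lower().split())
        item_words = set(item.text.lower().split())
        relevance = len(query_words & item_words) / len(query_words) if query_words else 0.0
        w = self.recency_weight
        return w * recency + (1 - w) * relevance

    def enforce(self, query: str = "") -> List[MemoryItem]:
        """Evict lowest-scoring unpinned items until within budget. Returns evicted items."""
        candidates = sorted(
            (item for item in self.items if not item.pinned),
            key=lambda item: self.score(item, query),
        )
        total = self.total_tokens()
        evicted = []
        for item in candidates:
            if total <= self.budget:
                break
            evicted.append(item)
            total -= item.tokens
        evicted_ids = {id(item) for item in evicted}
        self.items = [item for item in self.items if id(item) not in evicted_ids]
        return evicted

manager = MemoryBudgetManager(budget=30)
manager.add("My API key expires on March 1st.", pinned=True)
for n in range(1, 11):
    manager.add(f"User: Tell me a fact about country number {n}.")

evicted = manager.enforce(query="When does my API key expire?")
print(f"evicted {len(evicted)} items, {manager.total_tokens()} tokens remain")
```

Keyword overlap is a crude stand-in for relevance; embedding similarity or reference counts would fit the same `score` slot.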

Evaluation

Submissions are checked for the following:

  • Token budget enforced: Total memory token count never exceeds the configured budget.
  • Eviction is relevance-based: Least relevant or oldest items are evicted first, not random ones.
  • Important items protected: High-priority or pinned items survive eviction.
  • Budget enforced automatically: The check and eviction happen before each LLM call without manual intervention.
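
The "automatic" criterion suggests putting the budget check inside the chat path itself, so callers cannot forget it. The sketch below assumes a hypothetical `BudgetedChat` wrapper around any `llm_call(prompt) -> str` callable, with a stub standing in for the real model. Its eviction policy is deliberately a placeholder (oldest first); a scoring function would replace `_enforce` in a real submission.

```python
from typing import Callable, List

def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token.
    return max(1, (len(text) + 3) // 4)

class BudgetedChat:
    """Wraps an LLM callable so the budget check runs on every call."""

    def __init__(self, llm_call: Callable[[str], str], budget: int):
        self.llm_call = llm_call
        self.budget = budget
        self.memory: List[str] = []

    def _enforce(self) -> None:
        # Placeholder policy: drop the oldest items until the total fits.
        while self.memory and sum(map(estimate_tokens, self.memory)) > self.budget:
            self.memory.pop(0)

    def chat(self, user_input: str) -> str:
        self.memory.append(f"User: {user_input}")
        self._enforce()  # always runs before the model is called
        prompt = "\n".join(self.memory)
        reply = self.llm_call(prompt)
        self.memory.append(f"Assistant: {reply}")
        return reply

# Stub model that records how many estimated tokens each prompt contained.
prompt_sizes: List[int] = []

def fake_llm(prompt: str) -> str:
    prompt_sizes.append(sum(estimate_tokens(line) for line in prompt.split("\n")))
    return "ok"

bot = BudgetedChat(fake_llm, budget=50)
for n in range(30):
    bot.chat(f"Tell me an interesting fact about country number {n + 1} in the world.")

print("largest prompt seen by the model:", max(prompt_sizes), "tokens")
```

Note that the response is appended after the call, so memory can briefly exceed the budget between turns; the check guarantees only what the model sees.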

Constraints

  • Total memory must stay under a configurable token budget.
  • Eviction must be based on relevance or recency, not random.
  • The budget check and eviction must happen automatically before each call.
  • High-importance items should be protected from eviction.

Starter Code

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini")

# BUG: Memory grows without bound — no budget enforcement
memory_items = []

def add_memory(item: str):
    memory_items.append(item)

def chat(user_input: str) -> str:
    add_memory(f"User: {user_input}")
    memory_block = "\n".join(memory_items)
    messages = [
        SystemMessage(content=f"You are a helpful assistant. Memory:\n{memory_block}"),
        HumanMessage(content=user_input),
    ]
    response = llm.invoke(messages)
    add_memory(f"Assistant: {response.content}")
    return response.content

# Simulate many interactions that fill up memory
for i in range(30):
    topic = f"Tell me an interesting fact about country number {i + 1} in the world."
    reply = chat(topic)
    print(f"Turn {i + 1} ({len(memory_items)} items): {reply[:60]}...")

print(f"\nTotal memory items: {len(memory_items)}")
total_chars = sum(len(item) for item in memory_items)
print(f"Total characters in memory: {total_chars}")