Agent Foundry

#35. Summarization Memory

Medium | Memory | Cost Optimization

The Problem

Your assistant keeps every message in the conversation history. For short chats this is fine, but after 15 or more turns the token count balloons: costs spike and eventually the context window overflows. Simply truncating old messages (as a sliding window does) loses important context from earlier in the conversation. A better approach is summarization memory, which condenses older turns into a compact summary while keeping recent turns verbatim. Your job is to implement this hybrid strategy.

Examples

Example 1

Scenario: A 15-turn conversation about European history, ending with "Summarize everything we've discussed."

Current (bad) behavior: All 15 turns are kept verbatim. The prompt is 4,000+ tokens. On longer conversations the API throws a context_length_exceeded error.

Expected (good) behavior: Turns 1–10 are condensed into a ~200-token summary. Turns 11–15 are kept verbatim. The agent can still summarize the full conversation because key facts are in the summary.

Example 2

Scenario: User mentions their name in turn 1, discusses 12 unrelated topics, then asks "What's my name?"

Current (bad) behavior: Either the context overflows, or truncation drops turn 1 entirely and the name is lost.

Expected (good) behavior: The summarization step captures "User's name is Alice" in the running summary. Even though turn 1 was condensed, the agent can still recall the name.
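
One way to keep a fact like the user's name alive across repeated condensation is to feed the previous summary back into every summarization request, with an explicit instruction to retain names and other durable facts. The sketch below only builds the prompt text. The function name `build_summary_prompt` and the prompt wording are illustrative assumptions, not part of the starter code, and the wording would likely need tuning against a real model.

```python
def build_summary_prompt(previous_summary: str, turns: list[tuple[str, str]]) -> str:
    """Build the text sent to the LLM to update the running summary.

    `turns` holds (role, content) pairs for the turns being condensed.
    The function name and the prompt wording are hypothetical.
    """
    transcript = "\n".join(f"{role}: {content}" for role, content in turns)
    return (
        "Update the running summary of a conversation.\n"
        "Keep names, stated preferences, decisions, and key facts.\n"
        "Be concise.\n\n"
        f"Current summary:\n{previous_summary or '(none yet)'}\n\n"
        f"New turns to fold in:\n{transcript}\n\n"
        "Updated summary:"
    )
```

Because the old summary is part of the input, "User's name is Alice" can carry forward even after turn 1 itself has been condensed, provided the model follows the instruction to keep it.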

Your Task

Implement summarization memory for the agent:

  • Split the history into two zones: a summary zone (older turns, condensed) and a recent zone (last N turns, kept verbatim).
  • When the summary zone grows past a threshold, use the LLM to generate a concise summary.
  • Prepend the running summary as context before the recent verbatim turns.
  • Ensure total token usage grows sub-linearly compared to full history.
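
The four steps above can be sketched as a small class. This is a minimal sketch under stated assumptions, not a reference solution: the class name `SummaryMemory`, its parameters, and the plain `(role, content)` tuples are all hypothetical, and the summarizer is passed in as a callable so the example runs without an API key. In the starter code that callable would wrap a call to `llm.invoke`.

```python
from typing import Callable

Turn = tuple[str, str]  # (user_message, assistant_reply)


class SummaryMemory:
    """Two-zone memory: a running summary plus the last N turns verbatim."""

    def __init__(
        self,
        summarize: Callable[[str, list[Turn]], str],
        recent_turns: int = 5,
        overflow_threshold: int = 5,
    ) -> None:
        self.summarize = summarize          # would wrap an LLM call in practice
        self.recent_turns = recent_turns    # N turns always kept word-for-word
        self.overflow_threshold = overflow_threshold
        self.summary = ""                   # running summary of condensed turns
        self.turns: list[Turn] = []         # turns not yet condensed

    def add_turn(self, user_message: str, assistant_reply: str) -> None:
        self.turns.append((user_message, assistant_reply))
        overflow = len(self.turns) - self.recent_turns
        # Condense only once enough old turns have piled up, so the
        # summarizer is not called on every single turn.
        if overflow >= self.overflow_threshold:
            old, self.turns = self.turns[:overflow], self.turns[overflow:]
            self.summary = self.summarize(self.summary, old)

    def build_context(self) -> list[tuple[str, str]]:
        """Return (role, content) pairs: summary first, then recent turns."""
        context: list[tuple[str, str]] = []
        if self.summary:
            context.append(
                ("system", f"Summary of earlier conversation: {self.summary}")
            )
        for user_message, assistant_reply in self.turns:
            context.append(("user", user_message))
            context.append(("assistant", assistant_reply))
        return context
```

With `recent_turns=5` and `overflow_threshold=5`, a 15-turn conversation ends with turns 1 through 10 folded into the summary and turns 11 through 15 kept verbatim, matching Example 1. Between condensations the verbatim zone temporarily holds more than N turns, which trades a slightly larger prompt for fewer summarization calls.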

Evaluation

Submissions are checked for the following:

  • Old turns replaced by summary: Messages beyond the recent window are condensed into a running summary.
  • Recent turns kept verbatim: The most recent N turns are preserved word-for-word alongside the summary.
  • Token usage reduced: The total prompt size grows sub-linearly compared to keeping full history.
  • Summary captures key facts: The running summary retains important facts and context from the condensed turns.
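
To see why the prompt stops growing with conversation length, it can help to model both strategies with rough numbers. The sketch below is a back-of-the-envelope model only: it assumes every turn costs the same number of tokens and that the summary is held to a fixed cap, neither of which is guaranteed in practice. The function names and the figure of 270 tokens per turn (chosen to roughly match Example 1) are illustrative assumptions.

```python
def prompt_tokens_full(turn: int, tokens_per_turn: int) -> int:
    """Prompt size when every turn so far is resent verbatim."""
    return turn * tokens_per_turn


def prompt_tokens_summarized(
    turn: int, tokens_per_turn: int, recent_turns: int, summary_cap: int
) -> int:
    """Prompt size with a capped summary plus the last `recent_turns` turns."""
    verbatim = min(turn, recent_turns) * tokens_per_turn
    # The summary only exists once some turns have left the recent zone.
    summary = summary_cap if turn > recent_turns else 0
    return summary + verbatim


# Illustrative figures: ~270 tokens per turn, 5 recent turns, 200-token summary.
print(prompt_tokens_full(15, 270))                 # 4050
print(prompt_tokens_summarized(15, 270, 5, 200))   # 1550
print(prompt_tokens_summarized(30, 270, 5, 200))   # 1550
```

Under these assumptions the full-history prompt grows linearly with the turn count, while the summarized prompt levels off once the recent zone is full. The model ignores the tokens spent on the summarization calls themselves, so a real measurement with a tokenizer would be needed to confirm the savings.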

Constraints

  • Old messages must be replaced with a summary, not simply dropped
  • The summary must be generated by the LLM, not hardcoded
  • The most recent N turns must be kept verbatim alongside the summary
  • Total token usage should decrease compared to keeping full history

Starter Code

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini")

# BUG: Keeps every message — hits token limits on long conversations
history = [SystemMessage(content="You are a helpful assistant.")]

def chat(user_input: str) -> str:
    history.append(HumanMessage(content=user_input))
    response = llm.invoke(history)
    history.append(response)
    return response.content

# Simulate a long conversation
topics = [
    "Tell me about the history of Rome.",
    "What were the main causes of its fall?",
    "How did the Byzantine Empire continue?",
    "Explain the role of the Catholic Church in medieval Europe.",
    "What was the Renaissance?",
    "Who were the key figures of the Renaissance?",
    "How did the printing press change society?",
    "What led to the Protestant Reformation?",
    "Describe the Age of Exploration.",
    "How did colonialism affect indigenous peoples?",
    "What sparked the French Revolution?",
    "Explain the Industrial Revolution.",
    "How did World War I start?",
    "What were the consequences of World War II?",
    "Summarize everything we've discussed so far.",
]

for i, topic in enumerate(topics):
    reply = chat(topic)
    print(f"Turn {i + 1}: {reply[:100]}...")

print(f"\nTotal messages in history: {len(history)}")