Agent Foundry

#33. Context Window Truncation

Easy | Memory | Cost Optimization

The Problem

Your assistant keeps the full conversation history and passes it to the model on every turn. This works for short conversations, but when a power user runs a 50-turn session, the agent either crashes with a context-length error or silently degrades as the model struggles with an overlong prompt. Implement a sliding window that drops older messages while always keeping the system prompt and the most recent exchanges.

Examples

Example 1

Scenario: A 50-turn conversation about space facts.

Current (bad) behavior: By turn 40 the API returns a context_length_exceeded error or the model starts repeating itself and ignoring recent instructions because the prompt is too large.

Expected (good) behavior: The agent keeps only the last N turns (e.g., 10) plus the system message. Turn 50 behaves like turn 5: fast, coherent, and within token limits.

Example 2

Scenario: User shares their name in turn 1, then chats for 30 turns about unrelated topics.

Current (bad) behavior: Either the context window fills up and the call crashes, or the model cannot attend to the early name mention buried in a giant prompt.

Expected (good) behavior: The name from turn 1 is naturally truncated out of the window. The agent responds coherently to recent turns. If asked about the name, it acknowledges it no longer has that context rather than hallucinating.

Your Task

Implement a sliding window truncation strategy so the conversation stays within safe limits:

  • Before each model call, trim the history to keep only the system message and the most recent N user/assistant pairs.
  • Choose a sensible default for N (e.g., 10 turns, where one turn is one user/assistant pair).
  • Ensure the agent never crashes due to context length on conversations of any length.
  • The agent should still respond coherently using the retained context.
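
The steps above can be sketched as a small trimming function. This is only one possible approach, not the reference solution: `Msg`, `trim_history`, and `MAX_TURNS` are hypothetical names introduced here, and the function assumes each message exposes a `type` attribute with the values "system", "human", or "ai" (LangChain message classes appear to follow this convention, but verify it against the version you use).

```python
from dataclasses import dataclass

MAX_TURNS = 10  # hypothetical default for N; one turn = one user/assistant pair


@dataclass
class Msg:
    """Stand-in for a chat message (assumption: real messages expose the same fields)."""
    type: str      # "system", "human", or "ai"
    content: str


def trim_history(history, max_turns=MAX_TURNS):
    """Return the system message(s) plus the most recent `max_turns` turns."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    # System messages are separated first so truncation can never drop them.
    system = [m for m in history if m.type == "system"]
    rest = [m for m in history if m.type != "system"]
    # Each human message starts a turn; cut at the start of the
    # max_turns-th most recent one so the window never begins mid-pair.
    starts = [i for i, m in enumerate(rest) if m.type == "human"]
    if len(starts) > max_turns:
        rest = rest[starts[-max_turns]:]
    return system + rest


# Build a 50-turn history and trim it to the last 10 turns.
history = [Msg("system", "You are a helpful assistant.")]
for turn in range(1, 51):
    history.append(Msg("human", f"q{turn}"))
    history.append(Msg("ai", f"a{turn}"))

trimmed = trim_history(history)
```

In the starter code, a call such as `history[:] = trim_history(history)` placed just before `llm.invoke(history)` would apply the window on every turn. Cutting on a human message keeps each retained question together with its answer.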

Evaluation

Submissions are checked for the following:

  • History stays bounded: The message list never exceeds the configured maximum regardless of conversation length.
  • System message preserved: The system message is never dropped during truncation.
  • Recent context retained: The agent can reference information from the most recent turns after truncation occurs.
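
A rough way to check the first two criteria locally, without calling the API, is to drive the chat loop with a stub model and record the history size after each turn. Everything named here (`msg`, `trim`, `fake_model`, `MAX_TURNS`) is a hypothetical helper for illustration, and the actual grader may measure things differently.

```python
from types import SimpleNamespace

MAX_TURNS = 10  # hypothetical window size (N)


def msg(type_, content):
    """Minimal message stand-in with `type` and `content` attributes."""
    return SimpleNamespace(type=type_, content=content)


def trim(history, max_turns=MAX_TURNS):
    """Keep system messages plus the most recent `max_turns` turns."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    system = [m for m in history if m.type == "system"]
    rest = [m for m in history if m.type != "system"]
    starts = [i for i, m in enumerate(rest) if m.type == "human"]
    if len(starts) > max_turns:
        rest = rest[starts[-max_turns]:]
    return system + rest


def fake_model(messages):
    """Stub for the LLM: echoes the newest user message, so no API call is made."""
    return msg("ai", f"echo: {messages[-1].content}")


history = [msg("system", "You are a helpful assistant.")]
sizes = []
for turn in range(1, 51):
    history.append(msg("human", f"question {turn}"))
    history = trim(history)               # trim before the (stubbed) model call
    history.append(fake_model(history))
    sizes.append(len(history))

max_size = max(sizes)
print(f"Largest history seen: {max_size} messages")
```

Under these assumptions the history should never grow past one system message plus 2 × N user/assistant messages, and the first retained user message after 50 turns should be the one from turn 41.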

Constraints

  • The sliding window must keep the most recent N turns
  • The system message must always be preserved regardless of truncation
  • You may not increase the model's context window limit
  • The agent must remain functional after truncation

Starter Code

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini")

# BUG: History grows without bound — crashes or loses coherence on long conversations
history = [SystemMessage(content="You are a helpful assistant.")]

def chat(user_input: str) -> str:
    history.append(HumanMessage(content=user_input))
    response = llm.invoke(history)
    history.append(response)
    return response.content

# Simulate a 50-turn conversation
for i in range(50):
    reply = chat(f"Tell me fact number {i + 1} about space exploration.")
    print(f"Turn {i + 1}: {reply[:80]}...")

# By turn 50 the context may exceed limits or degrade
print(f"\nTotal messages in history: {len(history)}")