Agent Foundry

#36. Selective Memory Recall

Medium · Memory

The Problem

Your assistant has access to a knowledge base of 100 stored facts about the user and their environment. The current implementation dumps every fact into the system prompt on every query, regardless of relevance. This wastes tokens, increases cost, and can confuse the model with irrelevant information. The fix is to implement selective retrieval: given the user's current query, find and inject only the most relevant facts.

Examples

Example 1

User input: Who is my manager?

Current (bad) behavior: All 100 facts are shoved into the prompt. The model has to sift through trivia about miscellaneous topics to find "Jordan's manager is named Priya."

Expected (good) behavior: Only 3-5 relevant facts are retrieved (e.g., facts about Jordan's manager, team, and company). The answer is fast, cheap, and accurate: "Your manager is Priya."

Example 2

User input: What programming language do I like?

Current (bad) behavior: The entire 100-fact block is included. Token cost is high and latency increases.

Expected (good) behavior: The retrieval step finds "Jordan's favorite programming language is Python" and perhaps a couple of related facts. The response is concise: "You prefer Python!"

Example 3

User input: Tell me about DataFlow.

Current (bad) behavior: All facts dumped. The model might mix up relevant and irrelevant information.

Expected (good) behavior: Facts about DataFlow are retrieved: its founding year, what it builds, its location. The agent gives a focused summary.

Your Task

Replace the brute-force fact injection with semantic retrieval:

  • Convert each fact into an embedding and store them in an in-memory vector store.
  • On each query, embed the user's question and retrieve the top-K most relevant facts by similarity.
  • Inject only those relevant facts into the prompt context.
  • The agent should answer just as accurately but with far fewer tokens in the prompt.
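
The steps above can be sketched without any external service. This is a minimal illustration, not the reference solution: `embed` here is a toy bag-of-words stand-in (an assumption made so the example runs offline), and `InMemoryFactStore` and `top_k` are hypothetical names. A real submission would swap `embed` for an actual embedding model so that matching is semantic rather than lexical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts.

    Assumption for illustration only; replace with a real embedding call.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryFactStore:
    """Hypothetical in-memory vector store: embeds each fact once, up front."""

    def __init__(self, facts):
        self.entries = [(fact, embed(fact)) for fact in facts]

    def top_k(self, query: str, k: int = 3):
        """Return the k facts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]


facts = [
    "Jordan's manager is named Priya.",
    "Jordan's favorite programming language is Python.",
    "The company DataFlow builds ETL pipelines.",
    "The office is located in San Francisco.",
]
store = InMemoryFactStore(facts)

# Retrieve per query, then inject only the retrieved facts into the prompt.
relevant = store.top_k("Who is my manager?", k=2)
system_prompt = "You are a helpful assistant. Relevant facts:\n" + "\n".join(relevant)
print(system_prompt)
```

The key structural change from the starter code is that the prompt is built inside the per-query path from `top_k` results, instead of from a single block computed once for all queries.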

Evaluation

Submissions are checked for the following:

  • Retrieves relevant facts: Only facts semantically related to the query are injected into the prompt.
  • Does not dump all facts: The prompt contains a small subset, not the entire knowledge base.
  • Answers correctly: The agent provides accurate answers based on the retrieved facts.
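
A rough local self-check for the second criterion might look like the sketch below. It is not the grader's actual logic: `check_prompt` is a hypothetical helper, the budget of 5 facts is an assumption taken from the "3-5 relevant facts" figure in Example 1, and the check assumes each fact is injected verbatim on its own line.

```python
def check_prompt(prompt: str, knowledge_base: list, max_facts: int = 5) -> dict:
    """Count how many known facts appear as whole lines in the prompt."""
    prompt_lines = set(prompt.splitlines())
    injected = sum(1 for fact in knowledge_base if fact in prompt_lines)
    return {
        "injected": injected,
        "total": len(knowledge_base),
        # At least one fact, but no more than the assumed budget.
        "within_budget": 0 < injected <= max_facts,
    }


# Placeholder knowledge base of 100 facts, for illustration only.
knowledge_base = [f"Fact {i}: placeholder item {i}." for i in range(1, 101)]
header = "You are a helpful assistant. Relevant facts:\n"

selective_prompt = header + "\n".join(knowledge_base[:3])
dump_prompt = header + "\n".join(knowledge_base)

selective_report = check_prompt(selective_prompt, knowledge_base)
dump_report = check_prompt(dump_prompt, knowledge_base)
print(selective_report)
print(dump_report)
```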

Constraints

  • The agent must not dump all stored facts into the prompt.
  • Retrieval must be based on semantic relevance to the current query.
  • The memory store must support at least 100 facts.
  • Only the top-K most relevant facts should be injected into context.

Starter Code

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(model="gpt-4o-mini")

# A knowledge base of 100 facts
facts = [
    "The user's name is Jordan.",
    "Jordan works at a startup called DataFlow.",
    "Jordan's favorite programming language is Python.",
    "The company DataFlow builds ETL pipelines.",
    "Jordan has a meeting with the VP of Engineering every Monday.",
    "The office is located in San Francisco.",
    "Jordan prefers dark mode in all applications.",
    "The team uses Slack for communication.",
    "Jordan's manager is named Priya.",
    "The company was founded in 2021.",
] + [f"Fact {i}: Random trivia item number {i} about miscellaneous topics." for i in range(11, 101)]

# BUG: Dumps ALL 100 facts into the prompt regardless of the query
facts_block = "\n".join(facts)

def chat(user_input: str) -> str:
    messages = [
        SystemMessage(content=f"You are a helpful assistant. Here is everything you know:\n{facts_block}"),
        HumanMessage(content=user_input),
    ]
    response = llm.invoke(messages)
    return response.content

print(chat("Who is my manager?"))
print(chat("What programming language do I like?"))
print(chat("Tell me about DataFlow."))