Agent Foundry

#56. Agentic RAG

Hard · RAG · Orchestration

The Problem

Your current RAG pipeline is a rigid sequence: retrieve, then answer. Every query—no matter how trivial—goes through the vector store. Ask "What is 2 + 2?" and it retrieves irrelevant enterprise documents before answering. Ask a complex question and it retrieves once, gets a partial result, and gives an incomplete answer because it can't decide to search again with a better query. The pipeline lacks agency: the ability to decide when to retrieve, what query to use, and whether the results are good enough. Your job is to build an agentic RAG system where the agent makes these decisions dynamically.

Examples

Example 1

User input: What is the enterprise API rate limit?

Current (bad) output: Retrieves documents and answers, but the pipeline cannot evaluate whether the retrieval was sufficient; it always answers from whatever the first search returns.

Expected (good) output: The agent decides this question needs retrieval, searches the knowledge base, finds the rate limit document, and answers: The enterprise API rate limit is 10,000 requests per minute.

Example 2

User input: What is 2 + 2?

Current (bad) output: Retrieves irrelevant enterprise documents, then tries to answer "2 + 2" from them, wasting a retrieval call and padding the prompt with context that can only distract the model.

Expected (good) output: The agent recognizes this is a simple math question, skips retrieval entirely, and answers: 4.
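Recognizing a question like "2 + 2" is the router's job, but a cheap deterministic pre-check can catch pure arithmetic before any model call. The sketch below is an assumption layered on the problem, not part of it: `try_direct_arithmetic` is a hypothetical helper that handles only `+`, `-`, `*`, `/` and unary minus on numeric literals. It walks the parsed `ast` tree instead of calling `eval`, so no user text is ever executed. Anything it cannot handle returns `None` and should fall through to the LLM router.

```python
import ast
import operator
from typing import Optional

# Hypothetical fast path: only these operators are evaluated; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _eval(node: ast.AST) -> float:
    """Recursively evaluate an expression built only from numbers and _OPS."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value  # bools are excluded by the exact type check
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("not plain arithmetic")


def try_direct_arithmetic(question: str) -> Optional[str]:
    """Answer questions like 'What is 2 + 2?' directly; return None for anything else."""
    text = question.strip().rstrip("?").strip()
    if text.lower().startswith("what is "):
        text = text[len("what is "):].strip()
    try:
        value = _eval(ast.parse(text, mode="eval").body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return None  # not arithmetic: hand the question to the LLM router
    # Print whole numbers without a trailing ".0"
    return str(int(value)) if float(value).is_integer() else str(value)
```

`try_direct_arithmetic("What is 2 + 2?")` returns `"4"`, while a knowledge question such as the rate-limit example fails to parse and returns `None`.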

Example 3

User input: What security certifications does the company have, and in which data regions can I store data?

Current (bad) output: Retrieves one document that answers only part of the question and misses the other.

Expected (good) output: The agent retrieves, evaluates the results, sees that it needs more information, retrieves again with a refined query, and combines both results: The company holds SOC 2 Type II certification (renewed January 2024). Data residency options include US-East, EU-West, and AP-Southeast regions.
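One way to make "sees that it needs more information" concrete is to split the question into sub-questions and check that each one is supported by at least one retrieved document. Any uncovered sub-question becomes the follow-up query. In this stdlib-only sketch, keyword overlap stands in for both the FAISS retriever and an LLM grader, and `decompose`, `supports`, `search`, and `retrieve_until_sufficient` are hypothetical names. `search` returns a single document (`k=1`) so that the first round reproduces the partial retrieval described in Example 3.

```python
import re

# The five documents from the starter code, as plain strings.
DOCS = [
    "Our enterprise plan includes SSO, audit logs, and 99.9% SLA.",
    "The API rate limit for enterprise customers is 10,000 requests per minute.",
    "Data residency options: US-East, EU-West, and AP-Southeast regions.",
    "Custom integrations require a minimum 12-month contract.",
    "Our SOC 2 Type II certification was renewed in January 2024.",
]

# Hypothetical toy vocabulary filter; a real system would use embeddings instead.
STOPWORDS = {"a", "and", "can", "company", "do", "does", "have", "i", "in", "is",
             "our", "the", "was", "what", "which"}


def keywords(text: str) -> set[str]:
    """Lowercase word set, minus stopwords, with a crude plural 's' strip."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w
            for w in words if w not in STOPWORDS}


def search(query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank documents by keyword overlap, drop zero-overlap hits."""
    q = keywords(query)
    ranked = sorted(DOCS, key=lambda d: len(q & keywords(d)), reverse=True)
    return [d for d in ranked[:k] if q & keywords(d)]


def supports(sub_question: str, doc: str) -> bool:
    """Stand-in for an LLM grader: does the document share a keyword with the sub-question?"""
    return bool(keywords(sub_question) & keywords(doc))


def decompose(question: str) -> list[str]:
    """Stand-in for LLM decomposition: split a compound question on ', and '."""
    return [part.strip() for part in question.split(", and ")]


def retrieve_until_sufficient(question: str, max_rounds: int = 3) -> list[str]:
    """Retrieve, find uncovered sub-questions, re-query for them, up to max_rounds."""
    sub_questions = decompose(question)
    context = search(question)  # round 1: a single search with the full question
    for _ in range(max_rounds - 1):  # at most max_rounds - 1 refinement rounds
        gaps = [q for q in sub_questions if not any(supports(q, d) for d in context)]
        if not gaps:
            break  # every sub-question has supporting evidence: stop retrieving
        for gap in gaps:  # the uncovered sub-question becomes the refined query
            context += [d for d in search(gap) if d not in context]
    return context
```

With the Example 3 question, the first search returns only the data-residency document; the certification sub-question is flagged as uncovered, and the second round adds the SOC 2 document. In a real submission, `decompose` and `supports` would be LLM calls, and the follow-up query would be rewritten rather than reused verbatim.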

Your Task

Build an agentic RAG system where the agent:

  • Classifies incoming queries to decide whether retrieval is needed.
  • Formulates its own search queries (not just passing the raw user question).
  • Evaluates retrieved results and decides if they're sufficient or if another retrieval round is needed.
  • Answers simple questions directly without unnecessary retrieval.
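The first two requirements can share one LLM call that returns a routing decision and a rewritten search query together. The sketch below assumes the router is prompted for JSON; `ROUTER_PROMPT`, `RouteDecision`, and `parse_route` are hypothetical names, and the prompt wording is an assumption to tune. The design choice worth noting is the fallback: malformed output routes to retrieval with the raw question.

```python
import json
from dataclasses import dataclass

# Hypothetical router prompt: the wording is an assumption, not part of the problem.
ROUTER_PROMPT = (
    "Decide whether answering the question requires the company knowledge base.\n"
    'Reply with JSON only: {{"needs_retrieval": true or false, '
    '"search_query": "short keyword query, or empty"}}\n'
    "Question: {question}"
)


@dataclass
class RouteDecision:
    needs_retrieval: bool
    search_query: str  # the agent's own query; unused when needs_retrieval is False


def parse_route(raw: str, question: str) -> RouteDecision:
    """Parse the router LLM's JSON reply, falling back to retrieval on bad output."""
    try:
        data = json.loads(raw)
        needs = data["needs_retrieval"]
        if not isinstance(needs, bool):
            raise ValueError("needs_retrieval must be a JSON boolean")
        query = str(data.get("search_query") or "").strip()
        return RouteDecision(needs, query or question)
    except (ValueError, KeyError, TypeError):  # json.JSONDecodeError is a ValueError
        # Fail toward retrieval: an unneeded search costs latency, while skipping a
        # needed one invites a hallucinated answer about company-specific facts.
        return RouteDecision(True, question)
```

A call site would look like `parse_route(llm.invoke(ROUTER_PROMPT.format(question=q)).content, q)`. LangChain chat models also provide `with_structured_output`, which can replace the hand-written JSON parsing with a schema-validated result.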

Evaluation

Submissions are checked for the following:

  • Decides when to retrieve: The agent intelligently decides whether a query requires document retrieval or can be answered directly.
  • Formulates search queries: The agent generates appropriate search queries rather than passing raw user input.
  • Evaluates retrieval sufficiency: The agent checks whether retrieved results are sufficient and can retrieve again if needed.
  • Routes queries correctly: Knowledge questions go to retrieval while simple questions are answered directly.

Constraints

  • The agent must decide whether retrieval is needed for each query.
  • The agent must formulate its own search queries.
  • The agent must evaluate retrieved results and decide whether more retrieval is needed.
  • Simple factual questions that the agent can answer from its own knowledge should be answered directly, without retrieval.

Starter Code

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langgraph.graph import StateGraph, START, END
from typing import TypedDict

llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()

documents = [
    Document(page_content="Our enterprise plan includes SSO, audit logs, and 99.9% SLA."),
    Document(page_content="The API rate limit for enterprise customers is 10,000 requests per minute."),
    Document(page_content="Data residency options: US-East, EU-West, and AP-Southeast regions."),
    Document(page_content="Custom integrations require a minimum 12-month contract."),
    Document(page_content="Our SOC 2 Type II certification was renewed in January 2024."),
]

vectorstore = FAISS.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever()

class State(TypedDict):
    question: str
    context: str
    answer: str

# BUG: Always retrieves regardless of query type — no decision logic
def retrieve(state: State) -> dict:  # nodes return partial state updates
    docs = retriever.invoke(state["question"])
    context = "\n".join([doc.page_content for doc in docs])
    return {"context": context}

def answer(state: State) -> dict:
    response = llm.invoke(
        f"Context: {state['context']}\nQuestion: {state['question']}"
    )
    return {"answer": response.content}

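graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
# Node is named "generate": LangGraph rejects a node name that matches a state key ("answer")
graph.add_node("generate", answer)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)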

app = graph.compile()

# This needs retrieval
print(app.invoke({"question": "What is the enterprise API rate limit?"})["answer"])
# This does NOT need retrieval — agent should answer directly
print(app.invoke({"question": "What is 2 + 2?"})["answer"])
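The starter graph hard-wires START → retrieve → answer. An agentic version replaces those fixed edges with routing functions that inspect state and return the name of the next node. The sketch below is stdlib-only so it runs without an API key: `AgentState`, the node names `classify`, `grade`, and `generate`, `MAX_ROUNDS`, and the `run` driver are all hypothetical, and `run` is only a stand-in for `graph.compile().invoke(...)`. The node bodies, which would call `llm` and `retriever` from the starter code, are passed in as a dict.

```python
from typing import Callable, TypedDict

END = "__end__"  # local stand-in for langgraph.graph.END
MAX_ROUNDS = 2   # assumed budget: at most two retrieval rounds per question


class AgentState(TypedDict, total=False):
    question: str
    needs_retrieval: bool  # set by the classify node
    search_query: str      # set by classify, refined by grade
    context: list[str]     # accumulated by retrieve
    rounds: int            # incremented by retrieve
    sufficient: bool       # set by grade
    answer: str            # set by generate


def route_after_classify(state: AgentState) -> str:
    """Skip retrieval entirely when the router says the LLM can answer directly."""
    return "retrieve" if state.get("needs_retrieval", True) else "generate"


def route_after_grade(state: AgentState) -> str:
    """Answer once the context is sufficient or the retrieval budget is spent."""
    if state.get("sufficient", False) or state.get("rounds", 0) >= MAX_ROUNDS:
        return "generate"
    return "retrieve"  # loop back; grade should have set a refined search_query


def run(nodes: dict[str, Callable[[AgentState], dict]],
        state: AgentState) -> tuple[AgentState, list[str]]:
    """Tiny driver mirroring the graph: classify -> (retrieve -> grade)* -> generate."""
    edges: dict[str, Callable[[AgentState], str]] = {
        "classify": route_after_classify,
        "retrieve": lambda s: "grade",
        "grade": route_after_grade,
        "generate": lambda s: END,
    }
    step, path = "classify", []
    while step != END:
        path.append(step)
        state = {**state, **nodes[step](state)}  # merge the node's partial update
        step = edges[step](state)
    return state, path
```

In LangGraph, the same routing functions plug in as `graph.add_conditional_edges("classify", route_after_classify)` and `graph.add_conditional_edges("grade", route_after_grade)`, alongside `graph.add_edge(START, "classify")`, `graph.add_edge("retrieve", "grade")`, and `graph.add_edge("generate", END)`. The answering node is called `generate` rather than `answer` because LangGraph does not allow a node to share its name with a state key.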