
RAG Pipeline

Retrieval-Augmented Generation (RAG) lets an LLM answer questions using your own data. Instead of relying solely on training knowledge, the model retrieves relevant documents and uses them as context for generating accurate, grounded responses.

The RAG Pattern

A RAG pipeline follows two steps:

  1. Retrieve — find relevant documents from a knowledge base
  2. Generate — pass those documents to the LLM as context for answering

This pattern grounds the model's responses in your actual data, reducing hallucination and keeping answers up to date.

Document Loaders

Document loaders bring data into LangChain from various sources. Each loader returns a list of Document objects with page_content and metadata:

from langchain_core.documents import Document
 
documents = [
    Document(page_content="Python is a high-level programming language known for readability.", metadata={"source": "python-intro"}),
    Document(page_content="Machine learning is a subset of AI that learns patterns from data.", metadata={"source": "ml-intro"}),
    Document(page_content="FastAPI is a modern Python web framework for building APIs.", metadata={"source": "fastapi-intro"}),
    Document(page_content="Neural networks are computing systems inspired by biological brains.", metadata={"source": "nn-intro"}),
    Document(page_content="Docker containers package applications with their dependencies.", metadata={"source": "docker-intro"}),
]

In production, you'd use loaders like TextLoader, PyPDFLoader, or WebBaseLoader to load from files and URLs.
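For example, loading from a local file is a one-liner. A minimal sketch ("notes.txt" is a hypothetical path you would replace with your own file):

from langchain_community.document_loaders import TextLoader
 
# Hypothetical file path; TextLoader returns a list of Document objects,
# with the file path recorded in metadata["source"]
docs = TextLoader("notes.txt").load()
print(docs[0].metadata)  # {'source': 'notes.txt'}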

Text Splitters

Documents are often too large to fit in a model's context window. Text splitters break them into smaller, overlapping chunks:

from langchain_text_splitters import RecursiveCharacterTextSplitter
 
splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50,
)
 
long_doc = Document(
    page_content="Python was created by Guido van Rossum and released in 1991. "
    "It emphasizes code readability with significant whitespace. "
    "Python supports multiple programming paradigms including structured, "
    "object-oriented, and functional programming. "
    "It has a comprehensive standard library.",
    metadata={"source": "python-history"},
)
 
chunks = splitter.split_documents([long_doc])
for i, chunk in enumerate(chunks):
    print(f"Chunk {i}: {chunk.page_content[:80]}...")

RecursiveCharacterTextSplitter tries to split on natural boundaries (paragraphs, sentences) before falling back to character-level splits.
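You can see (or override) that boundary hierarchy via the separators parameter. The list below spells out what should be the default order:

splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=50,
    # Tried in order: paragraphs, then lines, then words, then raw characters
    separators=["\n\n", "\n", " ", ""],
)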

Embeddings

Embeddings convert text into numerical vectors that capture semantic meaning. Similar texts produce similar vectors:

from langchain_openai import OpenAIEmbeddings
 
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
 
vectors = embeddings.embed_documents([
    "Python programming language",
    "Machine learning algorithms",
])
print(f"Vector dimensions: {len(vectors[0])}")

Vector Stores

Vector stores index document embeddings for fast similarity search. LangChain's InMemoryVectorStore is the simplest option:

from langchain_core.vectorstores import InMemoryVectorStore
 
vector_store = InMemoryVectorStore(embedding=embeddings)
vector_store.add_documents(documents)

Now you can search for documents similar to a query:

results = vector_store.similarity_search("What is machine learning?", k=2)
for doc in results:
    print(f"[{doc.metadata['source']}] {doc.page_content}")

The Retriever Pattern

A retriever wraps a vector store with a standard interface that returns relevant documents for a query:

retriever = vector_store.as_retriever(search_kwargs={"k": 2})
 
docs = retriever.invoke("Tell me about Python web frameworks")
for doc in docs:
    print(f"[{doc.metadata['source']}] {doc.page_content}")

Retrievers are the standard way to plug document search into chains and agents.
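Because retrievers implement the standard Runnable interface, the usual composition methods work too; for instance, .batch() runs several queries in one call:

queries = [
    "What is machine learning?",
    "What packages applications with dependencies?",
]
# .batch() returns one list of documents per query, in order
for query, docs in zip(queries, retriever.batch(queries)):
    print(query, "->", [doc.metadata["source"] for doc in docs])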

Building a 2-Step RAG Chain

The classic RAG chain retrieves documents, formats them as context, and passes everything to the LLM:

from langchain.chat_models import init_chat_model
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
 
model = init_chat_model("gpt-4o-mini", model_provider="openai")
 
template = ChatPromptTemplate.from_messages([
    ("system", "Answer the question based only on the following context:\n\n{context}"),
    ("human", "{question}"),
])
 
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
 
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | template
    | model
)
 
response = rag_chain.invoke("What is Python used for?")
print(response.content)

This chain retrieves relevant documents, formats them as a context string, and asks the LLM to answer based on that context.
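A common extension is returning the retrieved documents alongside the answer so callers can show sources. One sketch using RunnablePassthrough.assign (the "docs" and "answer" keys here are arbitrary names):

rag_with_sources = (
    {"docs": retriever, "question": RunnablePassthrough()}
    | RunnablePassthrough.assign(
        # Build the prompt from the already-retrieved docs, then call the model;
        # assign() adds the result under "answer" while keeping "docs" in the output
        answer=(
            (lambda x: {"context": format_docs(x["docs"]), "question": x["question"]})
            | template
            | model
        )
    )
)
 
result = rag_with_sources.invoke("What is Python used for?")
print(result["answer"].content)
print("Sources:", [doc.metadata["source"] for doc in result["docs"]])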

Agentic RAG

Instead of a fixed chain, you can give an agent a retriever tool. The agent decides when and how to search:

from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
 
@tool
def search_knowledge_base(query: str) -> str:
    """Search the knowledge base for information. Use this to find answers to technical questions."""
    docs = retriever.invoke(query)
    return "\n\n".join(f"[{doc.metadata['source']}] {doc.page_content}" for doc in docs)
 
agent = create_react_agent(
    model=model,
    tools=[search_knowledge_base],
    prompt="You are a technical assistant. Use the knowledge base tool to find information before answering.",
)
 
from langchain_core.messages import HumanMessage
 
result = agent.invoke({
    "messages": [HumanMessage(content="Compare Python and Docker — what does each do?")]
})
print(result["messages"][-1].content)

Agentic RAG is more flexible — the agent can issue multiple queries, refine its search, and combine results from different retrievals.
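To watch that behavior as it unfolds, you can stream the agent's intermediate state instead of waiting for the final result. A sketch using LangGraph's streaming API:

for step in agent.stream(
    {"messages": [HumanMessage(content="Compare Python and Docker")]},
    stream_mode="values",
):
    # Each step is the full state so far; print the newest message,
    # which alternates between model turns and tool results
    step["messages"][-1].pretty_print()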

2-Step RAG vs Agentic RAG

Aspect               | 2-Step RAG Chain           | Agentic RAG
---------------------|----------------------------|------------------------------------------
Control flow         | Fixed: retrieve → generate | Dynamic: agent decides when to retrieve
Number of retrievals | Always one                 | Agent can search multiple times
Query refinement     | None (uses original query) | Agent can rephrase and retry
Complexity           | Simpler, predictable       | More flexible, handles complex questions
Latency              | Lower (single retrieval)   | Higher (multiple LLM calls possible)

Key Takeaways

  • RAG grounds LLM responses in your own data by retrieving relevant documents as context
  • RecursiveCharacterTextSplitter breaks large documents into overlapping chunks
  • Embeddings convert text to vectors for semantic similarity search
  • Vector stores index documents; use .as_retriever() for a standard retrieval interface
  • A 2-step RAG chain follows a fixed retrieve → generate flow
  • Agentic RAG wraps the retriever as a tool, giving the agent control over when and how to search
  • Choose 2-step RAG for simplicity, agentic RAG for complex multi-step questions