Agent Foundry

#53. Hybrid Search

Medium · RAG

The Problem

Your product catalog RAG agent uses vector similarity search to find relevant documents. This works well for natural language queries such as "What tools do you offer for calibration?" but fails for exact-term lookups such as product codes (SKU-1148) and IDs. Vector embeddings capture semantic meaning rather than exact strings, so a search for SKU-1148 can return a different product whose description happens to be semantically closer to the query. Your job is to implement hybrid search that combines keyword-based matching (for exact terms) with vector search (for semantic understanding).
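Keyword matching only finds exact terms if the query and the documents are tokenized the same way. A minimal sketch, assuming lowercase regex tokenization is acceptable for this catalog (`tokenize` and `TOKEN_PATTERN` are illustrative names, not part of the starter code), shows why plain whitespace splitting is not enough: it keeps trailing punctuation, so "SKU-1148?" in the query never equals "SKU-1148:" in the catalog text.

```python
import re

# Lowercase the text, then keep runs of letters/digits joined by internal hyphens,
# so a product code such as "SKU-1148" stays a single token ("sku-1148").
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    return TOKEN_PATTERN.findall(text.lower())


query = "What is the price of SKU-1148?"
doc = "Product SKU-1148: Precision calibration tool. Price: $129.00."

# Whitespace splitting leaves punctuation attached ("SKU-1148?" vs "SKU-1148:"),
# so the query and the document share no tokens at all.
print(set(query.split()) & set(doc.split()))  # set()

# The regex tokenizer drops that punctuation, so the product code now matches.
print(sorted(set(tokenize(query)) & set(tokenize(doc))))  # ['price', 'sku-1148']
```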

Examples

Example 1

User input: What is the price of SKU-1148?

Current (bad) output: Returns information about a different product, because vector similarity ranked the wrong document first; the embedding for "SKU-1148" does not reliably map to the right product.

Expected (good) output: Product SKU-1148 (Precision calibration tool) is priced at $129.00.

Example 2

User input: Tell me about your calibration products.

Current (bad) output: (Vector search already handles this reasonably well, since it is a semantic query.)

Expected (good) output: We offer a Precision calibration tool (SKU-1148) priced at $129.00 with an accuracy of ±0.01mm.

Example 3

User input: Compare SKU-7291 and SKU-3364.

Current (bad) output: Retrieves the wrong products, because vector search does not reliably match exact SKU codes.

Expected (good) output: SKU-7291 is an Industrial-grade widget ($45.99, 2.3kg) and SKU-3364 is a Heavy-duty mounting bracket ($18.50, stainless steel).
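The three examples suggest a simple complementary check: pull product codes out of the query and fetch every document that mentions them. The sketch below assumes codes always look like `SKU-` followed by digits, which holds for this catalog but is an assumption; `SKU_PATTERN` and `exact_code_lookup` are hypothetical names. It is an exact-match filter, not a ranking function like BM25.

```python
import re

# Hypothetical code format, inferred from the examples: "SKU-" followed by digits.
SKU_PATTERN = re.compile(r"\bSKU-\d+\b", re.IGNORECASE)

CATALOG = [
    "Product SKU-7291: Industrial-grade widget. Price: $45.99. Weight: 2.3kg.",
    "Product SKU-1148: Precision calibration tool. Price: $129.00. Accuracy: ±0.01mm.",
    "Product SKU-3364: Heavy-duty mounting bracket. Price: $18.50. Material: Stainless steel.",
    "All products come with a 2-year warranty and free shipping on orders over $100.",
    "Product SKU-9902: Digital multimeter with auto-ranging. Price: $89.99. Display: LCD.",
]


def exact_code_lookup(query: str, docs: list[str]) -> list[str]:
    """Return every document that mentions a product code found in the query."""
    codes = {code.upper() for code in SKU_PATTERN.findall(query)}
    # Keep catalog order; a document qualifies if it shares at least one code.
    return [d for d in docs if codes & {c.upper() for c in SKU_PATTERN.findall(d)}]


for q in [
    "What is the price of SKU-1148?",
    "Tell me about your calibration products.",
    "Compare SKU-7291 and SKU-3364.",
]:
    print(q, "->", len(exact_code_lookup(q, CATALOG)))
```

On the three example queries it returns one, zero, and two documents respectively. The empty result for the calibration query is exactly the case vector search still has to cover.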

Your Task

Implement a hybrid search pipeline that:

  • Adds a keyword-based search method (e.g., BM25) alongside the existing vector search.
  • Reliably finds documents containing exact terms like product codes and IDs.
  • Preserves semantic search capabilities for natural language queries.
  • Merges and deduplicates results from both search methods.
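For the keyword half, a standard choice is Okapi BM25, which scores a document higher the more often it contains rare query terms, normalized by document length. The following is a minimal pure-Python sketch using the common defaults k1 = 1.5 and b = 0.75 and the non-negative IDF variant; the `BM25` class and `tokenize` helper are illustrative, and a real pipeline would more likely use an existing implementation.

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    # Lowercase and keep hyphenated codes such as "sku-1148" as one token.
    return TOKEN_PATTERN.findall(text.lower())


class BM25:
    """Minimal Okapi BM25 ranker over a fixed list of documents."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.docs, self.k1, self.b = docs, k1, b
        self.tokenized = [tokenize(d) for d in docs]
        self.doc_lens = [len(t) for t in self.tokenized]
        self.avgdl = sum(self.doc_lens) / len(docs)
        # Document frequency: how many documents contain each term.
        df = Counter(term for toks in self.tokenized for term in set(toks))
        n = len(docs)
        # Non-negative IDF (note the "+ 1" inside the log).
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf = Counter(self.tokenized[i])
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[i] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 3) -> list[str]:
        # Rank by score (stable on ties) and drop documents sharing no query term.
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [self.docs[i] for s, i in scored[:k] if s > 0]


CATALOG = [
    "Product SKU-7291: Industrial-grade widget. Price: $45.99. Weight: 2.3kg.",
    "Product SKU-1148: Precision calibration tool. Price: $129.00. Accuracy: ±0.01mm.",
    "Product SKU-3364: Heavy-duty mounting bracket. Price: $18.50. Material: Stainless steel.",
    "All products come with a 2-year warranty and free shipping on orders over $100.",
    "Product SKU-9902: Digital multimeter with auto-ranging. Price: $89.99. Display: LCD.",
]

bm25 = BM25(CATALOG)
print(bm25.search("What is the price of SKU-1148?", k=1))
```

Because `sku-1148` occurs in only one document, its IDF dominates and the correct product ranks first. Common words such as "and" can also match (here, the warranty document for the comparison query), so a stopword list may help on larger catalogs.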

Evaluation

Submissions are checked for the following:

  • Implements keyword search: A keyword-based search method (e.g., BM25) is implemented alongside vector search.
  • Finds exact term matches: Product codes, SKUs, and IDs are reliably found via keyword matching.
  • Semantic search still works: Natural language queries still return relevant results via vector similarity.
  • Merges results from both methods: Results from keyword and vector search are combined and deduplicated.

Constraints

  • The retriever must combine both keyword-based and vector-based search
  • Exact matches on product codes and IDs must be found reliably
  • Semantic search must still work for natural language queries
  • Results from both search methods must be merged and deduplicated
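To merge and deduplicate the two result lists, one well-known option is Reciprocal Rank Fusion (RRF). It ignores raw scores, which matters because BM25 scores and vector similarities are not on the same scale, and instead sums 1 / (k + rank) for each document across lists. A minimal sketch, assuming documents can be identified by their text and using the conventional k = 60; the rankings in the usage lines are hypothetical, not actual FAISS output.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists into one, deduplicating by document key.

    Each document earns 1 / (k + rank) from every list it appears in (rank starts
    at 1), so documents ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Dict keys are unique, so each document appears once in the output.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings: keyword search finds the exact code, vector search ranks it second.
keyword_hits = ["SKU-1148"]
vector_hits = ["SKU-9902", "SKU-1148", "warranty"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Because scores accumulate per document key, a product found by both retrievers is counted once and ranked highest. With LangChain `Document` objects, the key would typically be `page_content` or a metadata ID.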

Starter Code
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()

documents = [
    Document(page_content="Product SKU-7291: Industrial-grade widget. Price: $45.99. Weight: 2.3kg."),
    Document(page_content="Product SKU-1148: Precision calibration tool. Price: $129.00. Accuracy: ±0.01mm."),
    Document(page_content="Product SKU-3364: Heavy-duty mounting bracket. Price: $18.50. Material: Stainless steel."),
    Document(page_content="All products come with a 2-year warranty and free shipping on orders over $100."),
    Document(page_content="Product SKU-9902: Digital multimeter with auto-ranging. Price: $89.99. Display: LCD."),
]

vectorstore = FAISS.from_documents(documents, embeddings)

# BUG: Vector-only search — misses exact keyword matches like product codes
retriever = vectorstore.as_retriever()

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based on context.\n\nContext: {context}"),
    ("human", "{question}"),
])

def ask(question: str) -> str:
    docs = retriever.invoke(question)
    context = "\n".join([doc.page_content for doc in docs])
    chain = prompt | llm
    result = chain.invoke({"context": context, "question": question})
    return result.content

# Vector search may not find exact SKU match
print(ask("What is the price of SKU-1148?"))
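The starter `ask` function only calls `retriever.invoke(question)` and reads `doc.page_content`, so one way to wire in hybrid search is a small wrapper exposing the same `invoke` method. The sketch below uses a stand-in `Doc` class and fixed-output fake retrievers so it runs without an API key; `HybridRetriever`, `Doc`, and `FakeRetriever` are hypothetical names. It keeps keyword hits first, so exact matches always reach the context, and deduplicates by `page_content`.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document; only page_content is used."""
    page_content: str


class Retriever(Protocol):
    """Anything with an invoke(query) method that returns documents."""
    def invoke(self, query: str) -> list: ...


class HybridRetriever:
    """Hypothetical drop-in for `retriever` in the starter code.

    Returns keyword hits first, then vector hits, skipping any document whose
    page_content was already seen, and truncates the result to k documents.
    """

    def __init__(self, keyword: Retriever, vector: Retriever, k: int = 4):
        self.keyword, self.vector, self.k = keyword, vector, k

    def invoke(self, query: str) -> list:
        merged, seen = [], set()
        for doc in list(self.keyword.invoke(query)) + list(self.vector.invoke(query)):
            if doc.page_content not in seen:
                seen.add(doc.page_content)
                merged.append(doc)
        return merged[: self.k]


class FakeRetriever:
    """Returns a fixed list; stands in for BM25 or FAISS in this offline sketch."""

    def __init__(self, docs: list):
        self.docs = docs

    def invoke(self, query: str) -> list:
        return list(self.docs)


sku_1148 = Doc("Product SKU-1148: Precision calibration tool. Price: $129.00. Accuracy: ±0.01mm.")
sku_9902 = Doc("Product SKU-9902: Digital multimeter with auto-ranging. Price: $89.99. Display: LCD.")
warranty = Doc("All products come with a 2-year warranty and free shipping on orders over $100.")

# Hypothetical rankings: keyword search finds the exact code, vector search ranks it second.
retriever = HybridRetriever(
    keyword=FakeRetriever([sku_1148]),
    vector=FakeRetriever([sku_9902, sku_1148, warranty]),
)
result = retriever.invoke("What is the price of SKU-1148?")
print([d.page_content[:16] for d in result])
```

With real components, `keyword` could be LangChain's `BM25Retriever` from `langchain_community.retrievers` (it depends on the `rank_bm25` package) and `vector` the existing `vectorstore.as_retriever()`, assuming both are available in your installed versions. Its default preprocessing splits on whitespace, which hits the same punctuation problem with "SKU-1148?"; passing a custom `preprocess_func` is one way around that. LangChain also offers an `EnsembleRetriever` that applies weighted reciprocal rank fusion, though its import path has changed across releases.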