Query Enhancement & HyDE
What You'll Build Today
Welcome to Day 45! By now, you have built a working RAG (Retrieval-Augmented Generation) system. You know how to chunk text, embed it, store it in a vector database, and retrieve it.
But there is a major weak point in your system: The User.
Users are messy. They make typos. They use slang. They ask vague questions like "it's broken" or use abbreviations you never accounted for. If your retrieval system relies solely on the user's raw input, your RAG system will fail to find the right documents, and your LLM will hallucinate an answer.
Today, we are building a Query Enhancement Engine. Instead of trusting the user's raw input, we will use an LLM to "think" about what the user actually meant before we ever search our database.
Here is what you will master today:
* Query Rewriting: Why you should never pass raw user input directly to your search engine.
* Multi-Query Retrieval: How to cast a wider net by generating three different versions of the same question.
* HyDE (Hypothetical Document Embeddings): A fascinating technique where we hallucinate a fake answer to find the real answer.
* Robustness: Handling typos, abbreviations, and domain-specific jargon automatically.
Let's make your search engine smarter than the person using it.
---
The Problem
Let's look at the pain point. You have a database of technical documentation. Your embedding model is good, but it isn't magic. It relies on semantic similarity—meaning the words in the query need to be conceptually similar to the words in the document.
If a user types a sloppy query, the mathematical distance between their query and the correct document becomes too large, and you get zero results.
Look at this code. We have a document about "Authentication" and a user searching for "fix auth err."
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
# 1. Setup our simple database
model = SentenceTransformer('all-MiniLM-L6-v2')
documents = [
    "To fix authentication issues, ensure your API key is valid and not expired.",
    "Database connection timeouts occur when the server is under heavy load.",
    "The UI supports dark mode and light mode themes."
]
# 2. Embed the documents
doc_embeddings = model.encode(documents)
# 3. The Pain: A sloppy user query
user_query = "fix auth err" # Abbreviations, no context
# 4. Embed the query and search
query_embedding = model.encode([user_query])
scores = cosine_similarity(query_embedding, doc_embeddings)[0]
# 5. Print results
print(f"Query: '{user_query}'")
for doc, score in zip(documents, scores):
    print(f"Score: {score:.4f} | Doc: {doc}")
The Result:
You will likely see relatively low scores. While "auth" and "authentication" are close, "err" and "issues" are distinct. If the user had typed "cant login," the score might be even lower because "login" doesn't appear in the text at all.
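You can sanity-check that claim with the objects already defined above. Here is a quick sketch that scores a few phrasings against the same documents (it reuses model, documents, and doc_embeddings; nothing new is assumed):
# Quick sketch: score a few phrasings against the same documents (reuses the setup above)
for q in ["fix auth err", "cant login", "how do I fix authentication issues"]:
    scores = cosine_similarity(model.encode([q]), doc_embeddings)[0]
    best = scores.argmax()
    print(f"'{q}' -> best score {scores[best]:.4f} | {documents[best]}")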
This is frustrating. You have the answer in your database, but because the user didn't use the exact right terminology, your system failed. You shouldn't have to force your users to speak like a computer. The computer should understand the user.
---
Let's Build It
We are going to fix this by inserting an intelligence layer between the user and the database. We will use an LLM (like GPT-3.5 or GPT-4) to rewrite the bad query into a good one.
Step 1: Setup the Environment
We need openai for the reasoning and sentence-transformers for the retrieval math.
import os
from openai import OpenAI
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
# Initialize clients
# Make sure you have your OPENAI_API_KEY set in your environment variables
client = OpenAI()
embed_model = SentenceTransformer('all-MiniLM-L6-v2')
# Our Knowledge Base
documents = [
    "To fix authentication issues, ensure your API key is valid and not expired.",
    "Database connection timeouts occur when the server is under heavy load.",
    "The UI supports dark mode and light mode themes.",
    "Return policies allow for refunds within 30 days of purchase.",
    "To reset your password, click the 'Forgot Password' link on the login page."
]
# Pre-calculate document embeddings to save time
doc_embeddings = embed_model.encode(documents)
def simple_search(query_text):
    """Helper function to run a search and return the best match."""
    query_emb = embed_model.encode([query_text])
    scores = cosine_similarity(query_emb, doc_embeddings)[0]
    # Get the index of the best match
    best_idx = scores.argmax()
    return documents[best_idx], scores[best_idx]
# Test the baseline (The Bad Query)
query = "pwd rst" # User means "password reset"
result, score = simple_search(query)
print(f"Original Query: {query}")
print(f"Best Match: {result}")
print(f"Score: {score:.4f}")
Run this. You might get lucky, or you might get the "return policies" document because "rst" is vague. It's unreliable.
Step 2: Query Rewriting
Now, let's ask the LLM to fix the spelling and expand abbreviations before we search.
def rewrite_query(original_query):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that clarifies user search queries. Fix typos, expand abbreviations, and make the query grammatically correct. Output ONLY the rewritten query."},
            {"role": "user", "content": f"Fix this search query: {original_query}"}
        ]
    )
    return response.choices[0].message.content.strip()
# Let's try it
bad_query = "pwd rst"
better_query = rewrite_query(bad_query)
print(f"Original: {bad_query}")
print(f"Rewritten: {better_query}")
# Search with the new query
result, score = simple_search(better_query)
print(f"New Score: {score:.4f}")
print(f"Found Doc: {result}")
Why this matters: The LLM knows that "pwd" usually means "password" and "rst" means "reset" based on its vast training data. By sending "password reset" to the embedding model, we get a much higher similarity score to the correct document.
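To see the gap concretely, you can print both scores side by side using the simple_search helper from Step 1 (a small sketch built only from code already in this lesson):
# Side-by-side comparison of the raw query and the LLM-rewritten query
for label, q in [("raw", bad_query), ("rewritten", better_query)]:
    doc, score = simple_search(q)
    print(f"{label:>9}: {score:.4f} | '{q}' -> {doc}")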
Step 3: Multi-Query Expansion
Sometimes, rewriting isn't enough. A user might ask a question that could be answered in multiple ways. We want to generate variations of the query to cast a wider net.
def generate_multi_queries(original_query):
    prompt = f"""
    You are an AI language model assistant. Your task is to generate
    3 different versions of the given user question to retrieve
    relevant documents from a vector database.
    By generating multiple perspectives on the user question,
    your goal is to help the user overcome some of the limitations
    of distance-based similarity search.
    Provide these alternative questions separated by newlines.
    Original question: {original_query}
    """
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    content = response.choices[0].message.content.strip()
    # Split on newlines and drop any blank lines the model may include
    queries = [q.strip() for q in content.split('\n') if q.strip()]
    return queries
# Test it
complex_query = "cant get into my acct"
variations = generate_multi_queries(complex_query)
print(f"Original: {complex_query}")
print("Variations:")
for v in variations:
    print(f"- {v}")
    # Perform a search for each variation
    res, sc = simple_search(v)
    print(f"   -> Found: {res[:40]}... (Score: {sc:.4f})")
Why this matters: One variation might focus on "login," another on "password," and another on "access." If your database has documents about any of those specific terms, one of these variations will catch it.
Step 4: HyDE (Hypothetical Document Embeddings)
This is an advanced concept, but the logic is simple: instead of embedding the user's question, we ask the LLM to write a plausible (possibly wrong) answer first, then embed that hypothetical answer and search with it. Answers tend to look like documents, so they land much closer to the real documents in embedding space than a short question does.
def hyde_transform(query):
    # 1. Generate a hypothetical answer to the user's question
    prompt = f"""
    Please write a brief, hypothetical passage that answers the question below.
    Do not include any preamble, just the passage.
    Question: {query}
    """
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    hypothetical_answer = response.choices[0].message.content.strip()
    return hypothetical_answer
# Let's try it
user_q = "how do i fix a 401 error"
fake_answer = hyde_transform(user_q)
print(f"User Query: {user_q}")
print(f"Hypothetical Answer (Hallucinated): {fake_answer}\n")
# Now we search using the FAKE answer, not the query
result, score = simple_search(fake_answer)
print(f"Best Real Document Found: {result}")
print(f"Score: {score:.4f}")
Why this matters: The user asked about a "401 error." The hypothetical answer likely contained words like "authentication," "credentials," "login," and "expired." These words are exactly what is inside our real document ("ensure your API key is valid"). We bridged the vocabulary gap by hallucinating context.
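If you want one entry point that ties these techniques together, here is a minimal sketch. The enhance_and_search function and its strategy argument are names invented for illustration; it only reuses the rewrite_query, hyde_transform, and simple_search helpers defined earlier.
def enhance_and_search(raw_query, strategy="rewrite"):
    # Hypothetical wrapper around the helpers above; adapt to your own pipeline
    if strategy == "rewrite":
        search_text = rewrite_query(raw_query)
    elif strategy == "hyde":
        search_text = hyde_transform(raw_query)
    else:
        search_text = raw_query  # no enhancement, search the raw input
    return simple_search(search_text)

# Compare strategies on the same messy query
for strategy in ["none", "rewrite", "hyde"]:
    doc, score = enhance_and_search("cant get into my acct", strategy)
    print(f"{strategy:>7}: {score:.4f} | {doc}")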
---
Now You Try
You have the building blocks. Now extend the system to handle specific edge cases.
1. Create a function clean_code_query(query). If a user pastes a raw Python error message (e.g., Traceback (most recent call last)...), use the LLM to extract only the core error message and the relevant library name before searching. Raw tracebacks confuse embedding models.
2. Modify the rewrite_query prompt to act as a domain expert for a specific field (like Medicine or Law). Give it a query like "pt presents with elevated bp" and ensure it rewrites it to "patient presents with elevated blood pressure" before searching.
3. In the Multi-Query step, we ran 3 searches. This likely returned duplicate documents (the same document found by different queries). Write a function that takes the results from all 3 searches, removes duplicates, and returns the unique set of top documents.
---
Challenge Project: The HyDE vs. Standard Showdown
Your challenge is to build a comparison tool to prove HyDE works.
Requirements:
* Method A: Standard Search (Embed Query -> Search).
* Method B: HyDE Search (Generate Fake Answer -> Embed Fake Answer -> Search).
Example document to include in your knowledge base: "Xenomorph blood is highly acidic and can melt through starship hulls."
Example Hard Query: "defense mechanism against hull breach" (Note: "acid" isn't mentioned, but HyDE might hallucinate it).
Output format:
Query: "defense mechanism against hull breach"
Standard Search Score: 0.45 (Doc: "Xenomorphs have an inner jaw...")
HyDE Search Score: 0.82 (Doc: "Xenomorph blood is highly acidic...")
Winner: HyDE
Hint:
When generating the HyDE response, instruct the LLM to be detailed. The more relevant vocabulary it hallucinates, the better the retrieval match.
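For example, you could swap the prompt inside hyde_transform for something along these lines. The wording below is just one possible phrasing, not an official recipe:
def detailed_hyde_prompt(query):
    # One possible wording for a more detailed HyDE prompt (an assumption; tune it for your domain)
    return f"""
    Write a detailed technical passage (3-5 sentences) that answers the question below.
    Use concrete, domain-specific vocabulary someone would find in documentation.
    Do not include any preamble, just the passage.
    Question: {query}
    """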
---
What You Learned
Today you moved from "basic search" to "intelligent retrieval." You learned that the user's input is often the weakest link in a RAG system.
* Query Rewriting: You used an LLM to fix typos and expand abbreviations (pwd -> password).
* Multi-Query: You learned that asking the question in 3 different ways increases the odds of finding the answer.
* HyDE: You learned that sometimes the best way to find a document is to pretend you already know the answer and search with that.
Why This Matters: In a production application, users will not be prompt engineers. They will type "broken" or "help." If your system relies on raw embeddings, it will fail. These enhancement layers are the difference between a demo that looks cool and a product that actually helps people.
Tomorrow: Now that we have retrieved the best documents, how do we know if they are actually good? Tomorrow we dive into RAG Evaluation—measuring precision, recall, and faithfulness to ensure your AI isn't lying.