Day 47 of 80

Conversational RAG

Phase 5: RAG Systems

What You'll Build Today

Welcome to Day 47! Today, we are going to bridge the gap between a "search engine" and a true "conversational assistant."

Up until now, your RAG (Retrieval-Augmented Generation) systems have been amnesiacs. They treat every single question as if it were the very first time they'd met you. If you ask about a product and then follow up with "How much does it cost?", the system falls flat because it has no idea what "it" refers to.

Today, we are building a Conversational RAG System. This is a chatbot that remembers what you just said and understands context, pronouns, and follow-up questions.

Here is what you will master today:

* Chat History Management: Why passing previous messages back to the AI is critical for continuity.

* Question Contextualization: Why you cannot simply pass a user's raw follow-up question to your database.

* The "Stand-alone" Query: How to use an LLM to rewrite a user's question into a search-friendly format before looking for answers.

This is the difference between a bot that frustrates users and a bot that feels intelligent. Let's get started.

---

The Problem

Let's look at the pain point. Imagine you have built a RAG system for a company that sells the "SuperVacuum 3000." You have a document store full of facts about the vacuum.

You want to have a conversation like this:

  • User: "Does the SuperVacuum 3000 have a warranty?"
  • AI: "Yes, it has a 2-year warranty."
  • User: "Does it cover water damage?"

In a standard RAG system, here is what happens in the code for that second question:

```python
# A simple simulation of a standard RAG retrieval step
def naive_retrieve(user_question, database):
    print(f"Searching database for: '{user_question}'")
    query = user_question.lower()
    # In reality this would be a vector search. Here we mimic one crudely:
    # a document is only retrieved when the query names the subject it covers.
    # A follow-up built around a bare pronoun ("it") has nothing to anchor on.
    subjects = ("supervacuum", "warranty")
    results = [doc for doc in database
               if any(term in query and term in doc.lower() for term in subjects)]
    if not results:
        return "No relevant documents found."
    return results[0]


# Our knowledge base
knowledge_base = [
    "The SuperVacuum 3000 has a 2-year warranty.",
    "The warranty covers motor failure and battery defects.",
    "The warranty does NOT cover water damage or accidental drops.",
]

# The conversation
print("--- QUESTION 1 ---")
q1 = "Does the SuperVacuum 3000 have a warranty?"
context1 = naive_retrieve(q1, knowledge_base)
# This works! It finds the document about the warranty.
print(f"Found context: {context1}")

print("\n--- QUESTION 2 ---")
q2 = "Does it cover water damage?"
context2 = naive_retrieve(q2, knowledge_base)
# This fails.
print(f"Found context: {context2}")
```

Output:

```
--- QUESTION 1 ---
Searching database for: 'Does the SuperVacuum 3000 have a warranty?'
Found context: The SuperVacuum 3000 has a 2-year warranty.

--- QUESTION 2 ---
Searching database for: 'Does it cover water damage?'
Found context: No relevant documents found.
```

The Pain:

The retrieval system failed on Question 2. Why? Because the user said "it."

The vector database does not know that "it" refers to the SuperVacuum 3000's warranty. A search built around "it" and "water damage" either misses the specific phrasing in your documents or, worse, pulls in irrelevant passages that happen to share those generic words.

To the database, Question 2 is completely isolated from Question 1. We need a way to tell the database, "Hey, when the user says 'it', they mean the warranty we just talked about."
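
To preview the fix, here is a quick sketch (reusing the toy naive_retrieve from above) of what happens once the follow-up is rewritten so that it names its subject. The rewritten wording is just an illustration:

```python
# Hypothetical rewritten follow-up that names its subject explicitly
rewritten = "Does the SuperVacuum 3000 warranty cover water damage?"
print(naive_retrieve(rewritten, knowledge_base))
# Retrieval no longer comes back empty; a real vector search would also
# rank the water-damage clause highly for this explicit query.
```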

---

Let's Build It

We are going to solve this using a technique called Question Condensation (or "Stand-alone Question Generation").

Before we search our database, we will ask an LLM to look at the chat history and the new question, and rewrite the new question to be crystal clear.

Step 1: Setup and Mock Database

First, let's set up our environment. We will use a simple list as our "database" and OpenAI for the logic.

Note: You will need your OpenAI API key for this.
```python
import os

from openai import OpenAI

# Initialize the client (set OPENAI_API_KEY in your environment, or paste your key here)
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY"))

# Our "Vector Database" (mocked for simplicity)
# In a real app, this would be ChromaDB or Pinecone
knowledge_base = [
    "The MoonBase Alpha is located in the Tycho crater.",
    "The MoonBase Alpha was built in 2045.",
    "The base commander is Sarah Jenkins.",
    "Oxygen is supplied by a life support system running at 98% efficiency.",
    "The base has a hydroponic garden that grows potatoes and tomatoes.",
]


def retrieve_documents(query):
    """A simple keyword search to simulate vector retrieval."""
    query = query.lower()
    results = []
    for doc in knowledge_base:
        # Simple keyword matching: any shared word longer than 3 characters
        if any(word in doc.lower() for word in query.split() if len(word) > 3):
            results.append(doc)
    if not results:
        return "No relevant information found in the database."
    return "\n".join(results)
```

Step 2: Managing Chat History

We need a place to store the conversation. A simple Python list of dictionaries works perfectly.

```python
# Initialize chat history
chat_history = []


def update_history(role, content):
    chat_history.append({"role": role, "content": content})


# Seed it with an initial interaction so we have something to reference
update_history("user", "Where is MoonBase Alpha located?")
update_history("assistant", "MoonBase Alpha is located in the Tycho crater.")

print("Current History:")
for msg in chat_history:
    print(f"{msg['role']}: {msg['content']}")
```

Step 3: The Standalone Question Generator

This is the most important part of today. We need a function that takes the chat_history and the user_question, and asks the LLM to rewrite the question.

If the user asks "Who runs it?", and the history talks about the base, the LLM should rewrite this to "Who runs MoonBase Alpha?"

```python
def generate_standalone_question(history, user_question):
    print("--- Rewriting Question ---")

    # Create a prompt that explains the task to the LLM
    system_prompt = """
    Given the following conversation history and a follow-up question,
    rephrase the follow-up question to be a standalone question.
    The standalone question must contain all necessary context to be understood
    without looking at the history. Replace pronouns (it, he, she, that) with
    the specific names or nouns they refer to.
    Do NOT answer the question. Just rewrite it.
    """

    # Format history for the prompt
    history_text = "\n".join([f"{msg['role']}: {msg['content']}" for msg in history])

    user_prompt = f"""
    Chat History:
    {history_text}

    Follow-up input: {user_question}

    Standalone question:
    """

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.1,  # Low temperature for precision
    )

    return response.choices[0].message.content
```
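
To see it in action, call it with the seeded history from Step 2. The exact wording of the rewrite will vary from run to run, but it should look roughly like this:

```python
rewritten = generate_standalone_question(chat_history, "Who runs it?")
print(rewritten)
# Expected to be something like: "Who runs MoonBase Alpha?"
```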

Step 4: The Full Conversational RAG Loop

Now we combine everything:

1. Take user input.
2. Generate a standalone question (using history).
3. Retrieve documents (using the standalone question).
4. Generate the final answer (using the original question + docs + history).
```python
def conversational_rag(user_input):
    global chat_history

    # 1. Rewrite the question
    standalone_question = generate_standalone_question(chat_history, user_input)
    print(f"Original: '{user_input}'")
    print(f"Rewritten: '{standalone_question}'")

    # 2. Retrieve using the REWRITTEN question
    context = retrieve_documents(standalone_question)
    print(f"Context found: {context}")

    # 3. Generate the final answer
    # We provide the context and the history to the final generation model
    system_prompt = "You are a helpful assistant for MoonBase Alpha. Use the provided context to answer questions."
    messages = [
        {"role": "system", "content": system_prompt},
        # We can optionally include history here too, but the context is the most important part now
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {user_input}"},
    ]

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    answer = response.choices[0].message.content

    # 4. Update history
    update_history("user", user_input)
    update_history("assistant", answer)

    return answer
```

Step 5: Testing the Flow

Let's run a conversation that would have failed with a standard RAG system.

```python
# Reset history for a clean start
chat_history = []

print(">>> Turn 1")
answer1 = conversational_rag("When was MoonBase Alpha built?")
print(f"AI: {answer1}\n")

print(">>> Turn 2")
# Here is the ambiguous question!
answer2 = conversational_rag("Who is the commander there?")
print(f"AI: {answer2}\n")

print(">>> Turn 3")
# Another ambiguous question referring to the previous answer
answer3 = conversational_rag("Does she like potatoes?")
# Note: Our DB knows about potatoes, but not whether Sarah likes them.
# The rewrite should clarify that "she" is Sarah Jenkins.
print(f"AI: {answer3}\n")
```

Expected Output Analysis:

* Turn 1: The rewritten question may simply match the original. It retrieves the build date.

* Turn 2: Input "Who is the commander there?" -> Rewritten "Who is the commander of MoonBase Alpha?" -> Retrieves "Sarah Jenkins".

* Turn 3: Input "Does she like potatoes?" -> Rewritten "Does Sarah Jenkins like potatoes?" -> Retrieves info about potatoes (but likely won't find preference info). The key is that it *tried* to search for Sarah, not just "she".

---

Now You Try

You have a working conversational memory. Now, let's make it robust.

1. The "Clean Start" Button

Currently, chat_history grows forever. If you change topics completely, the old history might confuse the rewriter.

* Task: Create a function clear_memory() that empties the list.

* Extension: Add logic to conversational_rag so that if the user says "Reset" or "New topic", it clears memory automatically (a starting sketch is below).
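
Here is a minimal sketch of one approach; the reset keywords are only examples:

```python
def clear_memory():
    """Empty the conversation history."""
    chat_history.clear()

# Inside conversational_rag, before rewriting the question, you could add:
#     if user_input.strip().lower() in ("reset", "new topic"):
#         clear_memory()
#         return "Memory cleared. What would you like to talk about?"
```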

2. Source Attribution

It is important to know why the AI gave an answer.

* Task: Modify the retrieve_documents function (or the mock data) to include an ID or title for each fact.

* Task: Update the final prompt to ask the AI to "cite the source ID" in its final answer.

3. History Limiter

LLMs have a limit on how much text they can process (the context window). If your chat history is 100 messages long, it will crash or get expensive.

* Task: Modify generate_standalone_question to only look at the last 3 turns of conversation (User-AI, User-AI, User-AI).

* Hint: You can slice a list in Python using chat_history[-6:] (see the short sketch below).
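
For example, inside generate_standalone_question you could trim the history before formatting it (a minimal sketch; 6 messages = 3 user/assistant turns):

```python
# Keep only the last 3 turns (6 messages) when building the rewrite prompt
recent_history = history[-6:]
history_text = "\n".join([f"{msg['role']}: {msg['content']}" for msg in recent_history])
```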

---

Challenge Project: The Standalone Question Generator

Your challenge is to extract the logic we built today into a robust, reusable class. This component is often the first piece of a professional RAG pipeline.

Requirements:

  • Create a class QueryRefiner.
  • It should have a method refine(conversation_history, new_query).
  • It must handle three specific edge cases:

    * Pronouns: "It", "He", "They" -> specific names.

    * Temporal references: "What about last year?" -> "What about [Current Year - 1]?" (You might need to inject the current date into the system prompt!)

    * Implicit context: If the user just types "And the price?", it should rewrite to "What is the price of [Product discussed]?"

  • Return the refined string.

Example Input/Output:

  • History: [User: "Tell me about the iPhone 15.", AI: "It was released in September."]
    Input: "Is it waterproof?"
    Output: "Is the iPhone 15 waterproof?"

  • History: [User: "Who is the CEO of Tesla?", AI: "Elon Musk."]
    Input: "How old is he?"
    Output: "How old is Elon Musk?"

Hints:

* Your system prompt is your most powerful tool here. Be very explicit in your instructions to the LLM.

* You don't need a database for this challenge; just the LLM logic to rewrite the text.
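
If you want a starting point, here is a minimal skeleton of the class. The method names follow the requirements above; the prompt wording is only a suggestion, and sharpening it to handle all three edge cases is the real challenge:

```python
from datetime import date


class QueryRefiner:
    """Rewrites follow-up questions into standalone queries using an LLM."""

    def __init__(self, client, model="gpt-3.5-turbo"):
        self.client = client
        self.model = model

    def refine(self, conversation_history, new_query):
        history_text = "\n".join(
            f"{msg['role']}: {msg['content']}" for msg in conversation_history
        )
        # Injecting today's date lets the model resolve "last year", "next month", etc.
        system_prompt = (
            f"Today's date is {date.today().isoformat()}. "
            "Rewrite the user's follow-up as a standalone question. "
            "Resolve pronouns, relative dates, and implicit references "
            "using the chat history. Do NOT answer the question."
        )
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": (
                    f"Chat History:\n{history_text}\n\n"
                    f"Follow-up input: {new_query}\n\nStandalone question:"
                )},
            ],
            temperature=0.1,
        )
        return response.choices[0].message.content
```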

---

What You Learned

Today you moved from "Stateless" AI to "Stateful" AI.

* History-Aware Retrieval: You learned that you cannot search a database with raw chat inputs.

* Question Condensation: You built a mechanism to rewrite ambiguous questions into standalone queries.

* Context Injection: You saw how passing previous turns helps the LLM understand "it" and "that."

Why This Matters:

In the real world, users never speak in perfect, standalone search queries. They speak in flows. "Show me the red shoes." "Do you have them in size 10?" "What about blue?"

Without the techniques you learned today, your AI would fail on the second and third sentences. With these techniques, you can build a helpful shopping assistant, a tech support bot, or a legal research tool.

Tomorrow:

We are going to look at Advanced RAG Patterns. What happens when you have multiple databases? Or when the question requires math, not just reading? We will explore "Routing" to send questions to the right expert tool.