Day 47 of 80

Conversational RAG

Phase 5: RAG Systems

What You'll Build Today

Welcome to Day 47! Today, we are going to bridge the gap between a "search engine" and a true "conversational assistant."

Up until now, your RAG (Retrieval-Augmented Generation) systems have been amnesiacs. They treat every single question as if it were the very first time they'd met you. If you ask about a product and then follow up with "How much does it cost?", the system falls flat because it has no idea what "it" refers to.

Today, we are building a Conversational RAG System. This is a chatbot that remembers what you just said and understands context, pronouns, and follow-up questions.

Here is what you will master today:

* Chat History Management: Why passing previous messages back to the AI is critical for continuity.

* Question Contextualization: Why you cannot simply pass a user's raw follow-up question to your database.

* The "Stand-alone" Query: How to use an LLM to rewrite a user's question into a search-friendly format before looking for answers.

This is the difference between a bot that frustrates users and a bot that feels intelligent. Let's get started.

---

The Problem

Let's look at the pain point. Imagine you have built a RAG system for a company that sells the "SuperVacuum 3000." You have a document store full of facts about the vacuum.

You want to have a conversation like this:

  • User: "Does the SuperVacuum 3000 have a warranty?"
  • AI: "Yes, it has a 2-year warranty."
  • User: "Does it cover water damage?"

In a standard RAG system, here is what happens in the code for that second question:

```python
# A simple simulation of a standard RAG retrieval step
def naive_retrieve(user_question, database):
    print(f"Searching database for: '{user_question}'")
    query = user_question.lower()
    # In reality this would be a vector search. Here we mimic one crudely:
    # a document is only retrieved when the query names the subject it covers.
    # A follow-up built around a bare pronoun ("it") has nothing to anchor on.
    subjects = ("supervacuum", "warranty")
    results = [doc for doc in database
               if any(term in query and term in doc.lower() for term in subjects)]
    if not results:
        return "No relevant documents found."
    return results[0]


# Our knowledge base
knowledge_base = [
    "The SuperVacuum 3000 has a 2-year warranty.",
    "The warranty covers motor failure and battery defects.",
    "The warranty does NOT cover water damage or accidental drops.",
]

# The conversation
print("--- QUESTION 1 ---")
q1 = "Does the SuperVacuum 3000 have a warranty?"
context1 = naive_retrieve(q1, knowledge_base)
# This works! It finds the document about the warranty.
print(f"Found context: {context1}")

print("\n--- QUESTION 2 ---")
q2 = "Does it cover water damage?"
context2 = naive_retrieve(q2, knowledge_base)
# This fails.
print(f"Found context: {context2}")
```

Output:

```
--- QUESTION 1 ---
Searching database for: 'Does the SuperVacuum 3000 have a warranty?'
Found context: The SuperVacuum 3000 has a 2-year warranty.

--- QUESTION 2 ---
Searching database for: 'Does it cover water damage?'
Found context: No relevant documents found.
```

The Pain:

The retrieval system failed on Question 2. Why? Because the user said "it."

The vector database does not know that "it" refers to the SuperVacuum 3000's warranty. A search built around "it" and "water damage" either misses the specific phrasing in your documents or, worse, pulls in irrelevant passages that happen to share those generic words.

To the database, Question 2 is completely isolated from Question 1. We need a way to tell the database, "Hey, when the user says 'it', they mean the warranty we just talked about."
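
To preview the fix, here is a quick sketch (reusing the toy naive_retrieve from above) of what happens once the follow-up is rewritten so that it names its subject. The rewritten wording is just an illustration:

```python
# Hypothetical rewritten follow-up that names its subject explicitly
rewritten = "Does the SuperVacuum 3000 warranty cover water damage?"
print(naive_retrieve(rewritten, knowledge_base))
# Retrieval no longer comes back empty; a real vector search would also
# rank the water-damage clause highly for this explicit query.
```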

---

Let's Build It

We are going to solve this using a technique called Question Condensation (or "Stand-alone Question Generation").

Before we search our database, we will ask an LLM to look at the chat history and the new question, and rewrite the new question to be crystal clear.

Step 1: Setup and Mock Database

First, let's set up our environment. We will use a simple list as our "database" and OpenAI for the logic.

Note: You will need your OpenAI API key for this.
```python
import os

from openai import OpenAI

# Initialize the client (set OPENAI_API_KEY in your environment, or paste your key here)
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY"))

# Our "Vector Database" (mocked for simplicity)
# In a real app, this would be ChromaDB or Pinecone
knowledge_base = [
    "The MoonBase Alpha is located in the Tycho crater.",
    "The MoonBase Alpha was built in 2045.",
    "The base commander is Sarah Jenkins.",
    "Oxygen is supplied by a life support system running at 98% efficiency.",
    "The base has a hydroponic garden that grows potatoes and tomatoes.",
]


def retrieve_documents(query):
    """A simple keyword search to simulate vector retrieval."""
    query = query.lower()
    results = []
    for doc in knowledge_base:
        # Simple keyword matching: any shared word longer than 3 characters
        if any(word in doc.lower() for word in query.split() if len(word) > 3):
            results.append(doc)
    if not results:
        return "No relevant information found in the database."
    return "\n".join(results)
```

Step 2: Managing Chat History

We need a place to store the conversation. A simple Python list of dictionaries works perfectly.

```python
# Initialize chat history
chat_history = []


def update_history(role, content):
    chat_history.append({"role": role, "content": content})


# Seed it with an initial interaction so we have something to reference
update_history("user", "Where is MoonBase Alpha located?")
update_history("assistant", "MoonBase Alpha is located in the Tycho crater.")

print("Current History:")
for msg in chat_history:
    print(f"{msg['role']}: {msg['content']}")
```

Step 3: The Standalone Question Generator

This is the most important part of today. We need a function that takes the chat_history and the user_question, and asks the LLM to rewrite the question.

If the user asks "Who runs it?", and the history talks about the base, the LLM should rewrite this to "Who runs MoonBase Alpha?"

```python
def generate_standalone_question(history, user_question):
    print("--- Rewriting Question ---")

    # Create a prompt that explains the task to the LLM
    system_prompt = """
    Given the following conversation history and a follow-up question,
    rephrase the follow-up question to be a standalone question.
    The standalone question must contain all necessary context to be understood
    without looking at the history. Replace pronouns (it, he, she, that) with
    the specific names or nouns they refer to.
    Do NOT answer the question. Just rewrite it.
    """

    # Format history for the prompt
    history_text = "\n".join([f"{msg['role']}: {msg['content']}" for msg in history])

    user_prompt = f"""
    Chat History:
    {history_text}

    Follow-up input: {user_question}

    Standalone question:
    """

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.1,  # Low temperature for precision
    )

    return response.choices[0].message.content
```
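
To see it in action, call it with the seeded history from Step 2. The exact wording of the rewrite will vary from run to run, but it should look roughly like this:

```python
rewritten = generate_standalone_question(chat_history, "Who runs it?")
print(rewritten)
# Expected to be something like: "Who runs MoonBase Alpha?"
```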

Step 4: The Full Conversational RAG Loop

Now we combine everything:

1. Take user input.
2. Generate a standalone question (using history).
3. Retrieve documents (using the standalone question).
4. Generate the final answer (using the original question + docs + history).
```python
def conversational_rag(user_input):
    global chat_history

    # 1. Rewrite the question
    standalone_question = generate_standalone_question(chat_history, user_input)
    print(f"Original: '{user_input}'")
    print(f"Rewritten: '{standalone_question}'")

    # 2. Retrieve using the REWRITTEN question
    context = retrieve_documents(standalone_question)
    print(f"Context found: {context}")

    # 3. Generate the final answer
    # We provide the context and the history to the final generation model
    system_prompt = "You are a helpful assistant for MoonBase Alpha. Use the provided context to answer questions."
    messages = [
        {"role": "system", "content": system_prompt},
        # We can optionally include history here too, but the context is the most important part now
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {user_input}"},
    ]

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    answer = response.choices[0].message.content

    # 4. Update history
    update_history("user", user_input)
    update_history("assistant", answer)

    return answer
```

Step 5: Testing the Flow

Let's run a conversation that would have failed with a standard RAG system.

```python
# Reset history for a clean start
chat_history = []

print(">>> Turn 1")
answer1 = conversational_rag("When was MoonBase Alpha built?")
print(f"AI: {answer1}\n")

print(">>> Turn 2")
# Here is the ambiguous question!
answer2 = conversational_rag("Who is the commander there?")
print(f"AI: {answer2}\n")

print(">>> Turn 3")
# Another ambiguous question referring to the previous answer
answer3 = conversational_rag("Does she like potatoes?")
# Note: Our DB knows about potatoes, but not whether Sarah likes them.
# The rewrite should clarify that "she" is Sarah Jenkins.
print(f"AI: {answer3}\n")
```

Expected Output Analysis:

* Turn 1: The rewritten question may simply match the original. It retrieves the build date.

* Turn 2: Input "Who is the commander there?" -> Rewritten "Who is the commander of MoonBase Alpha?" -> Retrieves "Sarah Jenkins".

* Turn 3: Input "Does she like potatoes?" -> Rewritten "Does Sarah Jenkins like potatoes?" -> Retrieves info about potatoes (but likely won't find preference info). The key is that it *tried* to search for Sarah, not just "she".

---

Now You Try

You have a working conversational memory. Now, let's make it robust.

1. The "Clean Start" Button

Currently, chat_history grows forever. If you change topics completely, the old history might confuse the rewriter.

* Task: Create a function clear_memory() that empties the list.

* Extension: Add logic to conversational_rag so that if the user says "Reset" or "New topic", it clears memory automatically (a starting sketch is below).
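
Here is a minimal sketch of one approach; the reset keywords are only examples:

```python
def clear_memory():
    """Empty the conversation history."""
    chat_history.clear()

# Inside conversational_rag, before rewriting the question, you could add:
#     if user_input.strip().lower() in ("reset", "new topic"):
#         clear_memory()
#         return "Memory cleared. What would you like to talk about?"
```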

2. Source Attribution

It is important to know why the AI gave an answer.

* Task: Modify the retrieve_documents function (or the mock data) to include an ID or title for each fact.

* Task: Update the final prompt to ask the AI to "cite the source ID" in its final answer.

3. History Limiter

LLMs have a limit on how much text they can process (the context window). If your chat history is 100 messages long, it will crash or get expensive.

* Task: Modify generate_standalone_question to only look at the last 3 turns of conversation (User-AI, User-AI, User-AI).

* Hint: You can slice a list in Python using chat_history[-6:] (see the short sketch below).
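
For example, inside generate_standalone_question you could trim the history before formatting it (a minimal sketch; 6 messages = 3 user/assistant turns):

```python
# Keep only the last 3 turns (6 messages) when building the rewrite prompt
recent_history = history[-6:]
history_text = "\n".join([f"{msg['role']}: {msg['content']}" for msg in recent_history])
```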

---

Challenge Project: The Standalone Question Generator

Your challenge is to extract the logic we built today into a robust, reusable class. This component is often the first piece of a professional RAG pipeline.

Requirements:

  • Create a class QueryRefiner.
  • It should have a method refine(conversation_history, new_query).
  • It must handle three specific edge cases:

    * Pronouns: "It", "He", "They" -> specific names.

    * Temporal references: "What about last year?" -> "What about [Current Year - 1]?" (You might need to inject the current date into the system prompt!)

    * Implicit context: If the user just types "And the price?", it should rewrite to "What is the price of [Product discussed]?"

  • Return the refined string.

Example Input/Output:

  • History: [User: "Tell me about the iPhone 15.", AI: "It was released in September."]
    Input: "Is it waterproof?"
    Output: "Is the iPhone 15 waterproof?"

  • History: [User: "Who is the CEO of Tesla?", AI: "Elon Musk."]
    Input: "How old is he?"
    Output: "How old is Elon Musk?"

Hints:

* Your system prompt is your most powerful tool here. Be very explicit in your instructions to the LLM.

* You don't need a database for this challenge; just the LLM logic to rewrite the text.
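
If you want a starting point, here is a minimal skeleton of the class. The method names follow the requirements above; the prompt wording is only a suggestion, and sharpening it to handle all three edge cases is the real challenge:

```python
from datetime import date


class QueryRefiner:
    """Rewrites follow-up questions into standalone queries using an LLM."""

    def __init__(self, client, model="gpt-3.5-turbo"):
        self.client = client
        self.model = model

    def refine(self, conversation_history, new_query):
        history_text = "\n".join(
            f"{msg['role']}: {msg['content']}" for msg in conversation_history
        )
        # Injecting today's date lets the model resolve "last year", "next month", etc.
        system_prompt = (
            f"Today's date is {date.today().isoformat()}. "
            "Rewrite the user's follow-up as a standalone question. "
            "Resolve pronouns, relative dates, and implicit references "
            "using the chat history. Do NOT answer the question."
        )
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": (
                    f"Chat History:\n{history_text}\n\n"
                    f"Follow-up input: {new_query}\n\nStandalone question:"
                )},
            ],
            temperature=0.1,
        )
        return response.choices[0].message.content
```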

---

What You Learned

Today you moved from "Stateless" AI to "Stateful" AI.

* History-Aware Retrieval: You learned that you cannot search a database with raw chat inputs.

* Question Condensation: You built a mechanism to rewrite ambiguous questions into standalone queries.

* Context Injection: You saw how passing previous turns helps the LLM understand "it" and "that."

Why This Matters:

In the real world, users never speak in perfect, standalone search queries. They speak in flows. "Show me the red shoes." "Do you have them in size 10?" "What about blue?"

Without the techniques you learned today, your AI would fail on the second and third sentences. With these techniques, you can build a helpful shopping assistant, a tech support bot, or a legal research tool.

Tomorrow:

We are going to look at Advanced RAG Patterns. What happens when you have multiple databases? Or when the question requires math, not just reading? We will explore "Routing" to send questions to the right expert tool.