Day 50 of 80

Project: Customer Support Bot (Day 2)

Phase 5: RAG Systems

What You'll Build Today

Welcome to Day 50! We have reached a massive milestone. Today, we finish our "from scratch" RAG (Retrieval-Augmented Generation) system.

Yesterday, you built a bot that could read a document and answer questions. That was the "brain." Today, we are going to build the "conscience."

We are going to take your basic support bot and turn it into a professional-grade system. A bot that blindly answers every question is actually dangerous in a business setting. If a user asks a banking bot, "How do I make a lasagna?", and the bot tries to answer using banking documents, it will hallucinate nonsense.

Today, you will build a Smart Customer Support Bot that includes:

* Confidence Scoring: The bot will mathematically calculate how relevant its knowledge is to the user's question. If the relevance is low, it won't guess.

* Graceful Failure: Instead of lying, the bot will learn to say, "I'm sorry, I don't have that information."

* Source Citations: The bot will prove its answers by citing exactly which document chunk it used.

* Human Handoff: If the bot gets stuck or the user is frustrated, it will trigger a "Connecting to agent..." workflow.

This is the difference between a toy project and a product people trust. Let's get to work.

The Problem

Let's look at why a basic RAG system (like the one we started yesterday) isn't enough for the real world.

Imagine you have a knowledge base about a software product called "CloudStream." You ask the bot a question that isn't in the manual, like "How do I bake a cake?"

In a naive system, the retrieval step always returns the "closest" match, even if that match is terrible. It might return a document about "Baking data to the cloud." The LLM, forced to answer based on that context, will hallucinate.

Here is the "Painful Code" scenario. This represents the naive approach:

# THE PAIN: A naive bot that answers everything, even if it knows nothing.

def naive_bot_response(user_query, retrieved_context):
    # We force the bot to answer using the context,
    # but we don't check if the context is actually relevant.
    prompt = f"""
    You are a helpful assistant.
    Answer the question using ONLY the context below.
    Context: {retrieved_context}
    Question: {user_query}
    """
    # Imagine this line sends the prompt to GPT-4.
    # Here we hard-code the kind of hallucinated reply it would produce.
    return "Sure! To bake a cake, you should 'bake' your data into the cloud repository..."

# Scenario
query = "How do I bake a chocolate cake?"

# The search engine found the only occurrence of the word "bake" in the manual
bad_context = "To save storage, we bake the data into compressed archives."

response = naive_bot_response(query, bad_context)
print(f"User: {query}")
print(f"Bot: {response}")

The Result:

User: How do I bake a chocolate cake?
Bot: Sure! To bake a cake, you should 'bake' your data into the cloud repository...

Why this hurts:
  • Loss of Trust: The user immediately knows the bot is stupid.
  • Liability: If this were a medical or legal bot, a hallucinated answer could be disastrous.
  • Frustration: The user wants to talk to a human, but the bot keeps trying to answer questions it doesn't understand.

There has to be a way to measure how good our retrieved context is before we let the LLM speak.

Let's Build It

We are going to build this step-by-step. We will simulate the "Vector Database" and "Embeddings" using simple Python lists and math, so you can run this code immediately without needing an API key for this specific exercise (though in a real app, you'd use OpenAI or a similar provider).

Step 1: The Knowledge Base and Simulated Embeddings

First, we need our "brains." We will create a small knowledge base about a fictional return policy, plus a helper function to simulate "similarity."

In the real world, you use an Embedding Model to turn text into numbers and compare them with cosine similarity. Here, to keep the code runnable without dependencies, we will use a simple keyword-overlap score to simulate "Vector Similarity."

# 1. Our Knowledge Base (The "Chunks")
knowledge_base = [
    {"id": 1, "text": "Returns are accepted within 30 days of purchase."},
    {"id": 2, "text": "Items must be in original packaging to be returned."},
    {"id": 3, "text": "Refunds are processed to the original payment method within 5 business days."},
    {"id": 4, "text": "Technical support is available 24/7 via email at support@cloudstream.com."}
]

# 2. A simulated 'Similarity Search' function
# In real life, this uses Cosine Similarity on Vector Embeddings.
# Here, we count word overlaps to simulate a 'score' between 0 and 1.
def get_similarity_score(query, document_text):
    query_words = set(query.lower().split())
    doc_words = set(document_text.lower().split())

    # Calculate overlap
    intersection = query_words.intersection(doc_words)

    # A simple score: percentage of query words found in the document
    if len(query_words) == 0:
        return 0.0
    return len(intersection) / len(query_words)

# Test it
q = "return packaging"
doc = knowledge_base[1]['text']
score = get_similarity_score(q, doc)

print(f"Query: {q}")
print(f"Doc: {doc}")
print(f"Relevance Score: {score}")
# Output: 0.5 -- "packaging" matches, but "return" does not match "returned."

Why this matters: We need a mathematical way to say "This document is 80% relevant" vs. "This document is 10% relevant."
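If you later swap the keyword overlap for real embeddings, the comparison step becomes cosine similarity between two vectors. Here is a minimal sketch of that math; the embed() call in the usage comment is a placeholder for whatever embedding API or library you choose, not something defined in this lesson.

import math

def cosine_similarity(vec_a, vec_b):
    # Standard cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical usage -- 'embed' is whatever embedding call your provider gives you:
# query_vec = embed("return packaging")
# doc_vec = embed(knowledge_base[1]['text'])
# score = cosine_similarity(query_vec, doc_vec)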

Step 2: Retrieval with a Confidence Threshold

Now we implement the "Guardrail." We will search for the best document, but if the best score is too low, we return None. This is the most critical step in preventing hallucinations.

def retrieve_best_context(query, threshold=0.4):
    best_score = -1
    best_doc = None
    print(f"\nSearching for: '{query}'")

    for doc in knowledge_base:
        score = get_similarity_score(query, doc['text'])
        print(f" - Checked ID {doc['id']} (Score: {score:.2f})")
        if score > best_score:
            best_score = score
            best_doc = doc

    # THE GUARDRAIL:
    # If even the best match is below the threshold, refuse to use it.
    if best_score < threshold:
        print(f" >> REJECTED: Best score {best_score:.2f} is below threshold {threshold}")
        return None, 0

    print(f" >> ACCEPTED: Found context ID {best_doc['id']}")
    return best_doc, best_score

# Try a good query
context, score = retrieve_best_context("return packaging")

# Try a bad query
context, score = retrieve_best_context("how to cook pizza")

Why this matters: We have taught the bot to recognize when it hasn't found a good match.

Step 3: The Prompt with Source Citations

Now we construct the prompt. We want the bot to tell us where it got the information, so we inject the Document ID into the context.

Note: Since we are not connecting to a live LLM API in this specific snippet, we will simulate the LLM's response logic.

def generate_response(query, context_doc):
    # If the retrieval step returned nothing (due to a low score):
    if context_doc is None:
        return {
            "answer": "I am sorry, I don't have information about that in my knowledge base.",
            "source": None,
            "handoff": False  # No need to escalate yet, just a polite decline
        }

    # If we have context, we construct the prompt (simulated here).
    # In a real app, you send this string to OpenAI.
    prompt = f"""
    Context: {context_doc['text']} (Source ID: {context_doc['id']})
    Question: {query}
    Instructions: Answer the question and cite the Source ID.
    """

    # SIMULATING the LLM response for this tutorial.
    # This logic mimics what GPT-4 would do given the prompt above.
    response_text = f"Based on the policy, {context_doc['text']} [Source: {context_doc['id']}]"

    return {
        "answer": response_text,
        "source": context_doc['id'],
        "handoff": False
    }

print("--- Testing Generation ---")
result = generate_response("return packaging", {"id": 2, "text": "Items must be in original packaging."})
print(result['answer'])
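For reference, here is roughly what the real call would look like if you wired this same prompt to OpenAI's Chat Completions API. Treat it as a sketch: the model name is a placeholder, there is no error handling, and it assumes an OPENAI_API_KEY is set in your environment.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_response_llm(query, context_doc):
    prompt = f"""
    Context: {context_doc['text']} (Source ID: {context_doc['id']})
    Question: {query}
    Instructions: Answer the question and cite the Source ID.
    """
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder -- use whichever model you have access to
        messages=[
            {"role": "system", "content": "You are a customer support assistant. Answer only from the provided context."},
            {"role": "user", "content": prompt},
        ],
    )
    return {
        "answer": completion.choices[0].message.content,
        "source": context_doc['id'],
        "handoff": False,
    }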

Step 4: Implementing Human Handoff

Sometimes, even with good context, the user isn't happy, or the system detects high uncertainty. We need a "Handoff" trigger.

We will create a wrapper class that manages the conversation flow.

class SupportBot:
    def __init__(self):
        self.handoff_keywords = ["human", "agent", "person", "frustrated", "manager"]

    def handle_query(self, query):
        # 1. Check for Immediate Handoff Triggers (keyword based)
        for word in self.handoff_keywords:
            if word in query.lower():
                return "I am connecting you to a human agent now. Please hold..."

        # 2. Retrieve Context
        context_doc, score = retrieve_best_context(query)

        # 3. Generate Response
        response_data = generate_response(query, context_doc)

        # 4. Handle the "I don't know" case.
        # If the bot has no source to cite, offer to escalate instead of dead-ending.
        if response_data['source'] is None:
            return "I apologize, but I cannot find that information. Would you like to speak to a human?"

        return response_data['answer']

# Let's run the full bot!
bot = SupportBot()

print("\n--- TEST 1: Good Query ---")
print("Bot:", bot.handle_query("return packaging"))

print("\n--- TEST 2: Irrelevant Query ---")
print("Bot:", bot.handle_query("cook pizza"))

print("\n--- TEST 3: Frustrated User ---")
print("Bot:", bot.handle_query("I want to speak to a person"))

Step 5: The Output

When you run the code above, you should see:

  • Test 1: The bot finds the packaging document, scores it above the threshold, and answers with a citation.
  • Test 2: The bot searches, finds only low scores (0.0), rejects the context, and politely admits ignorance (see the trace after this list).
  • Test 3: The bot detects the word "person" and immediately triggers the handoff script.

This is a robust architecture. It prefers silence over lying, and it offers help when it fails.
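For example, the trace for Test 2 should look roughly like this (the exact wording comes from the print statements above):

--- TEST 2: Irrelevant Query ---

Searching for: 'cook pizza'
 - Checked ID 1 (Score: 0.00)
 - Checked ID 2 (Score: 0.00)
 - Checked ID 3 (Score: 0.00)
 - Checked ID 4 (Score: 0.00)
 >> REJECTED: Best score 0.00 is below threshold 0.4
Bot: I apologize, but I cannot find that information. Would you like to speak to a human?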

Now You Try

You have the core logic. Now, extend it with these three tasks:

  • The "Three Strikes" Rule: Modify the SupportBot class to track how many times in a row the bot returned "I don't know." If it happens 3 times in a single session, automatically trigger the "Connecting to agent..." message. (A starter sketch follows this list.)

  • Multi-Source Synthesis: Update the retrieve_best_context function to return the top 2 documents instead of just one. Modify the prompt (in generate_response) to include both pieces of text. This helps if the answer is split across two documents (e.g., one doc mentions "Refunds" and another mentions "Timeframe").

  • Strict Mode vs. Loose Mode: Add a strict_mode=True parameter to your bot.

    * If True, set the threshold to 0.6 (high confidence required).

    * If False, set the threshold to 0.2 (bot guesses more often).

    Test how this changes the answers for vague queries.
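Here is one possible starting point for the "Three Strikes" rule, sketched as a subclass so the original code stays untouched. The class name SupportBotWithStrikes and the miss_count attribute are just illustrative choices, not part of the lesson's code.

class SupportBotWithStrikes(SupportBot):
    def __init__(self):
        super().__init__()
        self.miss_count = 0  # consecutive "I don't know" answers in this session

    def handle_query(self, query):
        answer = super().handle_query(query)

        # Count consecutive failures; reset whenever the bot actually helps.
        if "cannot find that information" in answer:
            self.miss_count += 1
        else:
            self.miss_count = 0

        # Three strikes: stop apologizing and escalate.
        if self.miss_count >= 3:
            self.miss_count = 0
            return "I am connecting you to a human agent now. Please hold..."
        return answer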

Challenge Project: Feedback & Knowledge Gaps

In a real company, the most valuable data isn't what the bot knows; it's what the bot doesn't know.

The Goal: Build a feedback loop that tracks "Knowledge Gaps."

Requirements:
  • Create a function collect_feedback(query, success_bool).
  • If the bot returns "I don't know" (low score), automatically call this function with success_bool=False.
  • Store these "failed queries" in a list or simple JSON file.
  • Create a function generate_gap_report() that prints out: "Users asked these questions, but we had no answers: [list of questions]."

Example Output:

User: "How do I reset my password?"
Bot: "I apologize, I cannot find that information."
[System]: Logged failure for 'reset password'

... later ...

> generate_gap_report()

WARNING: KNOWLEDGE GAPS DETECTED
The bot failed to answer these common queries:
  • "How do I reset my password?"
  • "Is there a mobile app?"

Action item: Add documents covering these topics to the Knowledge Base.

Hints:

* You don't need a database; a global list failed_queries = [] works for this session.

* This is how product managers decide what documentation to write next!
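If you get stuck, here is one possible shape for the logging half, a minimal sketch based on the global-list hint above. Wiring collect_feedback into SupportBot.handle_query and persisting the list to JSON are left to you.

failed_queries = []  # in-memory log of questions the bot could not answer

def collect_feedback(query, success_bool):
    # Log only the failures; successes need no action here.
    if not success_bool:
        failed_queries.append(query)
        print(f"[System]: Logged failure for '{query}'")

def generate_gap_report():
    if not failed_queries:
        print("No knowledge gaps detected in this session.")
        return
    print("WARNING: KNOWLEDGE GAPS DETECTED")
    print("The bot failed to answer these common queries:")
    for q in failed_queries:
        print(f' • "{q}"')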

What You Learned

Today, you moved from "making it work" to "making it safe."

* Confidence Scoring: You learned that not all retrieval is good retrieval. You must filter by score.

* Thresholding: You implemented logic to say "If relevance < 0.4, say I don't know."

* Citations: You learned to make the LLM accountable by referencing source IDs.

* Handoffs: You built an escape hatch for frustrated users.

Why This Matters:

In the enterprise world, "Safety" and "Accuracy" are more important than "Creativity." A bot that helps 80% of users and politely declines the other 20% is a success. A bot that helps 80% and lies to the other 20% is a lawsuit waiting to happen.

Phase 5 Complete!

You have now mastered the architecture of RAG systems. You know how to chop data, embed it, store it, retrieve it, and generate answers from it.

Tomorrow: We start Phase 6. We stop writing all this boilerplate code from scratch. We will introduce LangChain and LlamaIndex, the industry-standard frameworks that automate everything you just built. See you then!