Day 36 of 80


Phase 5: RAG Systems

Day 36: RAG Architecture Overview

Welcome to Day 36! Today marks the beginning of Phase 5, and quite honestly, this is where the "toy" projects stop and the enterprise-grade engineering begins.

Up until now, we have been relying on the AI's internal memory. We ask it to write a poem or explain Python, and it uses the data it was trained on (which cuts off at a certain date).

But what if you need the AI to answer questions about your private data? Your company's HR documents? Your personal journal? Or news that happened five minutes ago?

If you ask ChatGPT about your specific unreleased product, it will either say "I don't know" or, worse, it will confidently lie to you.

Today, we are going to build the architecture that solves this. It is called RAG (Retrieval Augmented Generation).

What You'll Build Today

We are going to build a "Manual RAG" system. Instead of relying on fancy databases (which we will get to later), we are going to build the engine from scratch using basic Python logic so you understand exactly how the gears turn.

You will build a chatbot that answers questions based on a "private" text file that the AI has never seen before.

Here is what you will learn and why:

* The Hallucination Problem: You will see firsthand how LLMs make things up when they don't have facts.

* RAG Architecture: You will learn the industry-standard pipeline: Retrieve relevant info -> Augment the prompt -> Generate the answer.

* Context Injection: You will learn how to force the AI to look at your data before it speaks.

* RAG vs. Fine-tuning: You will understand why we don't just "retrain" the model every time we have new data.

Let's get started.

The Problem

Let's pretend you work for a fictional company called "Glacier Smart-wear." You have a strict return policy: Returns are accepted within 45 days, but only if the item is in the original blue box.

GPT-4o (or any model) does not know this company exists. It was not in its training data.

Let's see what happens when we ask the AI about it.

The Broken Code

Create a file named pain_point.py.

from openai import OpenAI
import os

# Initialize client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# The user asks a specific question about our private company
user_question = "What is the return policy for Glacier Smart-wear?"

print(f"User Question: {user_question}")
print("-" * 50)

# We ask the LLM directly
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful customer service assistant."},
        {"role": "user", "content": user_question}
    ]
)

print("AI Answer:")
print(response.choices[0].message.content)

The Frustrating Result

If you run this, the AI will likely do one of two things:

• The Humble Fail: "I couldn't find information on a company called Glacier Smart-wear."
• The Hallucination: "Glacier Smart-wear typically offers a 30-day return policy for unworn items..."

This is the pain.

If the AI guesses "30 days," it just lied to your customer. If it says "I don't know," it is useless as a support bot.

You might be thinking: Can't we just paste our entire 500-page employee handbook into the prompt every time?

Technically, maybe. But that is slow, expensive, and often exceeds the "Context Window" (the limit on how much text the AI can read at once).

We need a way to find only the paragraph about returns, paste that into the prompt, and then ask the question. That is RAG.
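To get a feel for why "just paste everything" is impractical, here is a rough back-of-the-envelope sketch. The figures are assumptions, not measurements: roughly 500 words per page and roughly 1.3 tokens per word are common rules of thumb.

# Rough cost of "paste the whole handbook" vs. retrieving one paragraph.
# All figures below are rough assumptions, not exact values.
PAGES = 500
WORDS_PER_PAGE = 500        # assumed average page length
TOKENS_PER_WORD = 1.3       # common rough estimate for English text

handbook_tokens = PAGES * WORDS_PER_PAGE * TOKENS_PER_WORD
paragraph_tokens = 60 * TOKENS_PER_WORD  # a single ~60-word policy paragraph

print(f"Whole handbook: ~{handbook_tokens:,.0f} tokens sent with every question")
print(f"One retrieved paragraph: ~{paragraph_tokens:,.0f} tokens sent with every question")

At hundreds of thousands of tokens per question, you are paying for (and waiting on) the entire handbook every single time, even when the user only asked about returns.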

Let's Build It

We are going to fix this by building a manual RAG pipeline.

RAG stands for:

• Retrieval: Find the relevant data.
• Augmentation: Paste that data into the prompt.
• Generation: Ask the AI to answer using that data.

Step 1: Create the "Knowledge Base"

In the real world, this would be a database. For today, to keep things transparent, we will use a simple Python string representing a text file.

Create a new file called manual_rag.py.

from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# This represents our private database or text file
# The AI does NOT know this information natively
knowledge_base = """
Glacier Smart-wear FAQ:
1. Return Policy: Items can be returned within 45 days of purchase. They must be in the original blue box.
2. Shipping: We only ship to the Northern Hemisphere currently. Shipping takes 12-14 business days.
3. Batteries: The heated jacket uses a proprietary Lithium-Ice battery that lasts 8 hours.
4. Washing: Do not machine wash. Spot clean with cold water only.
"""

print("Knowledge Base Loaded.")

Step 2: The Retrieval System (Search)

We need a function that takes the user's question and looks through our knowledge base to find relevant information.

Since we aren't using complex databases yet, we will use a simple "Keyword Search." We will split our knowledge base into lines and see if any words from the question match the lines.

Add this function to your code:

def retrieve_info(query, data):
    """
    1. Split the data into lines.
    2. Check if keywords from the query appear in the line.
    3. Return relevant lines.
    """
    results = []
    lines = data.split('\n')

    # Simple keyword extraction (very basic for demonstration)
    # We look for words in the query that are longer than 3 letters
    keywords = [word.lower() for word in query.split() if len(word) > 3]
    print(f"Searching for keywords: {keywords}")

    for line in lines:
        # Check if any keyword is in this line (case insensitive)
        for key in keywords:
            if key in line.lower():
                results.append(line)
                break  # If we find a match in this line, stop checking other keywords for this line

    return "\n".join(results)

# Test the retrieval
question = "How do I wash the jacket?"
retrieved_context = retrieve_info(question, knowledge_base)

print(f"\nUser asks: {question}")
print(f"System Found: {retrieved_context}")

Run this code.

You should see that when you ask about washing, the system finds the line: "4. Washing: Do not machine wash. Spot clean with cold water only."

This is the Retrieval step. We found the needle in the haystack.
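One caveat worth knowing: this matcher never strips punctuation, so a query word like "shoes?" is searched with the question mark attached. That happens to be harmless in the examples below, but if you want to experiment, here is a minimal optional sketch of a slightly more forgiving version (punctuation stripped, lines ranked by how many keywords they contain). The name retrieve_info_v2 is just a suggestion, not part of the lesson's required code.

import string

def retrieve_info_v2(query, data, top_k=2):
    """Keyword search with punctuation stripped and a simple match-count ranking."""
    # Strip punctuation from query words; keep words longer than 3 letters
    keywords = [
        word.lower().strip(string.punctuation)
        for word in query.split()
        if len(word.strip(string.punctuation)) > 3
    ]

    scored = []
    for line in data.split("\n"):
        # Count how many keywords appear in this line (case insensitive)
        score = sum(1 for key in keywords if key in line.lower())
        if score > 0:
            scored.append((score, line))

    # Best-matching lines first; keep at most top_k
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "\n".join(line for _, line in scored[:top_k])

print(retrieve_info_v2("How do I wash the jacket?", knowledge_base))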

Step 3: The Augmentation (Prompt Engineering)

Now that we have the specific fact, we need to combine it with the user's question. We don't just send the question to the AI; we send an "Augmented Prompt."

The structure looks like this:

Here is some context: [Insert Retrieved Info Here]

Based on the context above, answer this question: [Insert User Question Here]

Let's write the function to build this prompt.

def augment_prompt(query, context):
    return f"""
You are a helpful support assistant for Glacier Smart-wear.
Answer the user's question using ONLY the context provided below.
If the answer is not in the context, say "I don't know."

Context:
{context}

Question:
{query}
"""

Step 4: The Generation (The Final Answer)

Finally, we send this augmented prompt to the LLM. The LLM acts as the reasoning engine. It reads the context we found and formulates a polite answer.

Add the final piece:

def ask_glacier_bot(query):
    print(f"\n--- Processing: '{query}' ---")

    # 1. RETRIEVE
    context = retrieve_info(query, knowledge_base)

    if not context:
        print("No relevant info found in knowledge base.")
        return "I'm sorry, I don't have information about that in my records."

    # 2. AUGMENT
    prompt = augment_prompt(query, context)

    # 3. GENERATE
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )

    return response.choices[0].message.content

# Let's test the full pipeline!
answer1 = ask_glacier_bot("What is the return policy?")
print(f"Bot Answer: {answer1}")

answer2 = ask_glacier_bot("How long does the battery last?")
print(f"Bot Answer: {answer2}")

answer3 = ask_glacier_bot("Do you sell shoes?")  # This is not in the text
print(f"Bot Answer: {answer3}")

Run the Code

When you run this, you will see the magic happen.

• Return Policy: The bot will correctly cite the 45-day rule (which GPT-4o does not know natively).
• Battery: It will find the battery line and answer correctly.
• Shoes: It will look for "shoes", find nothing, and the retrieve_info function will return nothing. The bot will safely say it doesn't know, rather than hallucinating.

Now You Try

You have built a working RAG pipeline! It is manual and simple, but the architecture is identical to what billion-dollar companies use.

Try these extensions to solidify your understanding:

• Expand the Knowledge Base: Add 3 new facts to the knowledge_base string (e.g., about warranty, CEO name, office location). Rerun the code asking about these new facts. Notice you didn't have to change any code, just the data.
• Improve the "No Results" Handler: Currently, if retrieve_info returns an empty string, we return a hardcoded error message. Change the logic so that if no context is found, we still send the prompt to the AI, but with a special instruction: "I could not find any internal documents about this. Answer based on your general knowledge, but warn the user that this might not be accurate for Glacier Smart-wear." (One possible shape for this is sketched after this list.)
• Debug Mode: Modify the ask_glacier_bot function to print the full prompt variable before sending it to OpenAI. This allows you to see exactly what the "Augmented" step looks like.
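For the second extension, here is one possible sketch of the fallback. The helper name ask_glacier_bot_v2 and the exact prompt wording are suggestions, not the required answer:

def ask_glacier_bot_v2(query):
    """Like ask_glacier_bot, but falls back to general knowledge with a warning."""
    context = retrieve_info(query, knowledge_base)

    if context:
        prompt = augment_prompt(query, context)
    else:
        # No internal documents matched: answer anyway, but tell the model to warn the user.
        prompt = f"""
I could not find any internal documents about this.
Answer based on your general knowledge, but warn the user that this
might not be accurate for Glacier Smart-wear.

Question:
{query}
"""

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content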
Challenge Project: The Analogy

One of the hardest parts of working with AI is explaining it to non-technical stakeholders (bosses, clients).

Your Goal: Write a Python script that uses the LLM to generate an analogy explaining RAG. Requirements:

* You must use the client.chat.completions.create method.
* The system prompt should be: "You are an expert teacher who explains complex tech using simple analogies."
* Ask the AI to explain Retrieval Augmented Generation using the analogy of a student taking a test.
* Print the explanation.

Hints:

* The student is the Model.
* The textbook is the Knowledge Base.
* An "Open Book Test" is RAG. A "Closed Book Test" is standard generation.
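If you get stuck, one possible shape for the script is below. The exact user prompt wording is up to you; this is only a sketch, not the single correct answer:

from openai import OpenAI
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Ask for the "open book test" analogy described in the hints
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are an expert teacher who explains complex tech using simple analogies."},
        {"role": "user", "content": (
            "Explain Retrieval Augmented Generation (RAG) using the analogy of a student taking a test. "
            "Treat the student as the model, the textbook as the knowledge base, "
            "an open book test as RAG, and a closed book test as standard generation."
        )}
    ]
)

print(response.choices[0].message.content)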

What You Learned

Today you tackled the most important architecture in modern AI development.

* RAG (Retrieval Augmented Generation): The process of giving the AI reference material before asking it to answer.

* The Pipeline:

1. Retrieve: Search your data (today we used keywords).
2. Augment: Stuff that data into the prompt.
3. Generate: Let the AI synthesize the answer.

Why this matters: This is how you build AI that knows about your life, your company, and your data without having to pay millions to train a new model.

Tomorrow: We ditch the hardcoded string. You will learn Document Loading: how to pull text from PDFs, Word docs, and websites to build a real knowledge base.