Day 53 of 80

LlamaIndex Fundamentals

Phase 6: Frameworks & Agents

What You'll Build Today

Welcome to Day 53! Today, we are going to build a "Second Brain" that can read Wikipedia articles and answer questions about them with high accuracy.

Up until now, we have been doing a lot of manual plumbing. We have chopped up text, sent it to embedding models, stored vectors, and retrieved them. Today, we introduce LlamaIndex.

LlamaIndex is a data framework specifically designed to connect custom data (like PDFs, SQL databases, or Wikipedia) to Large Language Models (LLMs). While LangChain is a general-purpose toolkit (like a Swiss Army Knife), LlamaIndex is a specialized power tool for Retrieval Augmented Generation (RAG).

Here is what you will master today:

* LlamaHub Data Connectors: You will learn why writing your own file parsers is a waste of time and how to ingest data from almost anywhere instantly.

* Vector Store Index: You will learn how to turn raw text into a searchable mathematical structure in a single line of code.

* Query Engines: You will learn how to build an interface that doesn't just search for keywords, but synthesizes answers based on context.

* Storage Context: You will learn how to save your expensive embeddings to disk so you don't have to pay to rebuild them every time you run your script.

Let's dive in.

---

The Problem

To understand why LlamaIndex exists, we need to look at what happens when we try to build a RAG system without it.

Imagine you want to build a bot that answers questions about a specific long document. If you were to do this using raw Python and basic API calls, you would face several headaches:

  • Token Limits: You can't just paste a 100-page document into the prompt. It's too expensive and won't fit.
  • Chunking Logic: You have to write code to split the text. Do you split by page? By paragraph? By sentence? If you split it wrong, you lose context.
  • Retrieval Logic: You have to calculate cosine similarity math yourself to find the right chunks.

Here is what that "painful" code looks like. You do not need to run this. Just read it and feel the frustration.

```python
# THE PAIN: Doing RAG manually (Simplified)
import openai

# 1. We have to manually load and clean the file
with open("huge_document.txt", "r") as f:
    text = f.read()

# 2. We have to write our own chunking logic (Fragile!)
chunk_size = 1000
chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

# 3. We have to manually search (Naive keyword search)
def manual_search(query):
    relevant_chunks = []
    for chunk in chunks:
        # This is bad: it only finds exact word matches, not meanings!
        if query in chunk:
            relevant_chunks.append(chunk)
    return relevant_chunks

# 4. We have to manually construct the prompt
query = "What were the financial results?"
context = manual_search(query)
prompt = f"Context: {context}\n\nQuestion: {query}"

# 5. Finally, we call the LLM
# response = openai.chat.completions.create(...)
```

Why this fails:

    * If the user asks "How much money did we make?" and the text says "Revenue was high," the keyword search fails (no match for "money").

    * We aren't using embeddings.

    * We aren't handling overlap between chunks.

    LlamaIndex solves this by abstracting the "Data -> Embedding -> Index -> Retrieval" pipeline into a high-level interface.

    ---

    Let's Build It

    We are going to build a system that reads a text file (simulating a Wikipedia article) and allows you to chat with it.

    Prerequisites

You will need to install llama-index. Note that LlamaIndex reorganized its packages around version 0.10, so the examples below import from llama_index.core; installing the llama-index meta-package gives you the core library plus the default OpenAI integrations.

```bash
pip install llama-index
```

    Step 1: Setup and Data Creation

    First, we need some data. Since we want to simulate a Wikipedia knowledge base, let's create a text file about a specific topic. We will use the history of the Python programming language as our example.

    Create a file named python_history.txt in your project folder and paste this text into it:

    File: python_history.txt
    Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands as a successor to the ABC programming language, which was inspired by SETL, capable of exception handling and interfacing with the Amoeba operating system. Its implementation began in December 1989. Van Rossum shouldered sole responsibility for the project, as the lead developer, until July 12, 2018, when he announced his "permanent vacation" from his responsibilities as Python's "Benevolent Dictator For Life", a title the Python community bestowed upon him to reflect his long-term commitment as the project's chief decision-maker. In January 2019, active Python core developers elected a five-member "Steering Council" to lead the project.
    
    

Python 2.0 was released on October 16, 2000, with many major new features. Python 3.0 was released on December 3, 2008, with many of its major features backported to Python 2.6 and 2.7. A major innovation in Python 3.0 was to handle text as Unicode by default.

    Python is a multi-paradigm programming language. Object-oriented programming and structured programming are fully supported, and many of its features support functional programming and aspect-oriented programming (including by metaprogramming and metaobjects). Many other paradigms are supported via extensions, including design by contract and logic programming.

    Now, create your Python script day53_llama.py.

    Step 2: Loading Data with SimpleDirectoryReader

    LlamaIndex uses "Readers" to ingest data. The most common one is SimpleDirectoryReader. It looks at a folder (or specific files) and figures out how to read them, whether they are .txt, .pdf, or .md.

```python
import os

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Set your OpenAI API Key
os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your key

print("Loading data...")

# SimpleDirectoryReader is the magic tool.
# It reads the file and converts it into 'Document' objects.
documents = SimpleDirectoryReader(input_files=["python_history.txt"]).load_data()

print(f"Loaded {len(documents)} document(s).")
print(f"First document text preview: {documents[0].text[:100]}...")
```

    Why this matters: You didn't have to open the file, strip newlines, or worry about encoding. The Reader handles the "Ingestion" phase.

    Step 3: Creating the Index

    This is the most powerful line of code in LlamaIndex. We are going to take those documents and turn them into a VectorStoreIndex.

    Behind the scenes, this single line:

  • Splits your text into chunks (Nodes).
  • Sends those chunks to OpenAI to get Embeddings.
  • Stores those embeddings in a local vector database in memory.

Add this to your script:

    print("Building index (this handles chunking and embedding)...")
    
    # This builds the index in memory
    

    index = VectorStoreIndex.from_documents(documents)

    print("Index built successfully!")

    Step 4: Building a Query Engine

    Now that we have an index (a database of meanings), we need a way to ask it questions. We create a QueryEngine.

    The Query Engine creates a pipeline that:

  • Takes your question.
  • Embeds it.
  • Finds the most relevant chunks in the index.
  • Sends the chunks + your question to the LLM (GPT-4/3.5) to generate a natural language answer.

```python
# Create the query engine
query_engine = index.as_query_engine()

# Ask a question
question = "Who created Python and when?"
print(f"\nAsking: {question}")

response = query_engine.query(question)

print("\nResponse:")
print(response)
```

    Run your code. You should see a coherent answer derived directly from the text file we created.
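
If you want to verify which chunks the engine actually retrieved to ground that answer, the response object carries them as source_nodes. A quick sketch, assuming the response variable from the snippet above:

```python
# Inspect the retrieved chunks and their similarity scores
for node_with_score in response.source_nodes:
    print("Score:", node_with_score.score)
    print(node_with_score.node.get_content()[:150], "...")
```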

    Step 5: Persisting Storage (Saving Money)

    Right now, every time you run this script, you are paying OpenAI to re-embed the text file. If that file was a 500-page book, that would get expensive and slow.

    We need to save the index to your hard drive.

    Update your imports and add this logic to the end of your script:

```python
from llama_index.core import StorageContext, load_index_from_storage

# Define where to save
PERSIST_DIR = "./storage"

# Check if we've already saved the index
if not os.path.exists(PERSIST_DIR):
    print("Creating new index and saving it...")
    # Load documents and create index
    documents = SimpleDirectoryReader(input_files=["python_history.txt"]).load_data()
    index = VectorStoreIndex.from_documents(documents)
    # Save to disk
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    print("Loading existing index from storage...")
    # Rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    # Load index
    index = load_index_from_storage(storage_context)

# Now we can query as usual
query_engine = index.as_query_engine()
response = query_engine.query("What are the key features of Python 3.0?")
print(f"\nAnswer: {response}")
```

    Run this twice.

    * The first time, it says "Creating new index..."

    * The second time, it says "Loading existing index..." and runs instantly without calling the Embedding API.

    ---

    Now You Try

    You have the basics. Now let's expand this into a proper knowledge base.

    1. Ingest Multiple Files

    Create a second text file named rust_history.txt. Copy a paragraph from Wikipedia about the Rust programming language into it.

    Update your SimpleDirectoryReader to load all files in a directory, not just a specific list.

    Hint: Use SimpleDirectoryReader("path/to/folder").load_data() instead of input_files.
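
A minimal sketch of that change, assuming both text files sit in a folder called data/ (the folder name is just an example):

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# "data" is a hypothetical folder holding python_history.txt and rust_history.txt
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
```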

    2. The Interactive Chat Loop

    Currently, the script runs once and quits. Modify the end of your script to run a while True loop that asks the user for input using input("Ask a question: ").

    * If the user types "exit", break the loop.

    * Otherwise, run query_engine.query() and print the result.
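
One way to wire that up, as a sketch around the query_engine you already built:

```python
# Simple interactive loop around the query engine
while True:
    user_input = input("Ask a question: ")
    if user_input.strip().lower() == "exit":
        break
    response = query_engine.query(user_input)
    print(response)
```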

    3. Change Response Mode

    By default, LlamaIndex tries to be concise. But what if you want a detailed summary?

    Modify the query engine creation:

```python
query_engine = index.as_query_engine(response_mode="tree_summarize")
```

    Ask a broad question like "Compare the history of the languages" (assuming you did step 1) and see how the output changes.

    ---

    Challenge Project: Natural Language to SQL

    This is a very popular use case for LlamaIndex in the enterprise world. Instead of querying unstructured text, we want to query a structured SQL database using English.

    The Task:

    Create a script that uses LlamaIndex to query a simple SQLite database.

    Requirements:
  • Create a temporary in-memory SQLite database using Python's sqlite3.
  • Create a table called employees with columns: name, role, salary. Insert 3-4 dummy rows.
  • Use NLSQLTableQueryEngine (Natural Language SQL Table Query Engine) from LlamaIndex.
  • Ask the engine: "Who has the highest salary?"
  • The engine should convert that English into SELECT name FROM employees ORDER BY salary DESC LIMIT 1 automatically and return the answer.

Hints:

    * You will need to install sqlalchemy.

    * You will need to import SQLDatabase from llama_index.core.

    * You will need NLSQLTableQueryEngine from llama_index.core.query_engine.

* The flow is: your English question -> the LLM writes SQL -> the database executes it -> the LLM turns the rows into a natural language answer.

Example Output:

```
Question: Who earns the most?
Generated SQL: SELECT name FROM employees ORDER BY salary DESC LIMIT 1;
Final Answer: Alice earns the most with a salary of 120000.
```
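
If you get stuck, here is one possible sketch of the wiring, not the official solution. It creates the in-memory database through SQLAlchemy (SQLDatabase expects a SQLAlchemy engine) and uses made-up employee rows:

```python
import os
from sqlalchemy import create_engine, text

from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your key

# 1. Build an in-memory SQLite database and insert dummy rows
engine = create_engine("sqlite:///:memory:")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE employees (name TEXT, role TEXT, salary INTEGER)"))
    conn.execute(text(
        "INSERT INTO employees VALUES "
        "('Alice', 'Engineer', 120000), ('Bob', 'Designer', 90000), ('Carol', 'Manager', 110000)"
    ))

# 2. Wrap the engine so LlamaIndex can read the schema
sql_database = SQLDatabase(engine, include_tables=["employees"])

# 3. Natural-language-to-SQL query engine over that table
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["employees"])

response = query_engine.query("Who has the highest salary?")
print(response)
```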

    ---

    What You Learned

    Today you moved from manual data plumbing to using an industrial-grade RAG framework.

    * Data Loading: You used SimpleDirectoryReader to ingest text without writing parsing code.

    * Indexing: You used VectorStoreIndex to handle chunking and embedding in one line.

* Persistence: You learned to use StorageContext to save your index to disk, saving time and API costs.

    * Querying: You used the Query Engine to synthesize natural answers.

    Why This Matters:

    In the real world, data isn't just one text file. It's thousands of PDFs, emails, and database rows. LlamaIndex provides the architecture to manage that scale. You don't have to reinvent the wheel every time you want to search your data.

    Tomorrow:

    Now that our AI can read and remember, we need it to act. Tomorrow, we introduce Agents. Agents are AI systems that don't just answer questions—they can use tools, search the web, and make decisions to complete complex tasks.