Day 70 of 80

Project: Private Local Assistant

Phase 7: Advanced Techniques

What You'll Build Today

Welcome to Day 70! You have come a long way. Today marks the culmination of Phase 7, where we focus on advanced techniques. Up until now, almost every impressive AI application we have built has relied on a connection to a massive brain in the cloud (like OpenAI or Anthropic).

Today, we cut the cord.

You are going to build a Private Local Assistant. This is a fully functional RAG (Retrieval Augmented Generation) system that runs entirely on your laptop. It will read your private documents, store them in a local database, answer questions using a local Large Language Model (LLM), and even speak the answers back to you.

Here is what you will master today:

* Local Inference: Running an LLM (like Llama 3 or Mistral) on your own hardware using Ollama. Why? Because sometimes you cannot send sensitive data to the cloud, or you simply don't want to pay per token.

* Local Embeddings: Turning text into numbers using a model that lives on your hard drive. Why? To ensure your search system works even without Wi-Fi.

* Persistent Vector Storage: Using ChromaDB to save your data so you don't have to rebuild the database every time you restart the script.

* Offline Text-to-Speech: Giving your assistant a voice without relying on API calls.

By the end of this lesson, you will be able to disconnect your internet, turn on Airplane Mode, and still chat with your AI assistant about your private documents.

The Problem

Let's talk about the frustration of cloud-dependent AI.

Imagine you have built a fantastic tool that summarizes legal contracts or analyzes medical records. You are ready to show it to a client or your boss. You open your laptop, run the script, and... nothing happens.

Why? The Wi-Fi in the conference room is spotty. Or perhaps your credit card on the OpenAI account expired. Or, worst of all, the client asks, "Wait, are you sending our confidential files to a third-party server?"

Here is a typical piece of code that represents this vulnerability. This is what we are trying to avoid today:

```python
import openai
import os

# The "Pain" Scenario
def chat_with_cloud(prompt):
    try:
        # This requires:
        # 1. An active internet connection
        # 2. A valid credit card on file
        # 3. Trusting the cloud provider with your data
        client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content
    except openai.APIConnectionError:
        return "Error: No internet connection. The AI is dead."
    except openai.AuthenticationError:
        return "Error: API Key invalid or expired."
    except Exception as e:
        return f"Error: {e}"

# If the internet cuts out, your application is useless.
print(chat_with_cloud("Hello? Are you there?"))
```

If you run this without internet, your application crashes or returns an error. It feels fragile. It feels like you are renting intelligence rather than owning it.

How do we build something that we actually own? How do we build a system that works in a bunker?

Let's Build It

We are going to replace every cloud component with a local equivalent.

Prerequisites:
  • Download Ollama: You must have the Ollama application installed and running on your computer. Go to ollama.com to download it.
  • Pull a Model: Open your terminal/command prompt and run ollama pull llama3 (or ollama pull mistral if your computer is older).
  • Install Python Libraries:

```bash
pip install ollama chromadb sentence-transformers pyttsx3
```
Step 1: The Local Brain (Ollama)

First, let's verify we can talk to our local LLM using Python. We will use the ollama library, which communicates with the Ollama application running in the background.

```python
import ollama

def test_local_brain():
    print("Contacting local Llama 3...")
    # This runs entirely on your CPU/GPU
    response = ollama.chat(model='llama3', messages=[
        {
            'role': 'user',
            'content': 'Explain what a "local LLM" is in one sentence.',
        },
    ])
    print("\nAI Response:")
    print(response['message']['content'])

if __name__ == "__main__":
    test_local_brain()
```

Why this matters: When you run this, notice there is no API key involved. You are talking directly to the model files on your hard drive.
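
Local models can also stream their output, so you see tokens as they are generated instead of waiting for the whole reply. Here is a minimal sketch, assuming a recent version of the ollama package and the same llama3 model:

```python
import ollama

# Ask for a streamed response: the call returns an iterator of partial chunks
stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Say hello in five words.'}],
    stream=True,
)

# Print each token as it arrives, just like the cloud chat UIs do
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
print()
```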

Step 2: The Local Embeddings

To build a RAG system (chatting with documents), we need to turn text into numbers (vectors). Usually, we would use OpenAI's embedding model. Today, we will use sentence-transformers, a library that downloads a small, efficient model to your computer.

```python
from sentence_transformers import SentenceTransformer

# Load a small, fast model designed for semantic search.
# It downloads once (about 90MB) and then runs offline.
embed_model = SentenceTransformer('all-MiniLM-L6-v2')

def get_local_embedding(text):
    # Converts text to a list of numbers (a vector)
    vector = embed_model.encode(text)
    return vector.tolist()  # Convert numpy array to a standard list

if __name__ == "__main__":
    text = "Privacy is important."
    vector = get_local_embedding(text)
    print(f"Text converted to vector of length: {len(vector)}")
    print(f"First 5 numbers: {vector[:5]}")
```

Why this matters: We can now "mathematically understand" text without sending that text to a third party.
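
To see that "mathematical understanding" in action, compare a few vectors directly. A quick sketch using cosine similarity (the example sentences here are arbitrary):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embed_model = SentenceTransformer('all-MiniLM-L6-v2')

def cosine(u, v):
    # 1.0 means identical direction; values near 0 mean unrelated
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

a = embed_model.encode("The cat sat on the mat.")
b = embed_model.encode("A feline rested on the rug.")
c = embed_model.encode("Quarterly earnings beat expectations.")

print(f"cat vs feline:   {cosine(a, b):.3f}")  # semantically close -> higher score
print(f"cat vs earnings: {cosine(a, c):.3f}")  # unrelated -> lower score
```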

Step 3: The Local Vector Database

Now we need a place to store these vectors. We will use ChromaDB, configured to save data to a folder called local_memory so it persists even if we close the script.

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Initialize components
embed_model = SentenceTransformer('all-MiniLM-L6-v2')

# Set up persistent storage.
# This creates a folder named 'local_memory' in your project directory.
chroma_client = chromadb.PersistentClient(path="./local_memory")

# Create or get a collection (like a table in SQL)
collection = chroma_client.get_or_create_collection(name="private_docs")

def add_document(doc_id, text):
    print(f"Adding document: {doc_id}")
    # 1. Embed locally
    embedding = embed_model.encode(text).tolist()
    # 2. Store in Chroma
    collection.add(
        documents=[text],
        embeddings=[embedding],
        ids=[doc_id]
    )

if __name__ == "__main__":
    # Let's add some "secret" knowledge the AI wouldn't know otherwise
    secret_text = "Project BlueBook is actually a recipe for the world's best blueberry pie."
    add_document("secret_001", secret_text)
    print("Document stored safely on disk.")
```

Why this matters: You have just created a database that lives on your file system. You can restart your computer, and "Project BlueBook" will still be there.
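
You can verify that claim yourself: run the script above once, then run this snippet in a brand-new Python session. A small sketch, assuming Step 3 already wrote to ./local_memory:

```python
import chromadb

# Reopen the same on-disk database in a fresh process
chroma_client = chromadb.PersistentClient(path="./local_memory")
collection = chroma_client.get_or_create_collection(name="private_docs")

print(f"Documents on disk: {collection.count()}")
print(collection.get(ids=["secret_001"])["documents"])  # the BlueBook text survives restarts
```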

Step 4: The Full RAG Pipeline

Now we combine everything. We will search the local database for relevant info and pass it to Ollama to generate an answer.

```python
import ollama
import chromadb
from sentence_transformers import SentenceTransformer

# --- SETUP ---
print("Initializing Local Stack...")
embed_model = SentenceTransformer('all-MiniLM-L6-v2')
chroma_client = chromadb.PersistentClient(path="./local_memory")
collection = chroma_client.get_or_create_collection(name="private_docs")

def ask_local_assistant(question):
    print(f"\nThinking about: '{question}'...")

    # 1. Embed the question
    query_vec = embed_model.encode(question).tolist()

    # 2. Search the local DB
    results = collection.query(
        query_embeddings=[query_vec],
        n_results=1  # Get the single most relevant chunk
    )

    # Handle the case where the DB is empty or returns no results
    if not results['documents'][0]:
        context = "No relevant context found."
    else:
        context = results['documents'][0][0]

    print(f"Found context: {context}")

    # 3. Construct the prompt
    prompt = f"""
You are a helpful private assistant. Use the following context to answer the user's question.

Context: {context}

Question: {question}
"""

    # 4. Generate the answer with Ollama
    response = ollama.chat(model='llama3', messages=[
        {'role': 'user', 'content': prompt}
    ])
    return response['message']['content']

if __name__ == "__main__":
    # Ensure you ran Step 3 first to populate the DB!
    answer = ask_local_assistant("What is Project BlueBook?")
    print(f"\nASSISTANT: {answer}")
```
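
Once your database holds more than a handful of chunks (see the exercises below), a single retrieved chunk is often too little context. One possible variation, sketched here as a hypothetical retrieve_context helper that assumes the embed_model and collection objects from the setup above, is to pull the top few chunks and stitch them together:

```python
def retrieve_context(question, n_results=3):
    # Embed the question and pull the top-N most relevant chunks
    query_vec = embed_model.encode(question).tolist()
    results = collection.query(query_embeddings=[query_vec], n_results=n_results)

    # Note: some Chroma versions complain if n_results exceeds the number of
    # stored items, so keep it modest while your collection is small.
    chunks = results['documents'][0]
    return "\n\n".join(chunks) if chunks else "No relevant context found."
```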

Step 5: Adding a Voice (Offline TTS)

Finally, let's make it speak. We use pyttsx3, a text-to-speech library that uses your operating system's built-in voice drivers. It sounds robotic compared to ElevenLabs, but it is free, fast, and offline.

```python
import pyttsx3

def speak_text(text):
    engine = pyttsx3.init()
    # Optional: adjust rate (speed) and volume
    engine.setProperty('rate', 175)
    engine.setProperty('volume', 1.0)
    engine.say(text)
    engine.runAndWait()

if __name__ == "__main__":
    speak_text("System online. Ready for offline operations.")
```

Output: You should hear your computer speak the phrase.
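
If the default voice is not to your taste, pyttsx3 exposes whatever voices your operating system provides. A quick sketch for listing and switching voices (the available voices vary by machine), which you will want for the first exercise below:

```python
import pyttsx3

engine = pyttsx3.init()

# Every OS ships a different set of voices; print what this machine has
voices = engine.getProperty('voices')
for i, voice in enumerate(voices):
    print(i, voice.name)

# Switch to a specific voice by its id, then test it
engine.setProperty('voice', voices[0].id)
engine.say("Testing this voice.")
engine.runAndWait()
```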

Now You Try

You have the building blocks. Now, combine them into a robust application.

  • The Personality Shift: Modify the prompt in Step 4. Give your assistant a specific persona (e.g., "You are a grumpy librarian" or "You are a secret agent"). Change the pyttsx3 voice property to match (the voice-listing sketch at the end of Step 5 shows how to enumerate engine.getProperty('voices') on your OS).
  • The Document Loader: Right now, we are hardcoding the "secret text." Write a function that reads a .txt file from your computer, splits it into chunks (you can split by paragraphs for simplicity), and loads all the chunks into ChromaDB. One possible shape is sketched after this list.
  • The Continuous Loop: Wrap the interaction in a while True: loop so you can keep asking questions without restarting the script. Add a command like "exit" or "quit" to break the loop.
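
Here is one possible shape for the document loader from the second exercise. It is a sketch, not the definitive implementation: it splits on blank lines, and it assumes the embed_model and collection objects from Step 3 (the load_text_file name is just a suggestion):

```python
import os

def load_text_file(path, collection, embed_model):
    # Read the whole file, then split on blank lines to get paragraph chunks
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]

    for i, chunk in enumerate(chunks):
        collection.add(
            documents=[chunk],
            embeddings=[embed_model.encode(chunk).tolist()],
            ids=[f"{os.path.basename(path)}_{i}"],  # stable, file-derived IDs
        )
    print(f"Loaded {len(chunks)} chunks from {path}")
```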

Challenge Project: The "Airplane Mode" Test

Your challenge today is to verify the robustness of your system.

The Scenario: You are a researcher in a remote location with no internet access. You have a manual for a piece of equipment (create a text file named manual.txt with some made-up technical instructions). You need to query this manual using your AI.

Requirements:
  • Create a script offline_bot.py.
  • It must load manual.txt into the vector store if it hasn't already.
  • It must enter a chat loop where the user types a question.
  • It must retrieve the answer and speak it out loud.
  • The Critical Test: You must disconnect your computer from the internet (turn off Wi-Fi/Ethernet) and run the script.

Example Input/Output:
  • Input: "How do I reset the flux capacitor?"
  • Process: (System searches the local DB, finds the reset procedure, sends it to Ollama)
  • Audio Output: "To reset the flux capacitor, hold the red button for five seconds."

Hints:

* If pyttsx3 throws an error on Mac, ensure you have the correct audio drivers or try a simple os.system('say "text"') as a fallback.

* Remember to check whether the collection already has data so you don't add the same file 50 times. A skeleton covering this and the chat loop follows below.
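
Here is that skeleton for offline_bot.py. It is a sketch under the assumption that ask_local_assistant, speak_text, and load_text_file (along with embed_model and collection) are defined above it, as in the earlier steps:

```python
# offline_bot.py -- sketch; relies on the functions built in Steps 3-5

if collection.count() == 0:
    # Only ingest the manual on the very first run
    load_text_file("manual.txt", collection, embed_model)

while True:
    question = input("\nYou: ").strip()
    if question.lower() in ("exit", "quit"):
        break
    answer = ask_local_assistant(question)
    print(f"ASSISTANT: {answer}")
    speak_text(answer)
```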

What You Learned

Today you broke free from the cloud. You learned:

* Data Sovereignty: How to keep your data and your inference on your own machine using Ollama.

* Local Vectorization: How to use Sentence Transformers to embed text offline.

* Persistence: How ChromaDB saves your vector data to disk.

* Offline UX: How to add voice interaction without APIs.

Why This Matters:

In the enterprise world, "On-Premise" (running software on the company's own servers) is a massive requirement. Banks, hospitals, and defense contractors often cannot use standard ChatGPT. You now know the architecture for building secure, private, enterprise-grade AI assistants.

Phase 7 Complete!

You have mastered advanced techniques, including memory, agents, and local stacks. Tomorrow, we begin Phase 8: Deployment. It is time to take these scripts running on your laptop and turn them into web applications that anyone in the world can use. See you on Day 71!