Day 40 of 80

Vector Databases: Pinecone

Phase 5: RAG Systems

What You'll Build Today

Welcome to Day 40! You have come a long way. Yesterday, you built a local search engine using ChromaDB. It was fast, free, and ran right on your computer. But today, we are going to upgrade your infrastructure significantly.

Today, you are going to build a cloud-hosted vector search engine using Pinecone. You will take a set of documents, convert them into "embeddings" (lists of numbers), upload them to the cloud, and perform semantic searches that can be accessed from anywhere in the world.

Here is what you will learn and why it matters:

* Cloud Vector Databases: You will move from saving files on your laptop to using a managed cloud service. This is necessary because if you want to build an app that other people use, your laptop cannot be the server.

* Pinecone Indexes: You will learn how to organize vectors in the cloud. This is the foundation of storing millions of AI memories.

* Upserting: You will learn the specific operation of "Update or Insert." This is critical for keeping your data in sync without creating duplicates.

* Namespaces: You will learn how to partition data within a single database. This is essential for "Multi-tenancy"—ensuring User A cannot search through User B's private documents.

The Problem

Let's talk about why we can't just stick with ChromaDB or saving vectors to a CSV file on your computer.

Imagine you have built an amazing Chatbot for a law firm. It searches through thousands of legal PDFs to answer questions. It works perfectly on your laptop.

You decide to deploy this app to a web server so the lawyers can actually use it. You upload your code, but you forget to upload the folder containing the database. The app crashes immediately.

Or, consider this code snippet using a local approach:

# This is the "Painful" Local Approach

import json

# Imagine this list has 100,000 items

database = [

{"id": 1, "text": "Contract A details...", "vector": [0.1, 0.2, ...]},

{"id": 2, "text": "Contract B details...", "vector": [0.3, 0.4, ...]}

]

# Every time you restart your app, you have to load this huge file into memory

with open("my_huge_database.json", "r") as f:

data = json.load(f)

# If your computer runs out of RAM, your app crashes. # If you want to share this data with a coworker, you have to email a 5GB file. # If two users try to update the file at the same time, the file gets corrupted.

The pain here is Scalability and Accessibility.

  • RAM Limits: Your computer has limited memory. If you have 10 million vectors, you cannot load them all into a Python list or a local SQLite file without things slowing to a crawl.
  • Statelessness: Modern web apps are "stateless." They spin up and down constantly. They cannot rely on a file sitting on a hard drive.
  • Isolation: If you have multiple users, managing permissions in a single JSON file is a security nightmare.

We need a database that lives in the cloud, handles the memory management for us, and lets us separate user data securely. That is Pinecone.

Let's Build It

We are going to set up Pinecone, create an index, and perform a semantic search.

Prerequisites:

You need a Pinecone API key and an OpenAI API key.

  • Go to pinecone.io and sign up (the free tier is generous).
  • Create a Pinecone API key.
  • Have your OpenAI API key ready.
  • Install the necessary libraries (the SDK is published on PyPI as pinecone-client; newer releases of the same SDK are published under the name pinecone):

pip install pinecone-client openai

Step 1: Initialization and Connection

First, we need to securely connect to the Pinecone cloud service.

import os
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI

# 1. Set up API keys
# Ideally, read these from your environment with os.getenv("PINECONE_API_KEY"),
# but for learning, you can paste them here.
os.environ["PINECONE_API_KEY"] = "YOUR_PINECONE_KEY_HERE"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_KEY_HERE"

# 2. Initialize clients
pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

print("Successfully connected to Pinecone and OpenAI!")

    Step 2: Creating the Index

    Think of an Index as a specific database instance. In Pinecone, we have to define the "dimension" of our vectors. Since we are using OpenAI's text-embedding-3-small, the dimension is always 1536.

We also use ServerlessSpec. This tells Pinecone: "I don't want to rent a specific computer; just charge me for what I use."

    index_name = "day-40-demo"
    
    # Check if index already exists (to avoid errors if you run this twice)
    

    existing_indexes = pc.list_indexes().names()

    if index_name not in existing_indexes:

    print(f"Creating index: {index_name}...")

    pc.create_index(

    name=index_name,

    dimension=1536, # Must match OpenAI embedding size

    metric="cosine", # The math used to calculate similarity

    spec=ServerlessSpec(

    cloud="aws",

    region="us-east-1"

    )

    )

    print("Index created!")

    else:

    print(f"Index {index_name} already exists.")

    # Connect to the index

    index = pc.Index(index_name)

    print(f"Connected to index: {index_name}")

    Step 3: Preparing Data and Embedding

    We cannot search text directly. We must convert text to numbers (vectors). We will create a small dataset of "facts."

# Our raw data
documents = [
    {"id": "vec1", "text": "Apple released the first iPhone in 2007."},
    {"id": "vec2", "text": "SpaceX was founded by Elon Musk to reduce space transportation costs."},
    {"id": "vec3", "text": "Python is a high-level programming language known for readability."},
    {"id": "vec4", "text": "The Great Barrier Reef is the world's largest coral reef system."},
    {"id": "vec5", "text": "Pinecone is a vector database used for machine learning applications."}
]

def get_embedding(text):
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

# Prepare data for upload
vectors_to_upload = []
print("Generating embeddings (this takes a moment)...")

for doc in documents:
    # 1. Turn text into numbers
    vector_values = get_embedding(doc["text"])

    # 2. Create the payload format Pinecone expects:
    #    (ID, Vector_List, Metadata_Dictionary)
    pinecone_record = (
        doc["id"],             # The unique ID
        vector_values,         # The list of 1536 numbers
        {"text": doc["text"]}  # Metadata (so we can read the text later)
    )
    vectors_to_upload.append(pinecone_record)

print(f"Prepared {len(vectors_to_upload)} vectors for upload.")

Step 4: Upserting (Update + Insert)

    Now we push the data to the cloud. We use the term Upsert because if a vector with id="vec1" already exists, it updates it. If it doesn't, it inserts it.

# Upload to Pinecone.
# We usually upload in batches, but for 5 items, sending all at once is fine.
print("Upserting vectors to cloud...")
index.upsert(vectors=vectors_to_upload)
print("Upload complete! Your data is now in the cloud.")

    Step 5: The Semantic Search

    Now for the magic. We will ask a question, convert that question to numbers, and ask Pinecone: "Which of the vectors in your cloud storage are closest to this question's numbers?"

    query_text = "Tell me about tech companies."
    
    # 1. Embed the query
    

    query_vector = get_embedding(query_text)

    # 2. Search Pinecone

    search_results = index.query(

    vector=query_vector,

    top_k=2, # Return the top 2 matches

    include_metadata=True # VERY IMPORTANT: Otherwise you just get IDs back, not the text!

    )

    # 3. Display Results

    print(f"\nQuery: '{query_text}'")

    print("-" * 30)

    for match in search_results['matches']:

    score = match['score'] # How close is the match? (0 to 1)

    text_content = match['metadata']['text']

    print(f"Score: {score:.4f} | Text: {text_content}")

    Step 6: Using Namespaces (Multi-tenancy)

    Imagine you have two users: Alice and Bob. You don't want Bob searching Alice's notes. You could create a new Index for every user, but that is expensive and slow.

    Instead, we use Namespaces. It's like putting data into different folders within the same cabinet.

# Let's add data specifically for 'user_alice'
alice_docs = [
    {"id": "alice_1", "text": "Alice's secret recipe is spicy tacos."}
]

alice_vectors = []
for doc in alice_docs:
    vec = get_embedding(doc["text"])
    alice_vectors.append((doc["id"], vec, {"text": doc["text"]}))

# Upsert into a specific namespace
index.upsert(vectors=alice_vectors, namespace="user_alice")
print("Uploaded Alice's data to namespace 'user_alice'")

# Now, let's search specifically in Alice's namespace
query = "What is the secret recipe?"
q_vec = get_embedding(query)

# Search ONLY Alice's data
alice_results = index.query(
    vector=q_vec,
    top_k=1,
    include_metadata=True,
    namespace="user_alice"  # <--- This is the key!
)

print("\nSearching Alice's Namespace:")
if alice_results['matches']:
    print(alice_results['matches'][0]['metadata']['text'])
else:
    print("No matches found.")

# Search the default namespace (where we put the tech facts earlier) for the same query
default_results = index.query(
    vector=q_vec,
    top_k=1,
    include_metadata=True
    # When no namespace is specified, Pinecone searches the default (empty) namespace
)

print("\nSearching Default Namespace:")
if default_results['matches']:
    print(default_results['matches'][0]['metadata']['text'])
else:
    print("No matches found (as expected, Alice's recipe isn't here).")

Now You Try

You have the basics. Now extend the functionality.

  • Metadata Filtering: In Step 3, add a category field to your metadata (e.g., {"text": "...", "category": "space"}). Then, when querying, use the filter parameter in index.query to only search for documents where category is "space" (a sketch of the syntax appears after this list).
  • The Deleter: Write a script that asks the user for an ID (e.g., "vec1") and deletes that vector from Pinecone using index.delete(ids=["vec1"]). Verify it's gone by trying to query for it.
  • Batch Processing: Create a loop that generates 20 dummy sentences. Instead of upserting them one by one, group them into a list and upsert them in a single function call. This is how production systems handle speed.
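For the first exercise, Pinecone's filter parameter takes a MongoDB-style dictionary. A sketch to get you started, assuming the hypothetical category metadata field from the exercise:

results = index.query(
    vector=query_vector,
    top_k=2,
    include_metadata=True,
    filter={"category": {"$eq": "space"}}  # only match vectors tagged category="space"
)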
Challenge Project: Multi-Tenant Diary

Build a secure "Cloud Diary" system.

Requirements:
  • The program should ask: "Who are you? (Enter Username)"
  • It should then ask: "1. Write entry" or "2. Search entries".
  • Write: If the user writes an entry, embed it and upsert it to Pinecone using the Username as the Namespace.
  • Search: If the user searches, query Pinecone using that specific Username Namespace.
  • Verification: Run the program as "UserA" and add a secret. Restart the program as "UserB" and search for that secret. You should get zero results.
Example Output:

Login: james_bond
> 1. Write
Enter text: The code for the safe is 1234.
Saved to cloud.

... (Restart) ...

Login: villain
> 2. Search
Query: safe code
Results: No matches found.

    Hints:

    * Your namespace variable in the upsert and query calls should be the string the user typed in at the start.

    * Make sure you generate unique IDs for every entry (you can use import uuid and str(uuid.uuid4()) to generate random ID strings).

    What You Learned

    Today you moved from local computing to cloud engineering. You learned:

    * Cloud Persistence: Your data now lives on a server, accessible from any machine with the API key.

    * Vector Dimensions: You learned that the vector size (1536) must strictly match the model used to create it.

    * Namespaces: You mastered the art of keeping data separated within a single database, a critical skill for building SaaS (Software as a Service) applications.

    Why This Matters:

    In the real world, you rarely build AI apps for just yourself. You build them for hundreds or thousands of users. Pinecone handles the heavy lifting of searching through millions of records in milliseconds, while Namespaces allow you to keep every user's data private and secure.

    Tomorrow: We combine everything. We will take a PDF, chop it up, store it in Pinecone, and build a full RAG pipeline where you can Chat with a PDF. See you then!