Day 18 of 80

NoSQL Concepts & Document Storage

Phase 2: Software Foundations

What You'll Build Today

Welcome to Day 18! Yesterday, we looked at SQL, which is structured, rigid, and reliable—like a well-organized Excel spreadsheet. Today, we are going to look at its rebellious cousin: NoSQL (specifically, Document Storage).

In the world of Generative AI, data is rarely tidy. An AI model might return a simple text sentence one second, a list of three generated images the next, and a function call with complex arguments after that. If you try to force this "messy" data into a rigid table, you will break things.

Today, you will build a Flexible Chat History System. This system will be able to store conversation logs where every message looks completely different, without you ever having to redesign your database.

Here is what you will learn and why:

* Document Databases (NoSQL): You will learn how to store data as "documents" rather than rows, allowing you to save complex, nested data structures easily.

* JSON Data: You will master the format that powers the entire internet and almost every AI API (OpenAI, Anthropic, etc.).

* Flexible Schemas: You will learn how to add new data fields on the fly without breaking existing code.

* TinyDB: You will use a lightweight, beginner-friendly Python database tool that lives in a single file, perfect for prototyping AI apps.

Let's get messy!

The Problem

Imagine you are building a chat application. At first, it is simple: a user sends text, and the AI replies with text.

You might decide to store this in a rigid structure, like a list of dictionaries where every dictionary must look the same.

Here is the code you write to handle your chat history:

```python
# A list representing our rigid database
chat_history = [
    {"role": "user", "content": "Hello, who are you?"},
    {"role": "assistant", "content": "I am a helpful AI."}
]

def display_history(history):
    print("--- Chat Log ---")
    for message in history:
        # We assume every message has a 'content' field
        print(f"{message['role']}: {message['content']}")

display_history(chat_history)
```

This works fine. But then, you upgrade your AI. Now, users can ask the AI to generate images. An image message doesn't have "content" text; it has an "image_url".

You try to add this new message type to your list:

```python
# Adding a new type of message (Image)
new_message = {"role": "assistant", "image_url": "http://example.com/cat.png"}
chat_history.append(new_message)

# Run the display function again
display_history(chat_history)
```

The Crash:

When you run this, Python crashes with a KeyError: 'content'. Your code tried to look for message['content'] in the image message, but it doesn't exist.

To fix this in a rigid system (like SQL or strict CSVs), you have two bad options:

  • Change the Schema: Go back and add an empty image_url column to every single previous text message (waste of space).
  • Spaghetti Code: Write endless if statements to check what keys exist before printing.
```python
# The "Spaghetti Code" fix - it gets ugly fast
for message in chat_history:
    if "content" in message:
        print(f"{message['role']}: {message['content']}")
    elif "image_url" in message:
        print(f"{message['role']}: [Image: {message['image_url']}]")
    elif "tool_call" in message:
        # It never ends...
        pass
```

This is painful. As your AI gets smarter, your database structure fights against you. You spend more time managing columns and missing keys than building cool features.

There has to be a way to just "save the object" regardless of what shape it is.

Let's Build It

The solution is a NoSQL Document Database.

In a Document Database, we don't have rows and columns. We have Collections (like a folder) and Documents (like individual files inside that folder).

* Document A can have name and email.

* Document B can have name, age, and favorite_color.

The database doesn't care. It just stores what you give it.
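You can see this idea with nothing but Python's built-in json module: documents are just dictionaries, and a collection is just a container of them, whatever shape each one carries. (The names below are invented for illustration.)

```python
import json

# A "collection" is just a container of documents (dictionaries).
# Each document can have its own shape - no shared schema required.
collection = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Bob", "age": 30, "favorite_color": "green"},
]

# JSON happily serializes both - no empty columns, no schema change
serialized = json.dumps(collection)
restored = json.loads(serialized)

for doc in restored:
    print(sorted(doc.keys()))
```

Notice that neither document needed padding to match the other. That is the whole trick.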

We will use a library called tinydb. It is not meant for massive websites like Facebook, but it is excellent for local AI tools and learning concepts.

Step 1: Installation and Setup

First, you need to install the library.

Open your terminal or command prompt:

```bash
pip install tinydb
```

Now, let's create a database and save our first simple message.

```python
from tinydb import TinyDB, Query

# 1. Create (or connect to) the database file
# This will create a file named 'chat_db.json' in your folder
db = TinyDB('chat_db.json')

# 2. Clear the database so we start fresh every time we run this script
# (Useful for learning, but you wouldn't do this in a real app!)
db.truncate()

# 3. Insert a simple text message
# Note: We just pass a Python dictionary. No SQL commands needed.
db.insert({
    "role": "user",
    "type": "text",
    "content": "Hello, AI!"
})

print("Database created and first message saved!")
print("Check your folder for 'chat_db.json'.")
```
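If you open chat_db.json in a text editor, you will find ordinary JSON: TinyDB keeps each table (the default one is named "_default") as a mapping of document IDs to documents. Treat the exact layout as an implementation detail; the sketch below rebuilds that rough shape with the standard json module rather than reading the real file, so it runs on its own.

```python
import json

# Roughly what TinyDB writes to chat_db.json after the insert above:
# a "_default" table mapping document IDs to documents
file_contents = {
    "_default": {
        "1": {"role": "user", "type": "text", "content": "Hello, AI!"}
    }
}

# Because it's plain JSON, any tool that speaks JSON can read it
text = json.dumps(file_contents, indent=2)
data = json.loads(text)
print(data["_default"]["1"]["content"])
```

This is why TinyDB is such a friendly learning tool: the "database" is a file you can read with your own eyes.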

Step 2: The Power of Flexibility

Now for the magic. We will insert a message with a completely different structure—an image generation result. We don't need to change any settings. We just insert it.

```python
from tinydb import TinyDB

db = TinyDB('chat_db.json')

# 1. Insert a standard text response
db.insert({
    "role": "assistant",
    "type": "text",
    "content": "Hello! I can generate images for you."
})

# 2. Insert a COMPLETELY different structure (Image)
# It has 'url' and 'resolution' instead of 'content'
db.insert({
    "role": "assistant",
    "type": "image",
    "url": "https://myserver.com/robot_art.png",
    "resolution": "1024x1024",
    "metadata": {
        "model": "dall-e-3",
        "cost": 0.04
    }
})

print("Inserted mixed data types successfully.")
print("-" * 20)

# 3. Print all data to prove it's there
all_messages = db.all()
for msg in all_messages:
    print(msg)
```

Why this matters: Notice the metadata field in the second message? It's a dictionary inside the message. SQL struggles with nested data like this. NoSQL loves it.
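Nested fields come back exactly as you stored them. Plain dictionary access walks straight into the structure, as this small sketch (using the same image document as above) shows:

```python
# The image document from the insert above
msg = {
    "role": "assistant",
    "type": "image",
    "url": "https://myserver.com/robot_art.png",
    "resolution": "1024x1024",
    "metadata": {"model": "dall-e-3", "cost": 0.04},
}

# Plain dictionary access reaches into the nested structure
print(msg["metadata"]["model"])  # dall-e-3

# In SQL, 'metadata' would typically need its own table or a
# serialized text column; here it is simply part of the document.
total = msg["metadata"]["cost"]
print(f"Generation cost: ${total}")
```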

Step 3: Querying (Searching) Data

Storing data is useless if you can't find it. In SQL, you use SELECT * FROM table WHERE.... In TinyDB, you use a Query object.

Think of the Query object as a "search template."

```python
from tinydb import TinyDB, Query

db = TinyDB('chat_db.json')

# 1. Create a Query object
# This acts as a placeholder for our search
Message = Query()

# 2. Search for all messages where the role is 'assistant'
print("--- Assistant Messages ---")
assistant_msgs = db.search(Message.role == 'assistant')
for msg in assistant_msgs:
    print(f"Found one: {msg}")

# 3. Search for specific types (e.g., images)
print("\n--- Image Messages ---")
images = db.search(Message.type == 'image')
for img in images:
    print(f"Image URL: {img['url']}")
```
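Two query features worth knowing, both covered in TinyDB's documentation: you can chain attributes to match nested fields (e.g. `db.search(Message.metadata.model == 'dall-e-3')`), and you can combine conditions with `&` and `|` (e.g. `db.search((Message.role == 'assistant') & (Message.type == 'image'))`). To make what a combined search does concrete, here is a plain-Python equivalent over the documents we inserted earlier:

```python
# The documents inserted in the earlier steps
messages = [
    {"role": "user", "type": "text", "content": "Hello, AI!"},
    {"role": "assistant", "type": "text", "content": "Hello! I can generate images for you."},
    {"role": "assistant", "type": "image", "url": "https://myserver.com/robot_art.png"},
]

# Equivalent of: db.search((Message.role == 'assistant') & (Message.type == 'image'))
hits = [m for m in messages
        if m.get("role") == "assistant" and m.get("type") == "image"]

for m in hits:
    print(m["url"])
```

A search is just a filter over documents; the Query object builds that filter for you.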

Step 4: Updating Data

Imagine the user liked the image generated by the AI. We want to add a liked: True flag to that specific record.

We can update records based on a search query.

```python
from tinydb import TinyDB, Query

db = TinyDB('chat_db.json')
Message = Query()

# 1. Find the image message and add a 'liked' field
# We search for the specific URL to identify the message
target_url = "https://myserver.com/robot_art.png"

# The update function takes two arguments:
# 1. The data to add/change
# 2. The condition to find which documents to change
db.update({"liked": True}, Message.url == target_url)

# 2. Verify the change
updated_msg = db.search(Message.url == target_url)
print("--- Updated Record ---")
print(updated_msg)
```
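Related and worth knowing: TinyDB also offers db.upsert(fields, condition), which updates matching documents if any exist and inserts a new one otherwise—handy for flags like "liked" when you aren't sure the record exists yet. A plain-Python sketch of that logic over a list of documents:

```python
def upsert(documents, fields, matches):
    """Update every document where matches(doc) is true;
    insert fields as a new document if nothing matched."""
    hit = False
    for doc in documents:
        if matches(doc):
            doc.update(fields)
            hit = True
    if not hit:
        documents.append(dict(fields))
    return documents

docs = [{"url": "https://myserver.com/robot_art.png"}]

# Existing record: the flag is added in place
upsert(docs, {"liked": True},
       lambda d: d.get("url") == "https://myserver.com/robot_art.png")
print(docs)

# No match: a new document is inserted instead
upsert(docs, {"url": "https://example.com/new.png"},
       lambda d: d.get("url") == "https://example.com/new.png")
print(len(docs))  # 2
```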

Step 5: Handling Missing Keys (The Solution)

Now that we have mixed data, how do we display it without crashing? We use the .get() method, which every Python dictionary provides.

dictionary.get("key", "default_value") attempts to find the key. If the key is missing, it returns the default value instead of crashing.
```python
from tinydb import TinyDB

db = TinyDB('chat_db.json')

print("--- Safe Chat History Display ---")
for msg in db.all():
    role = msg.get('role', 'unknown')

    # Check the type to decide how to print
    msg_type = msg.get('type', 'text')

    if msg_type == 'text':
        content = msg.get('content', '')
        print(f"[{role}]: {content}")
    elif msg_type == 'image':
        url = msg.get('url', 'no url')
        liked = " (Liked!)" if msg.get('liked') else ""
        print(f"[{role}]: [Image displayed from {url}]{liked}")
```

Output:

```
--- Safe Chat History Display ---
[user]: Hello, AI!
[assistant]: Hello! I can generate images for you.
[assistant]: [Image displayed from https://myserver.com/robot_art.png] (Liked!)
```

We now have a robust system that handles text, images, and metadata without a single crash, all stored in a permanent file.

Now You Try

It is time to extend your database skills. Create a new script for these tasks.

1. The Timestamp Extension

When inserting data, import the datetime module. Add a timestamp field to every message you insert containing the current time (string format).

Hint: import datetime; str(datetime.datetime.now())

2. The "Mark as Read" Feature

Write a script that updates every message in the database to have a new field: read_status: True.

Hint: In db.update, if you omit the second argument (the condition), it applies to ALL documents.

3. The Cleanup Crew

Write a script that deletes all messages where the type is "text". Keep the images.

Hint: Look up db.remove(...) in the TinyDB documentation or use your IDE's autocomplete to see how it works. It works very similarly to search.

Challenge Project: The OpenAI Logger

When you build real AI apps, you send a prompt to OpenAI and get back a huge, complex JSON object containing the answer, the number of tokens used (cost), the model version, and finish reasons.

Your challenge is to build a Conversation Logger.

Requirements:

  • Create a function log_interaction(user_text, ai_response_json) that saves data to logs.json.
  • The user_text should be stored as a simple string field.
  • The ai_response_json will be a nested dictionary (simulating a real API response). Store this entire structure inside your database document under a field called raw_response.
  • Add a top-level field called cost_tokens that extracts the total_tokens from the nested JSON.

Example Input Data (Use this to test):
```python
# Simulating what OpenAI sends back
mock_openai_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "gpt-3.5-turbo",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Python is awesome!"
        },
        "finish_reason": "stop"
    }],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}

user_input = "What do you think of Python?"
```

Desired Database Document Structure:

Your database should contain a document that looks like this:

```json
{
    "user_prompt": "What do you think of Python?",
    "timestamp": "2023-10-27 10:00:00",
    "cost_tokens": 21,
    "raw_response": { ... the full dictionary from above ... }
}
```

Hints:

* You are combining flat data (user_prompt) with nested data (raw_response).

* To get cost_tokens, you will need to access mock_openai_response['usage']['total_tokens'].
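One wrinkle when extracting cost_tokens: if a response ever arrives without a usage block, mock_openai_response['usage']['total_tokens'] raises a KeyError. Chaining .get() with defaults (the same trick from Step 5) keeps your logger alive. A small sketch, where the "broken" response is invented for illustration:

```python
# A complete response and a degenerate one missing 'usage'
full = {"usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}}
broken = {"id": "chatcmpl-456"}

def extract_tokens(response):
    # .get('usage', {}) returns an empty dict if 'usage' is absent,
    # so the second .get() can still run safely
    return response.get("usage", {}).get("total_tokens", 0)

print(extract_tokens(full))    # 21
print(extract_tokens(broken))  # 0
```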

What You Learned

Today you broke free from the constraints of rows and columns.

* NoSQL vs SQL: You learned that while SQL is great for strict relationships (like banking ledgers), NoSQL is perfect for flexible, evolving data (like chat logs or AI outputs).

* JSON Documents: You learned that data can be nested. A document can contain lists and other dictionaries.

* TinyDB: You used a file-based NoSQL database to persist data between program runs.

Why This Matters:

In the coming days, you will start building backend servers. These servers will receive data from the outside world. This data is almost always in JSON format. Understanding how to catch this JSON and store it without needing to "flatten" it into a table is a superpower for AI engineering.

Tomorrow: We move to the backbone of the internet. You will learn about Backend Architecture: how servers work, what an API endpoint is, and how computers talk to each other.