Day 18 of 80

NoSQL Concepts & Document Storage

Phase 2: Software Foundations

What You'll Build Today

Welcome to Day 18! Yesterday, we looked at SQL, which is structured, rigid, and reliable—like a well-organized Excel spreadsheet. Today, we are going to look at its rebellious cousin: NoSQL (specifically, Document Storage).

In the world of Generative AI, data is rarely tidy. An AI model might return a simple text sentence one second, a list of three generated images the next, and a function call with complex arguments after that. If you try to force this "messy" data into a rigid table, you will break things.

Today, you will build a Flexible Chat History System. This system will be able to store conversation logs where every message looks completely different, without you ever having to redesign your database.

Here is what you will learn and why:

* Document Databases (NoSQL): You will learn how to store data as "documents" rather than rows, allowing you to save complex, nested data structures easily.

* JSON Data: You will work with JSON, the data format used by most web APIs, including almost every AI API (OpenAI, Anthropic, etc.).

* Flexible Schemas: You will learn how to add new data fields on the fly without breaking existing code.

* TinyDB: You will use a lightweight, beginner-friendly Python database tool that lives in a single file, perfect for prototyping AI apps.

Let's get messy!

The Problem

Imagine you are building a chat application. At first, it is simple: a user sends text, and the AI replies with text.

You might decide to store this in a rigid structure, like a list of dictionaries where every dictionary must look the same.

Here is the code you write to handle your chat history:

# A list representing our rigid database
chat_history = [
    {"role": "user", "content": "Hello, who are you?"},
    {"role": "assistant", "content": "I am a helpful AI."}
]

def display_history(history):
    print("--- Chat Log ---")
    for message in history:
        # We assume every message has a 'content' field
        print(f"{message['role']}: {message['content']}")

display_history(chat_history)

This works fine. But then, you upgrade your AI. Now, users can ask the AI to generate images. An image message doesn't have "content" text; it has an "image_url".

You try to add this new message type to your list:

# Adding a new type of message (Image)
new_message = {"role": "assistant", "image_url": "http://example.com/cat.png"}
chat_history.append(new_message)

# Run the display function again
display_history(chat_history)

The Crash:

When you run this, Python crashes with a KeyError: 'content'. Your code tried to look up message['content'] in the image message, but that key doesn't exist there.

To fix this in a rigid system (like SQL or strict CSVs), you have two bad options:

* Change the Schema: Go back and add an empty image_url column to every single previous text message (a waste of space).

* Spaghetti Code: Write endless if statements to check which keys exist before printing.

# The "Spaghetti Code" fix - it gets ugly fast
for message in chat_history:
    if "content" in message:
        print(f"{message['role']}: {message['content']}")
    elif "image_url" in message:
        print(f"{message['role']}: [Image: {message['image_url']}]")
    elif "tool_call" in message:
        # It never ends...
        pass

This is painful. As your AI gets smarter, your database structure fights against you. You spend more time managing columns and missing keys than building cool features.

There has to be a way to just "save the object" regardless of what shape it is.

Let's Build It

The solution is a NoSQL Document Database.

In a Document Database, we don't have rows and columns. We have Collections (like a folder) and Documents (like individual files inside that folder).

* Document A can have name and email.

* Document B can have name, age, and favorite_color.

The database doesn't care. It just stores what you give it.
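To make the idea concrete before we install anything, here is a minimal sketch using only plain Python and the standard json module. The collection name `people` and the sample values are made up for illustration; a plain list stands in for the collection.

```python
import json

# A "collection" is just a group of documents; here a plain list stands in for it.
# The names and values below are made-up sample data.
people = [
    {"name": "Ada", "email": "ada@example.com"},            # Document A
    {"name": "Bo", "age": 31, "favorite_color": "green"},   # Document B
]

# Each document carries its own set of fields; nothing forces them to match.
for doc in people:
    print(sorted(doc.keys()))

# The whole collection converts to JSON text without any schema definition,
# and converts back to the same Python objects.
as_json = json.dumps(people, indent=2)
restored = json.loads(as_json)
print(restored == people)
```

A document database does roughly this for you, plus saving to disk and searching.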

We will use a library called tinydb. It is not meant for massive websites like Facebook, but it is excellent for local AI tools and learning concepts.

Step 1: Installation and Setup

First, you need to install the library.

Open your terminal or command prompt:

pip install tinydb

Now, let's create a database and save our first simple message.

from tinydb import TinyDB, Query

# 1. Create (or connect to) the database file
# This will create a file named 'chat_db.json' in your folder
db = TinyDB('chat_db.json')

# 2. Clear the database so we start fresh every time we run this script
# (Useful for learning, but you wouldn't do this in a real app!)
db.truncate()

# 3. Insert a simple text message
# Note: We just pass a Python dictionary. No SQL commands needed.
db.insert({
    "role": "user", 
    "type": "text", 
    "content": "Hello, AI!"
})

print("Database created and first message saved!")
print("Check your folder for 'chat_db.json'.")


Step 2: The Power of Flexibility

Now for the magic. We will insert a message with a completely different structure—an image generation result. We don't need to change any settings. We just insert it.

from tinydb import TinyDB

db = TinyDB('chat_db.json')

# 1. Insert a standard text response
db.insert({
    "role": "assistant",
    "type": "text",
    "content": "Hello! I can generate images for you."
})

# 2. Insert a COMPLETELY different structure (Image)
# It has 'url' and 'resolution' instead of 'content'
db.insert({
    "role": "assistant",
    "type": "image",
    "url": "https://myserver.com/robot_art.png",
    "resolution": "1024x1024",
    "metadata": {
        "model": "dall-e-3",
        "cost": 0.04
    }
})

print("Inserted mixed data types successfully.")
print("-" * 20)

# 3. Print all data to prove it's there
all_messages = db.all()
for msg in all_messages:
    print(msg)


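Why this matters: Notice the metadata field in the second message? It's a dictionary inside the message. In a traditional SQL table, nested data like this usually needs a separate table or a special JSON column. A document database stores it exactly as you wrote it.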

Step 3: Querying (Searching) Data

Storing data is useless if you can't find it. In SQL, you use SELECT * FROM table WHERE.... In TinyDB, you use a Query object.


Think of the Query object as a "search template."


from tinydb import TinyDB, Query

db = TinyDB('chat_db.json')

# 1. Create a Query object
# This acts as a placeholder for our search
Message = Query()

# 2. Search for all messages where the role is 'assistant'
print("--- Assistant Messages ---")
assistant_msgs = db.search(Message.role == 'assistant')

for msg in assistant_msgs:
    print(f"Found one: {msg}")

# 3. Search for specific types (e.g., images)
print("\n--- Image Messages ---")
images = db.search(Message.type == 'image')

for img in images:
    print(f"Image URL: {img['url']}")


Step 4: Updating Data

Imagine the user liked the image generated by the AI. We want to add a liked: True flag to that specific record.


We can update records based on a search query.

from tinydb import TinyDB, Query

db = TinyDB('chat_db.json')
Message = Query()

# 1. Find the image message and add a 'liked' field
# We search for the specific URL to identify the message
target_url = "https://myserver.com/robot_art.png"

# The update function takes two arguments:
# 1. The data to add/change
# 2. The condition to find which documents to change
db.update({"liked": True}, Message.url == target_url)

# 2. Verify the change
updated_msg = db.search(Message.url == target_url)
print("--- Updated Record ---")
print(updated_msg)


Step 5: Handling Missing Keys (The Solution)

Now that we have mixed data, how do we display it without crashing? Each document TinyDB returns behaves like a Python dictionary, so we can use the dictionary .get() method, which never raises an error for a missing key.


dictionary.get("key", "default_value") attempts to find the key. If it fails, it returns the default value instead of crashing.
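Here is the difference side by side, using a plain dictionary shaped like the image message from The Problem section:

```python
message = {"role": "assistant", "type": "image", "url": "http://example.com/cat.png"}

# Square brackets raise KeyError when the key is missing.
try:
    message["content"]
except KeyError as err:
    print(f"KeyError: {err}")

# .get() returns the default instead of raising.
print(repr(message.get("content", "")))   # missing key, so the default comes back
print(message.get("url", "no url"))       # key is present, so the stored value comes back

# With no default given, .get() returns None for a missing key.
print(message.get("liked"))
```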

from tinydb import TinyDB

db = TinyDB('chat_db.json')

print("--- Safe Chat History Display ---")

for msg in db.all():
    role = msg.get('role', 'unknown')
    
    # Check the type to decide how to print
    msg_type = msg.get('type', 'text')
    
    if msg_type == 'text':
        content = msg.get('content', '')
        print(f"[{role}]: {content}")
        
    elif msg_type == 'image':
        url = msg.get('url', 'no url')
        liked = " (Liked!)" if msg.get('liked') else ""
        print(f"[{role}]: [Image displayed from {url}]{liked}")


Output (assuming you ran the Step 1, Step 2, and Step 4 scripts once each, in order):
--- Safe Chat History Display ---
[user]: Hello, AI!
[assistant]: Hello! I can generate images for you.
[assistant]: [Image displayed from https://myserver.com/robot_art.png] (Liked!)


We now have a robust system that handles text, images, and metadata without a single crash, all stored in a permanent file.
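As more message types appear, the if/elif chain can grow long. One possible alternative, sketched below, is a lookup table that maps each type to a small formatting function. The names `format_text`, `format_image`, `FORMATTERS`, and `format_message` are hypothetical helpers for this illustration, not part of TinyDB, and the sample messages are plain dictionaries standing in for `db.all()`.

```python
def format_text(msg):
    """Format a text message body."""
    return msg.get("content", "")


def format_image(msg):
    """Format an image message body, noting whether it was liked."""
    liked = " (Liked!)" if msg.get("liked") else ""
    return f"[Image displayed from {msg.get('url', 'no url')}]{liked}"


# Maps each message type to the function that knows how to display it.
FORMATTERS = {"text": format_text, "image": format_image}


def format_message(msg):
    """Return one display line for any message, including unknown types."""
    role = msg.get("role", "unknown")
    msg_type = msg.get("type", "text")
    formatter = FORMATTERS.get(msg_type)
    if formatter is None:
        body = f"[Unsupported message type: {msg_type}]"
    else:
        body = formatter(msg)
    return f"[{role}]: {body}"


# Stand-in for db.all(); the 'audio' entry is a made-up type with no formatter.
sample_messages = [
    {"role": "user", "type": "text", "content": "Hello, AI!"},
    {"role": "assistant", "type": "image",
     "url": "https://myserver.com/robot_art.png", "liked": True},
    {"role": "assistant", "type": "audio", "url": "https://myserver.com/clip.mp3"},
]

lines = [format_message(m) for m in sample_messages]
for line in lines:
    print(line)
```

Supporting a new type then means adding one function and one dictionary entry. Unknown types also get a visible placeholder here, whereas the if/elif version above skips them silently.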

Now You Try

It is time to extend your database skills. Create a new script for these tasks.

1. The Timestamp Extension
When inserting data, import the datetime module. Add a timestamp field to every message you insert containing the current time (string format).

Hint: import datetime; str(datetime.datetime.now())

2. The "Mark as Read" Feature
Write a script that updates every message in the database to have a new field: read_status: True.

Hint: In db.update, if you omit the second argument (the condition), it applies to ALL documents.

3. The Cleanup Crew
Write a script that deletes all messages where the type is "text". Keep the images.

Hint: Look up db.remove(...) in the TinyDB documentation or use your IDE's autocomplete to see how it works. It works very similarly to search.
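A note on task 1: the hint asks for a string because JSON has no date type, so a raw datetime object cannot be written to a JSON file. The sketch below uses only the standard library to show the error and two ways to get a string. `isoformat()` is offered as an optional alternative to `str()`, since it can be converted back into a datetime later.

```python
import json
from datetime import datetime

now = datetime.now()

# A raw datetime object cannot be converted to JSON.
try:
    json.dumps({"timestamp": now})
except TypeError as err:
    print(f"TypeError: {err}")

# Converting to a string first makes it JSON-friendly.
as_text = str(now)         # human-readable, e.g. '2023-10-27 10:00:00.123456'
as_iso = now.isoformat()   # ISO 8601, e.g. '2023-10-27T10:00:00.123456'

print(json.dumps({"timestamp": as_iso}))

# An ISO string can be turned back into the same datetime.
round_trip = datetime.fromisoformat(as_iso)
print(round_trip == now)
```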

Challenge Project: The OpenAI Logger

When you build real AI apps, you send a prompt to OpenAI and get back a huge, complex JSON object containing the answer, the number of tokens used (cost), the model version, and finish reasons.

Your challenge is to build a Conversation Logger.

Requirements:
* Create a function log_interaction(user_text, ai_response_json) that saves data to logs.json.

* The user_text should be stored as a simple string field called user_prompt.

* Add a timestamp field containing the current time as a string.

* The ai_response_json will be a nested dictionary (simulating a real API response). Store this entire structure inside your database document under a field called raw_response.

* Add a top-level field called cost_tokens that holds the total_tokens value extracted from the nested JSON.


Example Input Data (Use this to test):

# Simulating what OpenAI sends back
mock_openai_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "gpt-3.5-turbo",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Python is awesome!"
        },
        "finish_reason": "stop"
    }],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}

user_input = "What do you think of Python?"


Desired Database Document Structure:
Your database should contain a document that looks like this:
{
    "user_prompt": "What do you think of Python?",
    "timestamp": "2023-10-27 10:00:00",
    "cost_tokens": 21,
    "raw_response": { ... the full dictionary from above ... }
}



Hints:

* You are combining flat data (user_prompt) with nested data (raw_response).

* To get cost_tokens, you will need to access mock_openai_response['usage']['total_tokens'].

What You Learned

Today you broke free from the constraints of rows and columns.

* NoSQL vs SQL: You learned that while SQL is great for strict relationships (like banking ledgers), NoSQL is perfect for flexible, evolving data (like chat logs or AI outputs).

* JSON Documents: You learned that data can be nested. A document can contain lists and other dictionaries.

* TinyDB: You used a file-based NoSQL database to persist data between program runs.

Why This Matters:

In the coming days, you will start building backend servers. These servers will receive data from the outside world. This data is almost always in JSON format. Understanding how to catch this JSON and store it without needing to "flatten" it into a table is a superpower for AI engineering.

Tomorrow: We move to the backbone of the internet. You will learn about Backend Architecture—how servers work, what an API endpoint is, and how computers talk to each other.
