NoSQL Concepts & Document Storage
What You'll Build Today
Welcome to Day 18! Yesterday, we looked at SQL, which is structured, rigid, and reliable—like a well-organized Excel spreadsheet. Today, we are going to look at its rebellious cousin: NoSQL (specifically, Document Storage).
In the world of Generative AI, data is rarely tidy. An AI model might return a simple text sentence one second, a list of three generated images the next, and a function call with complex arguments after that. If you try to force this "messy" data into a rigid table, you will break things.
Today, you will build a Flexible Chat History System. This system will be able to store conversation logs where every message looks completely different, without you ever having to redesign your database.
Here is what you will learn and why:
* Document Databases (NoSQL): You will learn how to store data as "documents" rather than rows, allowing you to save complex, nested data structures easily.
* JSON Data: You will master the format used by most web APIs and almost every AI API (OpenAI, Anthropic, etc.).
* Flexible Schemas: You will learn how to add new data fields on the fly without breaking existing code.
* TinyDB: You will use a lightweight, beginner-friendly Python database tool that lives in a single file, perfect for prototyping AI apps.
Let's get messy!
The Problem
Imagine you are building a chat application. At first, it is simple: a user sends text, and the AI replies with text.
You might decide to store this in a rigid structure, like a list of dictionaries where every dictionary must look the same.
Here is the code you write to handle your chat history:
# A list representing our rigid database
chat_history = [
    {"role": "user", "content": "Hello, who are you?"},
    {"role": "assistant", "content": "I am a helpful AI."}
]

def display_history(history):
    print("--- Chat Log ---")
    for message in history:
        # We assume every message has a 'content' field
        print(f"{message['role']}: {message['content']}")

display_history(chat_history)
This works fine. But then, you upgrade your AI. Now, users can ask the AI to generate images. An image message doesn't have "content" text; it has an "image_url".
You try to add this new message type to your list:
# Adding a new type of message (Image)
new_message = {"role": "assistant", "image_url": "http://example.com/cat.png"}
chat_history.append(new_message)
# Run the display function again
display_history(chat_history)
The Crash:
When you run this, Python crashes with a KeyError: 'content'. Your code tried to look for message['content'] in the image message, but it doesn't exist.
To fix this in a rigid system (like SQL or strict CSVs), you have two bad options:
1. Add an image_url column to every single previous text message (waste of space).
2. Write if statements to check what keys exist before printing.
# The "Spaghetti Code" fix - it gets ugly fast
for message in chat_history:
    if "content" in message:
        print(f"{message['role']}: {message['content']}")
    elif "image_url" in message:
        print(f"{message['role']}: [Image: {message['image_url']}]")
    elif "tool_call" in message:
        # It never ends...
        pass
This is painful. As your AI gets smarter, your database structure fights against you. You spend more time managing columns and missing keys than building cool features.
There has to be a way to just "save the object" regardless of what shape it is.
Let's Build It
The solution is a NoSQL Document Database.
In a Document Database, we don't have rows and columns. We have Collections (like a folder) and Documents (like individual files inside that folder).
* Document A can have name and email.
* Document B can have name, age, and favorite_color.
The database doesn't care. It just stores what you give it.
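To make the idea concrete, here is a minimal sketch in plain Python, with no database library involved. The names collection, doc_a, and doc_b and the sample values are made up for illustration. The only point is that a collection can hold documents of different shapes and still survive a round trip through JSON.

```python
import json

# Two "documents" with different fields (hypothetical sample data)
doc_a = {"name": "Ada", "email": "ada@example.com"}
doc_b = {"name": "Grace", "age": 36, "favorite_color": "green"}

# A "collection" is just a container of documents; no shared schema is required
collection = [doc_a, doc_b]

# Serialize the whole collection to a JSON string and read it back
as_text = json.dumps(collection, indent=2)
restored = json.loads(as_text)

for doc in restored:
    # sorted() gives a stable listing of whichever fields each document has
    print(sorted(doc.keys()))
```

Nothing had to be declared up front for doc_b to carry fields that doc_a lacks.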
We will use a library called tinydb. It is not meant for massive websites like Facebook, but it is excellent for local AI tools and learning concepts.
Step 1: Installation and Setup
First, you need to install the library.
Open your terminal or command prompt:
pip install tinydb
Now, let's create a database and save our first simple message.
from tinydb import TinyDB, Query

# 1. Create (or connect to) the database file
# This will create a file named 'chat_db.json' in your folder
db = TinyDB('chat_db.json')

# 2. Clear the database so we start fresh every time we run this script
# (Useful for learning, but you wouldn't do this in a real app!)
db.truncate()

# 3. Insert a simple text message
# Note: We just pass a Python dictionary. No SQL commands needed.
db.insert({
    "role": "user",
    "type": "text",
    "content": "Hello, AI!"
})

print("Database created and first message saved!")
print("Check your folder for 'chat_db.json'.")
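If you open chat_db.json after running the script, you should see something close to the text used below. This sketch assumes TinyDB's default settings, where documents live in a table named _default and each one is keyed by an auto-assigned ID stored as a string; the exact whitespace in your file may differ. It uses only the standard json module and a hard-coded string, so it runs even without the database file.

```python
import json

# Roughly what chat_db.json holds after Step 1 (assumed default TinyDB layout)
file_text = '{"_default": {"1": {"role": "user", "type": "text", "content": "Hello, AI!"}}}'

data = json.loads(file_text)

# The top-level key is the table name; under it, document IDs map to documents
table = data["_default"]
for doc_id, document in table.items():
    print(f"id={doc_id} -> {document}")
```

Because the file is ordinary JSON, you can inspect it in any text editor while you experiment.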
Step 2: The Power of Flexibility
Now for the magic. We will insert a message with a completely different structure—an image generation result. We don't need to change any settings. We just insert it.
from tinydb import TinyDB

db = TinyDB('chat_db.json')

# 1. Insert a standard text response
db.insert({
    "role": "assistant",
    "type": "text",
    "content": "Hello! I can generate images for you."
})

# 2. Insert a COMPLETELY different structure (Image)
# It has 'url' and 'resolution' instead of 'content'
db.insert({
    "role": "assistant",
    "type": "image",
    "url": "https://myserver.com/robot_art.png",
    "resolution": "1024x1024",
    "metadata": {
        "model": "dall-e-3",
        "cost": 0.04
    }
})

print("Inserted mixed data types successfully.")
print("-" * 20)

# 3. Print all data to prove it's there
all_messages = db.all()
for msg in all_messages:
    print(msg)
Why this matters: Notice the metadata field in the second message? It's a dictionary inside the message. SQL struggles with nested data like this. NoSQL loves it.
Step 3: Querying (Searching) Data
Storing data is useless if you can't find it. In SQL, you use SELECT * FROM table WHERE.... In TinyDB, you use a Query object.
Think of the Query object as a "search template."
from tinydb import TinyDB, Query

db = TinyDB('chat_db.json')
# 1. Create a Query object
# This acts as a placeholder for our search
Message = Query()

# 2. Search for all messages where the role is 'assistant'
print("--- Assistant Messages ---")
assistant_msgs = db.search(Message.role == 'assistant')
for msg in assistant_msgs:
    print(f"Found one: {msg}")

# 3. Search for specific types (e.g., images)
print("\n--- Image Messages ---")
images = db.search(Message.type == 'image')
for img in images:
    print(f"Image URL: {img['url']}")
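It can help to see what a search like this amounts to. The sketch below is a plain-Python analogy, not TinyDB's actual implementation: a query is treated as a test applied to every document, and documents that lack the field simply do not match. The helper name search_where and the sample documents are hypothetical.

```python
# Sample documents mirroring the ones stored in the earlier steps
documents = [
    {"role": "user", "type": "text", "content": "Hello, AI!"},
    {"role": "assistant", "type": "text", "content": "Hello! I can generate images for you."},
    {"role": "assistant", "type": "image", "url": "https://myserver.com/robot_art.png"},
]

# Sentinel so that a stored value of None is not confused with "field missing"
_MISSING = object()

def search_where(docs, field, expected):
    """Return every document whose `field` equals `expected` (hypothetical helper)."""
    return [doc for doc in docs if doc.get(field, _MISSING) == expected]

assistant_msgs = search_where(documents, "role", "assistant")
image_msgs = search_where(documents, "type", "image")
no_matches = search_where(documents, "liked", True)  # no document has this field

print(len(assistant_msgs), len(image_msgs), len(no_matches))
```

Searching on a field that only some documents have is safe here: the documents without it are skipped rather than raising a KeyError.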
Step 4: Updating Data
Imagine the user liked the image generated by the AI. We want to add a liked: True flag to that specific record.
We can update records based on a search query.
from tinydb import TinyDB, Query

db = TinyDB('chat_db.json')
Message = Query()

# 1. Find the image message and add a 'liked' field
# We search for the specific URL to identify the message
target_url = "https://myserver.com/robot_art.png"

# The update function takes two arguments:
# 1. The data to add/change
# 2. The condition to find which documents to change
db.update({"liked": True}, Message.url == target_url)

# 2. Verify the change
updated_msg = db.search(Message.url == target_url)
print("--- Updated Record ---")
print(updated_msg)
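The update is a merge: the fields you pass are added or overwritten, and everything else in the matching document stays as it was. Here is a rough plain-Python sketch of that behavior, offered as an analogy rather than a description of TinyDB's internals. The helper update_where and its return value are assumptions made for this example.

```python
def update_where(docs, fields, condition):
    """Merge `fields` into every document for which `condition(doc)` is true.

    Returns the number of documents changed (hypothetical helper).
    """
    changed = 0
    for doc in docs:
        if condition(doc):
            doc.update(fields)  # adds new keys, overwrites existing ones
            changed += 1
    return changed

documents = [
    {"role": "assistant", "type": "text", "content": "Hello!"},
    {"role": "assistant", "type": "image", "url": "https://myserver.com/robot_art.png"},
]

target_url = "https://myserver.com/robot_art.png"
count = update_where(documents, {"liked": True}, lambda d: d.get("url") == target_url)

print(count)
print(documents[1])
```

Only the image document gains the liked field; the text document is left untouched, so older records never need to be migrated.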
Step 5: Handling Missing Keys (The Solution)
Now that we have mixed data, how do we display it without crashing? We use the .get() method, which is safe to use on dictionaries.
dictionary.get("key", "default_value") attempts to find the key. If it fails, it returns the default value instead of crashing.
from tinydb import TinyDB

db = TinyDB('chat_db.json')

print("--- Safe Chat History Display ---")
for msg in db.all():
    role = msg.get('role', 'unknown')

    # Check the type to decide how to print
    msg_type = msg.get('type', 'text')

    if msg_type == 'text':
        content = msg.get('content', '')
        print(f"[{role}]: {content}")
    elif msg_type == 'image':
        url = msg.get('url', 'no url')
        liked = " (Liked!)" if msg.get('liked') else ""
        print(f"[{role}]: [Image displayed from {url}]{liked}")
Output:
--- Safe Chat History Display ---
[user]: Hello, AI!
[assistant]: Hello! I can generate images for you.
[assistant]: [Image displayed from https://myserver.com/robot_art.png] (Liked!)
We now have a robust system that handles text, images, and metadata without a single crash, all stored in a permanent file.
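As more message types appear, an if/elif chain like the one in Step 5 will keep growing. One possible alternative, sketched here under the assumption that every message carries a type field, is to map each type to a small formatting function. The names format_text, format_image, FORMATTERS, and render, and the audio sample message, are made up for this example.

```python
def format_text(msg):
    return msg.get("content", "")

def format_image(msg):
    liked = " (Liked!)" if msg.get("liked") else ""
    return f"[Image displayed from {msg.get('url', 'no url')}]{liked}"

# Map each message type to the function that knows how to display it
FORMATTERS = {
    "text": format_text,
    "image": format_image,
}

def render(msg):
    """Return one display line for a message, even if its type is unknown."""
    role = msg.get("role", "unknown")
    msg_type = msg.get("type", "text")
    formatter = FORMATTERS.get(msg_type)
    if formatter is None:
        # Unknown types are shown as a placeholder instead of raising an error
        return f"[{role}]: [Unsupported message type: {msg_type}]"
    return f"[{role}]: {formatter(msg)}"

messages = [
    {"role": "user", "type": "text", "content": "Hello, AI!"},
    {"role": "assistant", "type": "image",
     "url": "https://myserver.com/robot_art.png", "liked": True},
    {"role": "assistant", "type": "audio", "clip": "hello.mp3"},
]

lines = [render(m) for m in messages]
for line in lines:
    print(line)
```

Supporting a new message type then means adding one function and one dictionary entry, while the display loop itself stays the same.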
Now You Try
It is time to extend your database skills. Create a new script for these tasks.
1. The Timestamp Extension
When inserting data, import the datetime module. Add a timestamp field to every message you insert containing the current time (string format).
Hint: import datetime; str(datetime.datetime.now())
2. The "Mark as Read" Feature
Write a script that updates every message in the database to have a new field: read_status: True.
Hint: In db.update, if you omit the second argument (the condition), it applies to ALL documents.
3. The Cleanup Crew
Write a script that deletes all messages where the type is "text". Keep the images.
Hint: Look up db.remove(...) in the TinyDB documentation or use your IDE's autocomplete to see how it works. It works very similarly to search.
Challenge Project: The OpenAI Logger
When you build real AI apps, you send a prompt to OpenAI and get back a huge, complex JSON object containing the answer, the number of tokens used (cost), the model version, and finish reasons.
Your challenge is to build a Conversation Logger.
Requirements:
1. Create a function log_interaction(user_text, ai_response_json) that saves data to logs.json.
2. The user_text should be stored as a simple string field (named user_prompt in the target document).
3. The ai_response_json will be a nested dictionary (simulating a real API response). Store this entire structure inside your database document under a field called raw_response.
4. Add a top-level field called cost_tokens that extracts the total_tokens from the nested JSON.
Example Input Data (Use this to test):
# Simulating what OpenAI sends back
mock_openai_response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "gpt-3.5-turbo",
    "choices": [{
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Python is awesome!"
        },
        "finish_reason": "stop"
    }],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
    }
}

user_input = "What do you think of Python?"
Desired Database Document Structure:
Your database should contain a document that looks like this:
{
    "user_prompt": "What do you think of Python?",
    "timestamp": "2023-10-27 10:00:00",
    "cost_tokens": 21,
    "raw_response": { ... the full dictionary from above ... }
}
Hints:
* You are combining flat data (user_prompt) with nested data (raw_response).
* To get cost_tokens, you will need to access mock_openai_response['usage']['total_tokens'].
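Nested responses mix dictionaries and lists, so a lookup path alternates between keys and indexes. The small sketch below deliberately reads a different field than the one the challenge asks for: it pulls the assistant's reply text out of a trimmed copy of the mock response. The variable names first_choice, reply_text, and finish are illustrative.

```python
# Trimmed copy of the mock response from the challenge
mock_openai_response = {
    "model": "gpt-3.5-turbo",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Python is awesome!"},
        "finish_reason": "stop",
    }],
}

# "choices" is a list, so take the first element by index, then use keys
first_choice = mock_openai_response["choices"][0]
reply_text = first_choice["message"]["content"]

# .get() with a default avoids a KeyError if an optional field is absent
finish = first_choice.get("finish_reason", "unknown")

print(reply_text)
print(finish)
```

The same pattern of walking one level at a time applies to the usage block that the challenge needs.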
What You Learned
Today you broke free from the constraints of rows and columns.
* NoSQL vs SQL: You learned that while SQL is great for strict relationships (like banking ledgers), NoSQL is perfect for flexible, evolving data (like chat logs or AI outputs).
* JSON Documents: You learned that data can be nested. A document can contain lists and other dictionaries.
* TinyDB: You used a file-based NoSQL database to persist data between program runs.
Why This Matters: In the coming days, you will start building backend servers. These servers will receive data from the outside world. This data is almost always in JSON format. Understanding how to catch this JSON and store it without needing to "flatten" it into a table is a superpower for AI engineering.
Tomorrow: We move to the backbone of the internet. You will learn about Backend Architecture—how servers work, what an API endpoint is, and how computers talk to each other.