Day 35 of 80

Multi-Turn Conversations & Memory

Phase 4: Prompt Engineering

What You'll Build Today

Welcome to Day 35! Today is a pivotal moment in your journey. Up until now, every time you ran a prompt, it was an isolated event. You asked a question, the AI answered, and then it immediately forgot you existed.

Today, we are going to build a Chatbot with Memory.

We will move from "Single-Turn" tasks to "Multi-Turn" conversations. You will build a system that can hold a conversation, remember your name, recall previous questions, and handle the limitations of AI memory.

Here is what you will master today:

* Conversation History Management: Why you must manually feed the AI its own previous answers to create the illusion of memory.

* The Context Window Limit: Understanding that AI has a limit on how much text it can process at once, and why your program will crash if you ignore it.

* The Sliding Window Strategy: How to keep the most recent conversation active while dropping old, irrelevant chatter.

* Summarization Memory: A more advanced technique where the AI summarizes the conversation so far to save space without forgetting the important details.

Let's turn that amnesiac AI into a helpful companion.

---

The Problem

Imagine you are texting a friend. You say, "I'm going to buy a Honda Civic." They reply, "Cool choice!" Then you text, "How much does it cost?"

Your friend knows "it" refers to the Honda Civic.

Now, let's try that with the code you know how to write right now.

The Broken Code

Here is a standard interaction loop using what we have learned so far. Read this code carefully and try to predict what happens.

from openai import OpenAI

client = OpenAI()

print("Chatbot initialized. Type 'quit' to exit.")

while True:
    user_input = input("\nYou: ")

    if user_input.lower() == 'quit':
        break

    # We send only the user's current question to the AI
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": user_input}
        ]
    )

    print(f"AI: {response.choices[0].message.content}")

The Painful Result

If you run this, here is the conversation you will get:

You: Hi, my name is Sarah.
AI: Hello Sarah! How can I help you today?
You: What is my name?
AI: I am sorry, but I don't have access to your personal information or your name.

Frustrating, right?

Why This Happens

Large Language Models (LLMs) are stateless. This means they do not have a hard drive where they store your chat. Every time you send a request to the API, it is a brand new, blank slate. The AI doesn't remember you from 10 seconds ago because, from the model's point of view, that previous request is finished and gone.

To fix this, we have to be the memory. We must send the entire transcript of the conversation back to the AI every single time we speak.
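
Concretely, by the time Sarah asks her second question, the request payload needs to contain the whole exchange so far. Here is roughly what that messages list looks like (the content strings are illustrative):

# Illustrative payload for the third turn of the conversation.
# Both earlier messages must be re-sent, or the model cannot resolve "my name".
messages = [
    {"role": "user", "content": "Hi, my name is Sarah."},
    {"role": "assistant", "content": "Hello Sarah! How can I help you today?"},
    {"role": "user", "content": "What is my name?"},
]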

---

Let's Build It

We will solve this in steps. First, we will fix the amnesia. Then, we will fix the problem that arises when we remember too much.

Step 1: The List as Memory

We need a Python list to store the messages. Every time the user speaks, we add it to the list. Every time the AI replies, we add that to the list too. Then, we send the whole list to the API.

from openai import OpenAI

client = OpenAI()

# 1. Initialize an empty list to hold the conversation
history = []

print("Memory Bot v1. Type 'quit' to exit.")

while True:
    user_input = input("\nYou: ")

    if user_input.lower() == 'quit':
        break

    # 2. Add the user's message to history
    history.append({"role": "user", "content": user_input})

    # 3. Send the WHOLE history, not just the new input
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history
    )

    ai_reply = response.choices[0].message.content
    print(f"AI: {ai_reply}")

    # 4. Add the AI's reply to history so it remembers what it said
    history.append({"role": "assistant", "content": ai_reply})

Run this code.

Try the previous test:

You: Hi, my name is Sarah.
AI: Hello Sarah!
You: What is my name?
AI: Your name is Sarah.

It works! But we have created a new problem.

Step 2: The Context Window Problem

LLM providers charge you by the "token" (roughly a fragment of a word). In the code above, the history list gets longer with every turn.

* Turn 1: You send 10 tokens.

* Turn 2: You send Turn 1 (10) + Answer 1 (10) + Turn 2 (10) = 30 tokens.

* Turn 100: You are sending a massive novel every time you say "hello".

Eventually, you will hit the Context Window Limit (the maximum amount of text the AI can process in one request), and your API call will fail with an error. On top of that, every message costs more than the last.
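
Token growth is easy to measure for yourself. Here is a minimal sketch using the tiktoken library (an assumption: you have installed it with pip install tiktoken; o200k_base is the encoding used by the GPT-4o model family):

import tiktoken

# o200k_base is the tokenizer used by the GPT-4o model family
encoding = tiktoken.get_encoding("o200k_base")

def count_history_tokens(history):
    # Rough estimate: sum the tokens in every message's content.
    # (The real API adds a few tokens of per-message overhead.)
    return sum(len(encoding.encode(msg["content"])) for msg in history)

history = [
    {"role": "user", "content": "Hi, my name is Sarah."},
    {"role": "assistant", "content": "Hello Sarah! How can I help you today?"},
]
print(f"History is currently {count_history_tokens(history)} tokens")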

We need a strategy to manage this.

Step 3: The Sliding Window (Pruning)

The simplest strategy is the "Sliding Window." We only keep the last N messages. As new messages come in, old ones fall off the edge.

However, we usually want to keep the System Prompt (the instructions telling the AI who it is) forever, even if we delete old chat history.

Let's implement a system that keeps the System Prompt + the last 4 messages.

from openai import OpenAI

client = OpenAI()

# The system prompt is permanent
system_prompt = {"role": "system", "content": "You are a sarcastic robot assistant."}

# This list holds the back-and-forth chat
chat_log = []

def get_chat_response(user_text):
    # Add the user's message to the log
    chat_log.append({"role": "user", "content": user_text})

    # SLIDING WINDOW LOGIC:
    # We want the System Prompt + the last 4 messages from the log.
    # If the log is shorter than 4, slicing simply takes the whole thing.
    recent_messages = chat_log[-4:]

    # Combine them for the API call
    messages_to_send = [system_prompt] + recent_messages

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_to_send
    )

    ai_text = response.choices[0].message.content

    # Add the AI's response to the log
    chat_log.append({"role": "assistant", "content": ai_text})

    return ai_text

# Testing the Sliding Window
print("--- Sliding Window Test ---")
print("Bot: I will only remember the last 2 exchanges (4 messages).")

while True:
    u_in = input("\nYou: ")
    if u_in.lower() == 'quit':
        break

    reply = get_chat_response(u_in)
    print(f"Bot: {reply}")

    # Debug print to show what is actually being stored
    print(f"[Debug info: I currently have {len(chat_log)} messages in history]")

Why this matters: This prevents the crash. The conversation can go on forever. The downside: If you told the bot your name 10 messages ago, it has now forgotten it because that message "slid" out of the window.
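
You can see the window mechanic with plain Python lists, no API call needed:

# A mock log of 10 messages
log = [f"message {i}" for i in range(1, 11)]

# The slice keeps only the last 4; messages 1 through 6 have slid away
print(log[-4:])  # ['message 7', 'message 8', 'message 9', 'message 10']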

Step 4: Summary-Based Memory

How do we solve the downside of the Sliding Window? We can ask the AI to summarize the conversation once it grows too long.

Instead of deleting old messages, we turn them into a single "Summary" message.

from openai import OpenAI

client = OpenAI()

conversation = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def summarize_conversation(history):
    # Create a separate call just to summarize
    print("\n[System: Summarizing old conversation to save space...]\n")

    summary_prompt = "Summarize the following conversation in 1 sentence, keeping key facts:"

    # We send the history to be summarized
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history + [{"role": "user", "content": summary_prompt}]
    )

    summary_text = response.choices[0].message.content
    return summary_text

while True:
    user_input = input("\nYou: ")
    if user_input.lower() == 'quit':
        break

    conversation.append({"role": "user", "content": user_input})

    # LOGIC: Check if memory is getting too full.
    # (We are using 5 messages as a very low limit for demonstration.)
    if len(conversation) > 5:
        # 1. Keep the system prompt (index 0)
        sys_msg = conversation[0]

        # 2. Extract the messages to be summarized (everything except the last 2).
        #    We want to keep the immediate context (last question/answer) fresh.
        msgs_to_summarize = conversation[1:-2]

        # 3. Generate summary
        summary = summarize_conversation(msgs_to_summarize)

        # 4. Create a new context message
        summary_msg = {"role": "system", "content": f"Previous conversation summary: {summary}"}

        # 5. Rebuild conversation: System Prompt + Summary + Last 2 Messages
        conversation = [sys_msg, summary_msg] + conversation[-2:]

    # Standard generation step
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation
    )

    ai_reply = response.choices[0].message.content
    print(f"AI: {ai_reply}")

    conversation.append({"role": "assistant", "content": ai_reply})

Run this code.

Talk to it for a while and watch the console. Once the history grows past five messages, you will see it pause to summarize. It compresses the middle of the conversation but keeps the context alive!
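
To make the rebuild concrete, this is roughly what the conversation list looks like immediately after a compression pass (the placeholder strings are illustrative):

# Illustrative state of the conversation list right after compression:
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Previous conversation summary: ..."},
    {"role": "assistant", "content": "<previous answer>"},   # kept raw
    {"role": "user", "content": "<your newest question>"},   # kept raw
]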

---

Now You Try

You have the building blocks. Now extend the functionality.

  • The "Forget Me" Command
  • Modify the loop so that if the user types /clear, the conversation list is reset to just the System Prompt. Print "Memory wiped!" when this happens. This is useful when you want to change topics completely.

  • Save Chat to File
  • At the end of the loop (or after every message), append the new message to a text file named chat_log.txt. This creates a permanent record of your chat even after you close Python.

  • The Verbose Mode
  • Add a boolean variable verbose = True. If this is True, print the actual list of messages being sent to the API before every call. This helps you visualize exactly what the "Sliding Window" or "Summary" logic is doing behind the scenes.
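
For the "Forget Me" command, here is a minimal sketch, assuming the Step 4 loop, where conversation holds the history with the system prompt at index 0. It would go right after the input() call:

    # Reset memory on /clear, keeping only the permanent system prompt
    if user_input.strip().lower() == "/clear":
        conversation = [conversation[0]]
        print("Memory wiped!")
        continue  # skip the API call and go straight to the next prompt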

---

Challenge Project: The Memory Olympics

Your goal is to compare the three memory strategies to see which one performs best for fact retrieval.

Requirements:

* Create a list of 10 mock messages representing a conversation where a user introduces themselves, mentions their favorite color, their dog's name, and their city.

* Implement a function ask_bot(strategy, question) that takes this mock history and a final question (e.g., "What is my dog's name?") and runs it through the AI.

* Implement three strategies:

  * Strategy A (Full History): Pass all 10 messages.

  * Strategy B (Tiny Window): Pass only the last 3 messages.

  * Strategy C (Summarized): Pass a summary of the first 7 messages + the raw last 3 messages.

* Print the answer from each strategy.
Expected Outcome:

* Full History: Should get the answer right (but would be expensive in real life).

* Tiny Window: Should fail (hallucinate or say "I don't know") because the dog's name appeared in an early message that slid out of the window.

* Summarized: Should hopefully get it right, proving that summarization preserves facts while saving space.

Hint:

For Strategy C, you will need to run two API calls: one to generate the summary of the first 7 messages, and a second to answer the question using that summary.
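
If you want a starting point, here is one possible skeleton. It reuses client and summarize_conversation from Step 4; the mock messages and strategy names are illustrative, not a fixed spec:

# Hypothetical starter for the Memory Olympics (message content illustrative)
mock_history = [
    {"role": "user", "content": "Hi! I'm Sarah."},
    {"role": "assistant", "content": "Nice to meet you, Sarah!"},
    {"role": "user", "content": "My dog's name is Biscuit."},
    {"role": "assistant", "content": "Biscuit is a great name."},
    # ... continue to 10 messages covering favorite color and city ...
]

def ask_bot(strategy, question):
    if strategy == "full":
        # Strategy A: the entire mock history
        context = mock_history
    elif strategy == "tiny":
        # Strategy B: only the last 3 messages
        context = mock_history[-3:]
    else:
        # Strategy C: summary of the first 7 + the raw last 3 (two API calls)
        summary = summarize_conversation(mock_history[:7])
        context = [{"role": "system", "content": f"Summary: {summary}"}] + mock_history[-3:]

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=context + [{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

for strategy in ["full", "tiny", "summary"]:
    print(strategy, "->", ask_bot(strategy, "What is my dog's name?"))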

---

What You Learned

Today you tackled one of the most fundamental challenges in LLM development: State Management.

* Statelessness: You learned that APIs don't remember you; you have to remind them.

* Context Management: You learned that memory is finite (and expensive).

* Pruning (Sliding Window): A fast, cheap way to keep the conversation moving, at the cost of long-term memory.

* Summarization: A smart way to compress information, trading some detail for longer retention.

Why This Matters:

Every major AI product, from ChatGPT to customer service bots, uses these exact techniques. If they sent your entire 3-year chat history every time you said "Hi," they would go bankrupt in server costs. They all use sophisticated versions of these sliding-window and summarization techniques.

Phase 4 Complete!

You have mastered Prompt Engineering, from basic inputs to advanced formatting, role-playing, and now memory management.

Tomorrow: We enter Phase 5. We are going to give the AI eyes. We will start RAG (Retrieval-Augmented Generation), which allows the AI to read your personal PDF documents and answer questions about them. See you there!