Code Interpreter Patterns
Day 58 of the GenAI Bootcamp.
What You'll Build Today
Today is a turning point. Up until now, we have treated Large Language Models (LLMs) like very smart text generators. We ask a question, and they predict the answer based on their training data.
LLMs are terrible at doing math in their heads, but they have a secret superpower: they are excellent at writing Python code that does the math for them.
Today, we are going to build a Data Analyst Agent. This agent will take a plain English question about a dataset (like "What was the average revenue on Tuesdays?"), write the actual Python code to calculate it using the pandas library, execute that code in a secure, isolated environment, and give you the precise answer.
Here is what you will learn and why it matters:
* Sandboxed Code Execution: You will learn how to run AI-generated code safely. You never want an AI running code directly on your laptop (it could accidentally delete your files!). We will use a tool called E2B to provide a safe cloud sandbox.
* Code Generation Patterns: You will learn how to prompt the LLM specifically to act as a programmer, outputting clean, executable Python.
* The "Tool Use" Loop: You will manually build the logic that connects the "Brain" (the LLM) to the "Hands" (the code execution environment).
* Dynamic Data Analysis: You will move beyond static prompts and allow the AI to interact with real files (CSVs) to uncover insights you didn't explicitly program.
Let's turn your AI into a coder.
The Problem
Let's start with a painful reality: LLMs are bad at math.
If you ask GPT-4 to multiply two large numbers or calculate the 100th number in the Fibonacci sequence, it might get it right, but it is often just guessing the next token based on probability. It is not actually "calculating."
Furthermore, if you want to analyze a spreadsheet with 10,000 rows, you cannot fit the whole file into the prompt. You are stuck.
Look at this attempt to do a simple calculation using just the LLM.
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# We want to know the 100th Fibonacci number.
# This requires actual computation, not just language prediction.
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "user", "content": "Calculate the 100th Fibonacci number exactly."}
]
)
print(response.choices[0].message.content)
The Pain:
You know Python can solve this in milliseconds with a simple while loop. Why are we asking a poet (the LLM) to do the job of a calculator?
There has to be a way to say: "Hey AI, don't guess the answer. Write a Python script to calculate the answer, run it, and tell me what the computer says."
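For comparison, the exact answer takes a few lines of ordinary Python and runs in microseconds. This is the script we want the LLM to write for us instead of guessing:

```python
def fibonacci(n):
    """Return the n-th Fibonacci number (F(1) = F(2) = 1), computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Exact arithmetic, no token-probability guessing involved
print(fibonacci(100))
```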
Let's Build It
We are going to solve this by integrating E2B, a service that provides secure, cloud-based sandboxes where your AI can run code.
Prerequisites
You will need an E2B API key (free tier is available) and your OpenAI key.
Install the necessary library:
pip install e2b-code-interpreter openai python-dotenv
Step 1: Setup and The Helper Function
First, we need a way to extract the code the LLM writes. LLMs usually wrap code in markdown fences (```python ... ```). We need a helper function to strip that wrapper so we can run the code inside.
Create a file named analyst_agent.py:
import os
import re
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
# Initialize OpenAI
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
e2b_api_key = os.getenv("E2B_API_KEY")
def extract_code(text):
    """
    Extracts Python code from markdown code blocks.
    If no blocks are found, returns the raw text (risky, but a fallback).
    """
    pattern = r"```python\n(.*?)\n```"
    match = re.search(pattern, text, re.DOTALL)
    if match:
        return match.group(1)
    return text

print("Setup complete. Helper function ready.")
Why this matters: The LLM is a chatbot. It loves to chat. It will say "Here is your code:" before the code. We cannot execute "Here is your code:" in Python. This regex strips away the chatter and leaves only the executable logic.
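A quick standalone check of the helper (re-declared here, with the fence built from parts so this snippet itself survives markdown rendering):

```python
import re

def extract_code(text):
    """Same logic as the helper above: pull Python out of a markdown fence."""
    fence = "`" * 3  # i.e. a triple-backtick fence
    pattern = fence + r"python\n(.*?)\n" + fence
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else text

fence = "`" * 3
chatty_reply = f"Sure! Here is your code:\n{fence}python\nprint(2 + 2)\n{fence}\nLet me know if you need more."
print(extract_code(chatty_reply))  # -> print(2 + 2)
```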
Step 2: The "Brain" (Prompting for Code)
We need to tell the LLM that its job is not to answer the question directly, but to write code that answers the question.
Add this function to your file:
def generate_python_code(user_query):
    system_prompt = """
    You are an expert Python Data Analyst.
    Your goal is to solve the user's problem by writing Python code.

    RULES:
    1. Do NOT try to answer the question directly.
    2. Write a complete Python script that prints the answer to stdout.
    3. Use standard libraries or pandas/numpy.
    4. Wrap your code in ```python blocks.
    """
    response = client.chat.completions.create(
        model="gpt-4o",  # Or gpt-3.5-turbo
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query}
        ]
    )
    raw_content = response.choices[0].message.content
    return extract_code(raw_content)

# Test the brain
code = generate_python_code("Calculate the sum of the first 50 prime numbers.")
print("--- Generated Code ---")
print(code)
Run this. You should see clean Python code printed to your terminal. It hasn't run yet, but the instructions are there.
Step 3: The "Hands" (E2B Sandbox Execution)
Now we introduce the Sandbox. This is a tiny computer in the cloud that spins up instantly, runs the code, and shuts down.
Update your imports and add the execution logic:
from e2b_code_interpreter import Sandbox
def execute_in_sandbox(code_to_run):
    print("\n--- Executing in Sandbox ---")
    # We use 'with' to ensure the sandbox closes automatically
    with Sandbox(api_key=e2b_api_key) as sandbox:
        execution = sandbox.run_code(code_to_run)

        # Check for errors
        if execution.error:
            print(f"Error: {execution.error.name}: {execution.error.value}")
            return None

        # Return the standard output (logs/prints)
        return execution.logs

# Let's put it together
query = "Calculate the sum of the first 50 prime numbers."
generated_code = generate_python_code(query)
results = execute_in_sandbox(generated_code)

print("\n--- Final Answer ---")
if results:  # results is None if execution failed
    for log in results.stdout:
        print(log)
Why this matters: Notice we are printing results.stdout. The LLM wrote code that included print(result). The Sandbox captured that print statement and sent it back to us. We have successfully bridged the gap between text generation and logic execution.
Step 4: Adding Data (The CSV Analyst)
Calculating primes is fun, but the real power is data analysis. We need to create a dummy CSV file, upload it to the sandbox, and have the agent analyze it.
Let's create a scenario: Sales data.
# Create a dummy CSV file locally
csv_content = """Date,Product,Price,Units_Sold
2023-01-01,Widget A,10.00,5
2023-01-02,Widget B,20.00,2
2023-01-03,Widget A,10.00,10
2023-01-04,Widget C,50.00,1
2023-01-05,Widget A,10.00,3
"""

with open("sales_data.csv", "w") as f:
    f.write(csv_content)

def analyze_data_file(user_query, file_path):
    # 1. Read the file to get column names (so the LLM knows what to write)
    # In a real app, we might just pass the header row to the LLM
    with open(file_path, "r") as f:
        header = f.readline()

    # 2. Update prompt to include context about the file
    prompt_with_context = f"""
    You have a CSV file named '{file_path}' loaded in the environment.
    The columns are: {header}
    Write pandas code to answer: {user_query}
    """

    # 3. Generate Code
    print("Generating code...")
    code = generate_python_code(prompt_with_context)

    # 4. Execute in Sandbox WITH the file
    print("Uploading file and executing...")
    with Sandbox(api_key=e2b_api_key) as sandbox:
        # Upload the file to the cloud sandbox
        with open(file_path, "rb") as f:
            sandbox.files.write(file_path, f)

        execution = sandbox.run_code(code)

        if execution.error:
            print("Execution failed:", execution.error)
        else:
            print("\n--- Answer ---")
            print(execution.logs.stdout)

# Run the full agent
analyze_data_file("What was the total revenue for Widget A?", "sales_data.csv")
The Result:
The LLM will write pandas code like df = pd.read_csv('sales_data.csv'), filter for 'Widget A', multiply Price * Units_Sold, and print the sum. The Sandbox runs it, and you get the exact mathematical answer.
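For reference, the generated code typically looks something like the sketch below. This is hand-written, not actual model output, and the dummy CSV is inlined via StringIO so the snippet runs on its own without the sandbox:

```python
import io
import pandas as pd

# The same dummy data we wrote to sales_data.csv earlier
csv_content = """Date,Product,Price,Units_Sold
2023-01-01,Widget A,10.00,5
2023-01-02,Widget B,20.00,2
2023-01-03,Widget A,10.00,10
2023-01-04,Widget C,50.00,1
2023-01-05,Widget A,10.00,3
"""

df = pd.read_csv(io.StringIO(csv_content))
widget_a = df[df["Product"] == "Widget A"]
revenue = (widget_a["Price"] * widget_a["Units_Sold"]).sum()
print(revenue)  # 180.0
```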
Now You Try
You have a working Code Interpreter. Now, extend its capabilities.
The Retry Loop: Sometimes the LLM writes broken code (e.g., a typo in a variable name). Modify analyze_data_file so that if execution.error is true, you send the error message back to the LLM with a prompt saying "You got this error. Please fix the code." and try running it again.
Complex Logic: Ask a question that requires multiple steps, such as: "Calculate the average units sold per day, then multiply that by 100." Verify that the generated code performs both steps.
Library Swap: Modify the prompt to force the LLM to use the numpy library instead of pandas for a calculation. See how the generated code changes.
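One possible shape for the retry loop from the first exercise. The LLM and sandbox are stubbed out here so the control flow stands alone; in your version, swap the stubs for generate_python_code and execute_in_sandbox:

```python
def run_with_retries(query, generate, execute, max_attempts=3):
    """Generate code, run it, and feed any error back to the generator."""
    prompt = query
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        result, error = execute(code)
        if error is None:
            return result
        print(f"Attempt {attempt} failed: {error}")
        # Feed the traceback back so the model can repair its own code
        prompt = (f"{query}\n\nYour previous code:\n{code}\n"
                  f"raised this error:\n{error}\nPlease fix the code.")
    raise RuntimeError("All attempts failed")

# Stubs that fail once, then succeed, to exercise the loop
calls = {"n": 0}

def fake_generate(prompt):
    # First attempt has a typo ('answr'); the retry is corrected
    return "print(answr)" if calls["n"] == 0 else "print(42)"

def fake_execute(code):
    calls["n"] += 1
    if "answr" in code:
        return None, "NameError: name 'answr' is not defined"
    return "42", None

print(run_with_retries("What is 6 * 7?", fake_generate, fake_execute))  # 42
```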
Challenge Project: The Visualization Agent
Text answers are great, but data analysts produce charts. Your challenge is to make the agent generate a graph.
Requirements:
Ask the agent to "Create a bar chart showing total units sold per product."
The LLM must generate code using matplotlib.
The code must save the chart as a file (e.g., chart.png) inside the sandbox.
Your script must download that file from the sandbox to your local computer.
Example Input:
create_chart("sales_data.csv", "Plot the revenue per day")
Example Output:
A file named downloaded_chart.png appears on your laptop.
Hints:
* Update the system prompt to tell the LLM: "If asked for a chart, save it as 'output.png'."
* In the E2B Sandbox, after run_code, look at the documentation for sandbox.files.read().
* You will need to write the bytes you receive from the sandbox into a local file using open("filename", "wb").
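The last hint in code form. Placeholder bytes stand in here for whatever the sandbox download returns (check the E2B docs for the exact signature of sandbox.files.read()):

```python
# Placeholder for the bytes you get back from the sandbox download;
# in the real agent this comes from the E2B files API.
chart_bytes = b"\x89PNG\r\n\x1a\n...fake image data..."

# "wb" = write binary: raw bytes go to disk unmodified
with open("downloaded_chart.png", "wb") as f:
    f.write(chart_bytes)

print("Saved downloaded_chart.png,", len(chart_bytes), "bytes")
```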
What You Learned
Today you unlocked a massive capability boost for your AI applications.
* Separation of Concerns: The LLM is the planner; the Sandbox is the doer.
* Safety: You executed arbitrary code generated by an AI, but you did it in a disposable cloud environment, keeping your local machine safe.
* Context Injection: You learned how to pass file headers to the LLM so it knows how to write code for data it can't physically "see" all at once.
Why This Matters: This pattern is exactly how features like ChatGPT's "Advanced Data Analysis" work. In the real world, it allows agents to interact with internal company databases, perform complex financial modeling, or generate reports without a human writing a single line of SQL or Python.
Tomorrow: We will zoom out and look at Frameworks. You've been coding everything in raw Python, which is great for learning. Tomorrow, we will compare LangChain and LlamaIndex to see how they automate many of the steps you just built manually.