Day 34 of 80

Dynamic Templates & Evaluation

Phase 4: Prompt Engineering

What You'll Build Today

Welcome to Day 34! Today involves a major level-up in how you engineer prompts. Up until now, we have been treating prompts like simple text messages—typing them out one by one. But what happens when you need to generate 1,000 unique emails based on a spreadsheet of customer data? Or when you want to change the style of your prompt based on a user's specific request without rewriting code?

Today, we are building a Dynamic Cold Email Generator. You will feed it raw data (names, companies, pain points), and it will intelligently construct a custom prompt for an LLM to write the perfect email.

Here is what you will master today:

* Jinja2 Templating: You will learn why string concatenation (adding text together with +) is a trap, and how to use professional templating engines to inject variables into prompts cleanly.

* Logic Control in Prompts: You will learn how to use if/else statements *inside* your text templates to drastically change the instructions sent to the AI based on data.

* LLM-as-a-Judge: You will learn the industry-standard way to test if your prompts are working: using a smarter AI (like GPT-4) to grade the homework of a faster AI (like GPT-3.5).

* Prompt Scalability: You will move from "playing with ChatGPT" to "building a system that can handle thousands of requests."

Let's get started.

The Problem

Imagine you are building a tool to help a sales team write outreach emails. You have a list of prospects. Some are CEOs (who need short, punchy emails), and some are Engineers (who need technical details).

If you try to do this with basic Python strings, your code quickly turns into a nightmare.

Look at this attempt using standard Python f-strings:

```python
# The "painful" way: building a prompt with string concatenation

role = "CEO"
industry = "Tech"
name = "Sarah"
pain_point = "slow deployment"

# Trying to build a prompt with logic inside Python strings
prompt = f"Write an email to {name}."

if role == "CEO":
    prompt += " Keep it under 50 words because they are busy."
else:
    prompt += " You can write up to 200 words."

if industry == "Tech":
    prompt += " Mention our API speed."
elif industry == "Healthcare":
    prompt += " Mention our HIPAA compliance."

prompt += f" Focus on solving {pain_point}."

print(prompt)
```

Why this is painful:
  • Hard to Read: You have to read the Python logic to understand what the final text will look like. The "prompt" is scattered across ten lines of code.
  • Whitespace Issues: Notice the spaces inside the quotes (" Keep it..."). If you forget one space, your sentences run together.
  • Hard to Edit: If you want to change the tone of the email, you have to dig through Python if statements rather than just editing a text file.
  • No Separation of Concerns: Your text content is mixed directly with your programming logic.

    There is a better way. We can separate the structure of the text from the data that fills it. This is called Templating.

    Let's Build It

    We are going to use a library called Jinja2. It is the industry standard for Python templating. It allows you to write a text file with placeholders, and then "render" it with data.

    Step 1: Setup and Installation

    First, we need to install the Jinja2 library (and openai if you haven't already).

    ```bash
    pip install jinja2 openai
    ```

    Now, create a new file called email_generator.py. We will start by importing our libraries.

    ```python
    import os

    from jinja2 import Template
    from openai import OpenAI

    # Initialize the client (ensure your API key is set in environment variables)
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    print("Libraries loaded successfully.")
    ```
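    If you haven't set the API key yet, you can export it in your shell before running the script. This is a config sketch; the key value shown is a placeholder, not a real key.

    ```shell
    # Make the key available to os.getenv("OPENAI_API_KEY") in this shell session
    export OPENAI_API_KEY="sk-..."
    ```

    On Windows (PowerShell), the equivalent is `$env:OPENAI_API_KEY = "sk-..."`.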

    Step 2: Your First Template

    In Jinja2, we use double curly braces {{ variable_name }} to insert data.

    Let's create a template string. Notice how clean it looks compared to the concatenation method in "The Problem."

    ```python
    # Define the template string
    # We use triple quotes (""") for multi-line strings in Python
    email_template_str = """
    You are a world-class sales copywriter.
    Write a cold email to {{ name }}, who is the {{ role }} at {{ company }}.

    The goal of this email is to schedule a demo of our product, 'SpeedyAI'.
    Focus heavily on solving their main problem: {{ pain_point }}.
    Keep the tone professional but friendly.
    """

    # Create a Jinja2 Template object
    template = Template(email_template_str)

    # Define our data (the context)
    data = {
        "name": "Marcus",
        "role": "CTO",
        "company": "TechFlow",
        "pain_point": "high cloud infrastructure costs"
    }

    # Render the template: this fills in the slots with the data
    final_prompt = template.render(data)

    print("--- GENERATED PROMPT ---")
    print(final_prompt)
    ```

    Why this matters: You can read email_template_str like a normal paragraph. You can see exactly what the AI will see.

    Step 3: Adding Logic to Templates

    This is where Jinja2 shines. We can put logic inside the text using {% %} tags. This allows us to give different instructions to the LLM based on the data.

    Let's update our template to handle different roles differently.

    ```python
    # A more advanced template with logic
    advanced_template_str = """
    You are a sales assistant. Write an email to {{ name }} at {{ company }}.

    {% if role == "CEO" %}
    CRITICAL INSTRUCTION: The recipient is a CEO.
    • Keep the email extremely short (under 75 words).
    • Focus on ROI and bottom-line value.
    • Do not use technical jargon.
    {% else %}
    The recipient is a technical leader.
    • You can explain the technical implementation details.
    • Feel free to use industry terminology.
    • Length can be around 150 words.
    {% endif %}

    Problem to solve: {{ pain_point }}.
    Sign off as: The SpeedyAI Team.
    """

    template = Template(advanced_template_str)

    # Test 1: CEO
    ceo_data = {
        "name": "Sarah",
        "company": "Innovate Inc",
        "role": "CEO",
        "pain_point": "slow time-to-market"
    }

    # Test 2: Engineer
    dev_data = {
        "name": "Mike",
        "company": "DevCorp",
        "role": "Senior Engineer",
        "pain_point": "messy documentation"
    }

    print("--- CEO PROMPT ---")
    print(template.render(ceo_data))

    print("\n--- ENGINEER PROMPT ---")
    print(template.render(dev_data))
    ```

    Run this code. Notice how the "CEO Prompt" contains instructions about ROI and brevity, while the "Engineer Prompt" allows for technical details. We didn't have to write any Python if statements to construct the string—the logic lives in the template.
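    Jinja2 also supports loops with `{% for %}` tags, which is useful when a prospect has more than one pain point. Here is a minimal sketch; the `pain_points` list field is a hypothetical extension, not part of the data dictionaries used above.

    ```python
    from jinja2 import Template

    # The '-' in '-%}' trims the newline after the tag so the list renders cleanly
    loop_template = Template("""
    Problems to address:
    {% for point in pain_points -%}
    - {{ point }}
    {% endfor %}
    """)

    print(loop_template.render(pain_points=["slow builds", "flaky tests"]))
    ```

    Each item in the list becomes its own bullet line in the rendered prompt, so the same template works whether a prospect has one pain point or ten.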

    Step 4: Connecting to the LLM

    Now that we have a perfect prompt generator, let's feed it to the LLM to actually get our email. We will create a function to handle this.

    ```python
    def generate_email(person_data):
        # 1. Define the template
        prompt_template = """
        Write a cold outreach email to {{ name }}, the {{ role }} at {{ company }}.

        {% if role == "CEO" %}
        Tone: Direct and value-focused. Max 3 sentences.
        {% else %}
        Tone: Helpful and consultative. Can be longer.
        {% endif %}

        Pain point: {{ pain_point }}
        Product: SpeedyAI (automates code documentation).
        """

        # 2. Render the prompt
        template = Template(prompt_template)
        filled_prompt = template.render(person_data)

        # 3. Call the LLM
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # Using 3.5 for speed/cost
            messages=[
                {"role": "system", "content": "You are a helpful sales assistant."},
                {"role": "user", "content": filled_prompt}
            ],
            temperature=0.7
        )

        return response.choices[0].message.content


    # Let's try it out
    prospect = {
        "name": "Diana",
        "role": "CEO",
        "company": "Marketing Wizards",
        "pain_point": "spending too much time on manual reporting"
    }

    email_draft = generate_email(prospect)

    print(f"--- EMAIL FOR {prospect['name']} ---")
    print(email_draft)
    ```

    Step 5: Introduction to Evaluation (LLM-as-a-Judge)

    We have a generator. But is it good?

    In traditional software, we write "unit tests" (e.g., assert 2 + 2 == 4).

    In AI, we can't easily assert that an email is "good."

    However, we can use an LLM to grade the output. This is called "LLM-as-a-Judge." We will ask GPT-4 (or a strong model) to score the email generated by GPT-3.5.

    ```python
    def grade_email(generated_email):
        evaluation_prompt = f"""
        You are a strict editor. I will provide you with an Email Draft.
        Please grade the email on a scale of 1 to 5 based on these criteria:
        1. Is it polite?
        2. Does it address the specific pain point mentioned?
        3. Is the length appropriate?

        Email Draft:
        "{generated_email}"

        Return ONLY the number (1-5).
        """

        response = client.chat.completions.create(
            model="gpt-4o",  # Ideally use a stronger model for grading
            messages=[{"role": "user", "content": evaluation_prompt}]
        )

        return response.choices[0].message.content


    # Run the grader
    score = grade_email(email_draft)
    print(f"\nQuality Score: {score}/5")
    ```

    Why this matters: You now have a feedback loop. You can change your template, run the generator, run the grader, and see if your score goes up or down.
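    That feedback loop can be wrapped in a small scoring harness. The sketch below is parameterized over `generate` and `grade` callables (for example, the `generate_email` and `grade_email` functions above) so the loop itself can run without API calls; `average_score` is a hypothetical helper name, not part of any library.

    ```python
    # Average the judge's scores over a list of prospects.
    # `generate(prospect)` returns an email string; `grade(email)` returns the
    # judge's reply, ideally a digit 1-5.
    def average_score(generate, grade, prospects):
        scores = []
        for prospect in prospects:
            email = generate(prospect)
            reply = grade(email)
            # Judges sometimes reply "Score: 4" instead of "4", so pull out the first digit
            digits = [ch for ch in reply if ch.isdigit()]
            if digits:
                scores.append(int(digits[0]))
        return sum(scores) / len(scores) if scores else 0.0
    ```

    Run it once with your current template, edit the template, run it again, and compare the two averages to see whether the change helped.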

    Now You Try

    You have the basic engine. Now, let's make it robust.

  • The Bulk Generator: Create a list of dictionaries containing 3 different prospects (different roles/companies). Write a for loop that iterates through this list, generates an email for each, and prints them separated by a divider line.
  • The Tone Switcher: Add a new variable to your Jinja template called style. If style is "aggressive", the prompt should say "Use strong calls to action." If style is "soft", it should say "Be passive and invite discussion." Pass this variable in from your Python dictionary.
  • The File Saver: Instead of printing to the console, modify your script to save every generated email to a text file named email_to_{name}.txt.

    Challenge Project: The Automated Grader

    Your challenge is to build a script that objectively measures the accuracy of an LLM on a specific task.

    Scenario: You have a dataset of 5 trivia questions and their correct answers. You want to see if GPT-3.5 knows the answers.

    Requirements:
  • Create a list of dictionaries. Each item should have a question and a correct_answer. Example: {"question": "What is the capital of France?", "correct_answer": "Paris"}
  • Loop through the questions. For each one, ask GPT-3.5 for the answer.
  • The Judge: For each answer GPT-3.5 gives, send a prompt to GPT-4 (or GPT-3.5 if you want to save credits, but instruct it carefully). The Judge prompt should look like: "Question: [Q]. Correct Answer: [A]. Student Answer: [Student Answer]. Is the student correct? Answer YES or NO."
  • Count how many "YES" responses you get.
  • Print the final accuracy percentage (e.g., "Accuracy: 80%").

    Hint: Your "Judge" prompt needs to be very robust. Sometimes the model might say "Paris is the capital." when the correct answer is just "Paris". A simple string match (==) in Python would fail, but an LLM Judge will understand that they are the same thing.
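    To see the difference concretely, compare a naive string match against a tolerant check on the judge's verdict. This is a sketch; `judge_says_correct` is a hypothetical helper for parsing the YES/NO reply, not part of any library.

    ```python
    def exact_match(student_answer, correct_answer):
        # Fragile: fails whenever the wording differs at all
        return student_answer.strip().lower() == correct_answer.strip().lower()

    def judge_says_correct(judge_reply):
        # The judge was told to answer YES or NO, but models often add
        # punctuation ("YES.") or a short explanation, so check loosely
        return "YES" in judge_reply.strip().upper()

    print(exact_match("Paris is the capital.", "Paris"))       # False
    print(judge_says_correct("Yes, the student is correct."))  # True
    ```

    Counting `judge_says_correct` results instead of `exact_match` results is what makes the accuracy percentage meaningful.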

    What You Learned

    Today you moved from "hardcoding" to "engineering."

    * Jinja2 Templates: You learned to separate your prompt logic ({% if %}) from your Python application logic. This is how large-scale AI apps are built.

    * Variable Injection: You learned to dynamically inject data into prompts using `{{ }}`.

    * LLM-as-a-Judge: You learned that the best way to evaluate an AI is often with another AI.

    Why This Matters:

    In the real world, you rarely write prompts for one-off tasks. You write prompts that process thousands of user inputs per day. Templates keep that code maintainable. Evaluation scripts (like the Challenge Project) let you update your prompts with confidence, because you can measure whether the new prompt actually outperforms the old one.

    Tomorrow: We tackle the missing link in our chatbots: Memory. You will learn how to build multi-turn conversations where the AI remembers what you said five minutes ago.