OpenAI Fine-Tuning
What You'll Build Today
Welcome to Day 62. Today is a major milestone. Up until now, you have been an "operator" of AI—sending instructions to a general-purpose model. Today, you become an "architect." You are going to create your own version of GPT-4o-mini.
Specifically, you will build a "Corporate Translator" Model. This model will take complex, annoying corporate jargon (like "leverage synergies" or "circle back") and translate it into blunt, simple, plain English.
While prompt engineering can do this to an extent, you will find that fine-tuning "locks in" the behavior much more reliably and allows you to remove the instructions entirely.
Here is what you will master today:
* Data Formatting (JSONL): You will learn why AI training data looks different from standard data and how to structure it perfectly.
* The Fine-Tuning Lifecycle: You will move through the four stages: Prepare, Upload, Train, and Evaluate.
* Hyperparameters: You will learn what an "epoch" is and why training the model too many times makes it memorize rather than learn.
* Model Evaluation: You will verify if your custom model is actually better than the standard one.
Let's get started.
The Problem
Imagine you are building a chat application that translates corporate speak into plain English. You want it to be very concise. You do not want it to be polite. You just want the facts.
If you use standard GPT-4o-mini, you have to write a very detailed system prompt to force it to behave this way.
Here is how you might try to do this currently:
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
jargon_text = "We need to drill down on the low-hanging fruit to leverage our core competencies."
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{
"role": "system",
"content": (
"You are a blunt translator. You take corporate jargon and rewrite it in "
"the simplest possible English. Do not be polite. Do not add explanations. "
"Do not say 'Here is the translation'. Just give the translation. "
"Keep it under 10 words if possible."
)
},
{"role": "user", "content": jargon_text}
]
)
print(response.choices[0].message.content)
The Pain Points:
* The long system prompt has to be sent with every single request, and you pay for those extra tokens every time.
* Even with detailed instructions, the model can still drift, slipping in polite preambles or explanations you explicitly told it to skip.
There has to be a way to "save" that system prompt directly into the model's brain, so we don't have to repeat it, and so the model intuitively "knows" exactly how to speak. That way is fine-tuning.
Let's Build It
We are going to fine-tune gpt-4o-mini. This process involves creating a dataset, formatting it as JSONL, uploading it, and starting a training job.
Step 1: Prepare the Data
Fine-tuning requires examples. We need a list of "Input" (the jargon) and the ideal "Output" (the plain English). In the real world, you might have thousands of these. For this tutorial, we will use a small set of 10 examples (the minimum OpenAI accepts) to demonstrate the mechanics.
Create a file named generate_data.py.
import json
# This is our training data.
# It represents the ideal interaction between user and assistant.
training_data = [
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "We need to leverage our synergies to optimize workflow."},
{"role": "assistant", "content": "We need to work together to work faster."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "Let's circle back on this action item offline."},
{"role": "assistant", "content": "Let's talk about this later in private."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "We need to drill down into the granular details."},
{"role": "assistant", "content": "We need to look at the specifics."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "This is a low-hanging fruit opportunity."},
{"role": "assistant", "content": "This is an easy win."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "Please reach out to the stakeholders for alignment."},
{"role": "assistant", "content": "Ask the people involved if they agree."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "We need to think outside the box to shift the paradigm."},
{"role": "assistant", "content": "We need creative ideas to change how we do things."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "I don't have the bandwidth to take this on right now."},
{"role": "assistant", "content": "I am too busy for this."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "Let's touch base regarding the deliverables."},
{"role": "assistant", "content": "Let's talk about the work due."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "We need to incentivize the team to go the extra mile."},
{"role": "assistant", "content": "We need to reward the team for working harder."}
]
},
{
"messages": [
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": "Ensure we are all singing from the same hymn sheet."},
{"role": "assistant", "content": "Make sure we all agree."}
]
}
]
# Save as JSONL (JSON Lines)
# Each line is a valid JSON object, but the whole file is NOT a JSON array.
file_name = "corporate_jargon_data.jsonl"
with open(file_name, 'w') as f:
    for entry in training_data:
        json.dump(entry, f)
        f.write('\n')  # New line for the next entry
print(f"Successfully created {file_name} with {len(training_data)} examples.")
Run this script. It creates a file called corporate_jargon_data.jsonl. Open that file in a text editor. Notice how there are no commas at the end of lines and no square brackets [] wrapping the whole file. This is the JSONL format required by OpenAI.
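For reference, the first line of the file should look like this (one JSON object per line; it only wraps here if your editor wraps long lines):
{"messages": [{"role": "system", "content": "Translate corporate jargon to plain English."}, {"role": "user", "content": "We need to leverage our synergies to optimize workflow."}, {"role": "assistant", "content": "We need to work together to work faster."}]}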
Step 2: Upload the File
Now we need to send this file to OpenAI's servers so they can access it for training.
Create a file named upload_file.py.
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
file_name = "corporate_jargon_data.jsonl"
print("Uploading file to OpenAI...")
try:
    response = client.files.create(
        file=open(file_name, "rb"),
        purpose="fine-tune"
    )
    print("File uploaded successfully!")
    print(f"File ID: {response.id}")
    # Save the File ID to a text file so we can use it in the next step
    with open("file_id.txt", "w") as f:
        f.write(response.id)
except Exception as e:
    print(f"Error uploading file: {e}")
Run this. It will output a File ID that looks like file-XyZ123.... We saved this ID to a text file because we need it immediately for the next step.
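If you want to double-check that the upload went through, you can list the files stored in your account. Here is a minimal sketch (same client setup as above; it simply prints each file's ID, name, and purpose):
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# Print every file currently stored in your account
for uploaded in client.files.list():
    print(uploaded.id, uploaded.filename, uploaded.purpose)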
Step 3: Create the Fine-Tuning Job
This is where the magic happens. We tell OpenAI: "Take the model gpt-4o-mini, look at the file I just uploaded, and adjust the model's weights to match those patterns."
We will set hyperparameters. Specifically, n_epochs.
* Epoch: One full pass through your dataset.
* If we define 3 epochs, the model reads your data 3 times.
* Too few epochs? It won't learn the pattern.
* Too many epochs? It will memorize the exact sentences and fail on new ones (overfitting). For small datasets, 3 to 4 is usually a good starting point.
Create start_training.py.
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# Read the file ID we saved earlier
with open("file_id.txt", "r") as f:
training_file_id = f.read().strip()
print(f"Starting fine-tuning job with File ID: {training_file_id}")
try:
    job = client.fine_tuning.jobs.create(
        training_file=training_file_id,
        model="gpt-4o-mini-2024-07-18",  # Pin a specific dated snapshot for reproducible results
        hyperparameters={
            "n_epochs": 4
        }
    )
    print("Job created successfully!")
    print(f"Job ID: {job.id}")
    print(f"Status: {job.status}")
    # Save Job ID for tracking
    with open("job_id.txt", "w") as f:
        f.write(job.id)
except Exception as e:
    print(f"Error creating job: {e}")
Run this. You will get a Job ID (e.g., ftjob-ABC123...).
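If you realize you started the job with the wrong file or the wrong settings, you do not have to let it finish. Here is a minimal sketch of cancelling a queued or running job, assuming the Job ID saved above:
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
with open("job_id.txt", "r") as f:
    job_id = f.read().strip()
# Cancel the job; its final status should become "cancelled"
cancelled = client.fine_tuning.jobs.cancel(job_id)
print(f"Status: {cancelled.status}")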
Step 4: Monitor the Job
You cannot use the model until the job succeeds. Let's write a script to check the status.
Create check_status.py.
from openai import OpenAI
import os
import time
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
with open("job_id.txt", "r") as f:
job_id = f.read().strip()
print(f"Checking status for Job ID: {job_id}")
while True:
    job = client.fine_tuning.jobs.retrieve(job_id)
    status = job.status
    print(f"Current Status: {status}")
    if status == "succeeded":
        print("\nJob Complete!")
        print(f"Fine-tuned Model Name: {job.fine_tuned_model}")
        # Save the new model name
        with open("model_name.txt", "w") as f:
            f.write(job.fine_tuned_model)
        break
    elif status == "failed":
        print("Job Failed.")
        print(job.error)
        break
    elif status == "cancelled":
        print("Job Cancelled.")
        break
    # Wait 30 seconds before checking again
    time.sleep(30)
Run this script. It will loop every 30 seconds. Go grab a coffee. When it finishes, it will print succeeded and save your new model name (which looks like ft:gpt-4o-mini-2024-07-18:your-org::AbC123).
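The status string is not the only thing you can watch. The job also exposes an event log, which typically includes step-by-step progress and training-loss messages. Here is a minimal sketch for peeking at the latest events while the job runs:
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
with open("job_id.txt", "r") as f:
    job_id = f.read().strip()
# Fetch the 10 most recent events for this job
events = client.fine_tuning.jobs.list_events(fine_tuning_job_id=job_id, limit=10)
for event in events.data:
    print(event.message)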
Step 5: Use Your New Model
Once the previous script finishes successfully, you own a custom model. Let's test it on a phrase that was not in the training data to see if it generalized the "attitude."
Create test_model.py.
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# Load your custom model name
with open("model_name.txt", "r") as f:
custom_model = f.read().strip()
print(f"Testing model: {custom_model}")
# New jargon NOT in the training set
test_input = "We need to peel the onion on these metrics to find the value-add."
response = client.chat.completions.create(
model=custom_model,
messages=[
# Note: We can use a much shorter system prompt now,
# or even the same one used in training.
{"role": "system", "content": "Translate corporate jargon to plain English."},
{"role": "user", "content": test_input}
]
)
print("\nInput:", test_input)
print("Result:", response.choices[0].message.content)
Expected Output:
If it worked, the output should be something like: "We need to analyze the data to find the benefit."
It should be short, direct, and contain no "Sure, here is the translation" fluff.
Now You Try
You have the pipeline set up. Now, try these extensions to deepen your understanding:
Modify generate_data.py to change the assistant responses to sound like a pirate. Keep the inputs the same. Create a new file pirate_data.jsonl, upload it, and train a new model. This proves you can change the style of the model completely with just 10 examples.
Before uploading, write a Python script that reads your .jsonl file line by line and checks if json.loads(line) works. This is a common safety step in production to prevent a 1-hour upload from failing at 99% due to a missing comma.
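Here is a minimal sketch of that validator. It assumes the file name used earlier and only checks that every line parses as JSON and contains a messages list:
import json
file_name = "corporate_jargon_data.jsonl"
problems = 0
with open(file_name, "r") as f:
    for line_number, line in enumerate(f, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines (e.g. a trailing newline)
        try:
            record = json.loads(line)
        except json.JSONDecodeError as e:
            print(f"Line {line_number}: invalid JSON ({e})")
            problems += 1
            continue
        if not isinstance(record.get("messages"), list):
            print(f"Line {line_number}: missing 'messages' list")
            problems += 1
print(f"Validation finished with {problems} problem(s).")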
In test_model.py, try changing the system prompt to something completely different, like "You are a helpful poet." See if the fine-tuned model ignores your new instructions and sticks to the "Corporate Translator" persona. (Spoiler: Fine-tuned models are often stubborn and resist prompt changes—this is a feature, not a bug!)
Challenge Project: The Model Face-Off
Your challenge is to quantitatively measure if your fine-tuning actually helped. You will compare the Base Model vs. Your Fine-Tuned Model.
Requirements:
* Build a small list of test jargon phrases that are not in your training data.
* For each phrase, call both models: gpt-4o-mini (Base) and your ft:gpt-4o-mini... (Fine-Tuned).
* Count the words in each response and report how much shorter the fine-tuned answers are on average.
Example comparison:
Input: "Let's take this offline to leverage bandwidth."
Base Model: "Sure, I can help with that. Translation: Let's discuss this privately to save time." (15 words)
Fine-Tuned: "Let's talk privately to save time." (6 words)
Improvement: 60% reduction in verbosity.
Hint:
You will need to make two separate API calls inside your loop. Store the results in a list or dictionary to calculate the averages at the end.
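Here is a minimal sketch of the face-off, assuming the model name saved by check_status.py and a couple of hypothetical test phrases (swap in your own):
from openai import OpenAI
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
with open("model_name.txt", "r") as f:
    custom_model = f.read().strip()
system_prompt = "Translate corporate jargon to plain English."
test_phrases = [  # hypothetical examples; replace with your own jargon
    "Let's take this offline to leverage bandwidth.",
    "We should socialize this proposal with key stakeholders.",
]
def word_count(text):
    return len(text.split())
def translate(model_name, text):
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
base_counts, tuned_counts = [], []
for phrase in test_phrases:
    base_out = translate("gpt-4o-mini", phrase)
    tuned_out = translate(custom_model, phrase)
    base_counts.append(word_count(base_out))
    tuned_counts.append(word_count(tuned_out))
    print(f"Input: {phrase}")
    print(f"  Base ({word_count(base_out)} words): {base_out}")
    print(f"  Fine-Tuned ({word_count(tuned_out)} words): {tuned_out}")
# Average verbosity across all test phrases
print(f"Average words - base: {sum(base_counts)/len(base_counts):.1f}, "
      f"fine-tuned: {sum(tuned_counts)/len(tuned_counts):.1f}")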
What You Learned
Today you moved from prompt engineering to model engineering.
* JSONL Format: You learned that training data must be newline-delimited JSON objects, each one representing a complete example conversation.
* The Pipeline: You experienced the real-world flow: Data Gen -> Upload -> Train -> Wait -> Test.
* Overfitting vs. Learning: You learned that epochs control how hard the model studies your data.
* Behavior Locking: You saw that fine-tuning changes the default behavior of the model, allowing you to use shorter prompts and get more consistent results.
Why This Matters: In production, you might fine-tune a model to output JSON specifically for your database schema, or to write code in your company's proprietary language. Fine-tuning is the bridge between a generic genius and a specialized expert.
Tomorrow: Now that you have customized a model in the cloud, what if you want to run AI entirely on your own laptop, for free, without internet? Tomorrow, we dive into Local LLMs.