Day 21 of 80

Phase 3: LLM Landscape & APIs

Day 21: The LLM Landscape Overview

Welcome to Phase 3. You have survived Python basics. You have conquered data structures. Now, we enter the world of Large Language Models (LLMs).

There is a lot of hype surrounding AI. People talk about "sparks of consciousness" or "superintelligence." We are going to ignore all of that. As a developer, you need to strip away the magic.

To you, an LLM is not a digital brain. It is not a sci-fi character. It is a function. You put text in, and you get text out. That is it. Today, we are going to learn exactly how that function works, who provides it, how much it costs, and how to choose the right one for your software.

What You'll Build Today

We are going to build a Model Recommendation Engine.

Since we aren't calling the actual AI APIs until tomorrow, today we will use your Python skills to build a program that understands the "specs" of different AI models (like GPT-4o, Claude 3.5 Sonnet, and Gemini). Your program will take a user's requirements (budget, input size) and recommend the right model.

Here is what you will learn:

* The API Mental Model: Why we treat AI as a function call (text_in -> text_out).

* The Major Players: Understanding the differences between OpenAI, Anthropic, Google, and Meta.

* Token Economics: Why we count "tokens" instead of words, and how pricing actually works.

* Context Windows: Understanding the memory limits of these models.

* Open vs. Closed Source: The trade-off between convenience and control.

Let's demystify the magic.

The Problem

Before LLMs arrived, if you wanted to build a chatbot or a program that understood English, you had to write code that looked like this.

Read this code carefully. Imagine you are trying to build a customer service bot for a pizza shop.

def old_school_chatbot(user_input):
    # Normalize input to lowercase to try and catch variations
    text = user_input.lower()

    if "hello" in text:
        return "Welcome to Python Pizza! How can I help?"
    elif "order" in text:
        if "pizza" in text:
            return "What toppings would you like?"
        elif "drink" in text:
            return "We have Coke and Pepsi."
        else:
            return "Order what exactly?"
    elif "price" in text or "cost" in text:
        return "Pizzas are $15."
    elif "bye" in text:
        return "Goodbye!"
    else:
        return "I do not understand. Please say 'order' or 'price'."

# Let's try to use it
print(old_school_chatbot("Hello"))
print(old_school_chatbot("I want to buy a pie"))

The Output:
Welcome to Python Pizza! How can I help?
I do not understand. Please say 'order' or 'price'.

The Pain:
  • It is brittle: The user said "buy a pie" instead of "order a pizza." The code failed because we didn't explicitly check for the word "pie."
  • It is endless: To make this good, you would need millions of if statements covering every possible way a human can speak.
  • No context: If the user says "I'll take two," the bot has no idea what they are talking about because it doesn't remember the previous sentence.
  • This approach is impossible to scale. You cannot program English using if statements.

We need a function that accepts any string of text and returns a logical, human-like response, without us writing the rules for grammar or vocabulary.

Let's Build It

We are going to model the solution. While we won't call the OpenAI API today (we need to set up keys for that tomorrow), we are going to build the logic that helps us manage these "magic functions."

Step 1: The Mental Shift (Pseudocode)

First, we need to change how you view these models. In Phase 1, you learned the add function.

def add(a, b):
    return a + b

An LLM is the exact same concept. It lives on a server (like OpenAI's), but conceptually, it is just this:

def llm(prompt):
    # Complex math happens inside the server...
    return completion

You send a string (the prompt). You get a string back (the completion).

Let's define our "Landscape" using a Python dictionary. This will serve as our database of available "functions."

# A database of the current top models
# We track: Provider, Cost per 1M tokens (roughly), and Context Window

llm_landscape = {
    "gpt-4o": {
        "provider": "OpenAI",
        "role": "Flagship",
        "input_cost_per_1m": 5.00,  # $5.00 per 1 million tokens
        "context_window": 128000
    },
    "claude-3-5-sonnet": {
        "provider": "Anthropic",
        "role": "Flagship",
        "input_cost_per_1m": 3.00,
        "context_window": 200000
    },
    "gpt-4o-mini": {
        "provider": "OpenAI",
        "role": "Budget",
        "input_cost_per_1m": 0.15,
        "context_window": 128000
    },
    "llama-3-70b": {
        "provider": "Meta (Open Source)",
        "role": "Open Weight",
        "input_cost_per_1m": 0.0,  # Free if you run it yourself
        "context_window": 8192
    }
}

print(f"Loaded {len(llm_landscape)} models into the registry.")

Step 2: Understanding Tokens

When you pay for these models, you don't pay by the request or by the word. You pay by the Token.

A token is a chunk of text.

* Rough rule of thumb: 1,000 tokens ≈ 750 words.

* Or: 1 word ≈ 1.3 tokens.

Let's write a helper function to estimate token counts so we can predict costs.

def estimate_tokens(text):
    # Split text into words (rough approximation)
    words = text.split()
    word_count = len(words)

    # Apply the 1.3 multiplier rule
    estimated_tokens = int(word_count * 1.3)
    return estimated_tokens

# Test it out
user_prompt = "I want to analyze a very long legal contract about software licensing."
tokens = estimate_tokens(user_prompt)

print(f"Prompt: '{user_prompt}'")
print(f"Estimated Tokens: {tokens}")

Step 3: Calculating Costs

Now, let's combine our data structure with our token estimator. This is critical. If you accidentally send a whole book to GPT-4o, it might cost you $0.50. If you send it to GPT-4o-mini, it might cost $0.01. As a developer, you must manage this.

def calculate_cost(model_name, text_input):
    # 1. Get the model specs
    if model_name not in llm_landscape:
        return "Model not found"

    model = llm_landscape[model_name]

    # 2. Estimate tokens
    tokens = estimate_tokens(text_input)

    # 3. Calculate cost
    # Price is per 1 million tokens, so we divide by 1,000,000
    cost = (tokens / 1_000_000) * model['input_cost_per_1m']
    return cost

# Let's say we have a massive document
massive_document = "word " * 10000  # A string with 10,000 words

cost_flagship = calculate_cost("gpt-4o", massive_document)
cost_budget = calculate_cost("gpt-4o-mini", massive_document)

print(f"Cost to process with GPT-4o: ${cost_flagship:.5f}")
print(f"Cost to process with GPT-4o-mini: ${cost_budget:.5f}")

Output:
Cost to process with GPT-4o: $0.06500
Cost to process with GPT-4o-mini: $0.00195

Note: The budget model is over 30x cheaper! This is why knowing the landscape matters.
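
To see the spread at a glance, you can also loop over every model in the registry and price the same document. A small follow-up snippet, reusing calculate_cost and massive_document from above:

# Price the same 10,000-word document against every model in the registry
for name in llm_landscape:
    cost = calculate_cost(name, massive_document)
    print(f"{name}: ${cost:.5f}")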

Step 4: The Context Window (Memory)

The Context Window is the limit of how much text the model can "hold" in its mind at one time. If a model has a context window of 8,000 tokens and you try to feed it a 10,000-token book, the request will be rejected (or the text will get cut off).
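
The blunt way to handle an input that is too big is to trim it before sending. Here is a rough, optional sketch using our 1.3 tokens-per-word estimate; the truncate_to_window helper is hypothetical, and the cut is approximate rather than exact.

# Rough sketch: trim an input so its estimated token count fits a model's window.
# Uses the 1.3 tokens-per-word approximation, so it is a safety margin, not an exact cut.
def truncate_to_window(text, context_window):
    max_words = int(context_window / 1.3)
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])

book = "word " * 10000                    # ~13,000 estimated tokens
trimmed = truncate_to_window(book, 8192)  # llama-3-70b sized window
print(f"Trimmed to ~{int(len(trimmed.split()) * 1.3)} estimated tokens")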

Let's write our logic to filter out models that are too small for a task.

def recommend_model(text_input, max_budget_usd):
    tokens = estimate_tokens(text_input)
    print(f"Input requires ~{tokens} tokens.")

    valid_models = []

    for name, specs in llm_landscape.items():
        # Check 1: Does it fit in the context window?
        if tokens > specs['context_window']:
            print(f"X {name}: Context window too small")
            continue

        # Check 2: Is it within budget?
        cost = (tokens / 1_000_000) * specs['input_cost_per_1m']
        if cost > max_budget_usd:
            print(f"X {name}: Too expensive (${cost:.5f})")
            continue

        valid_models.append(name)

    return valid_models

# Test Case: A huge input (15,000 words) with a tiny budget ($0.01)
huge_input = "word " * 15000

recommendations = recommend_model(huge_input, 0.01)
print("\nRecommended Models:", recommendations)

Output:
Input requires ~19500 tokens.
X gpt-4o: Too expensive ($0.09750)
X claude-3-5-sonnet: Too expensive ($0.05850)
X llama-3-70b: Context window too small

Recommended Models: ['gpt-4o-mini']

Step 5: Putting it Together

We have built a system that understands the technical constraints of LLMs. This is the foundation of being an AI Engineer. You don't just "use AI." You pick the right tool for the job based on cost, capability, and memory limits.
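
If you want the program to make the final call instead of just listing options, a small (hypothetical) helper can pick the cheapest model from the valid list, reusing calculate_cost and the recommendations from Step 4:

# Hypothetical helper: from the models that fit, pick the cheapest one
def pick_cheapest(model_names, text_input):
    if not model_names:
        return None  # nothing fit the constraints
    return min(model_names, key=lambda name: calculate_cost(name, text_input))

best = pick_cheapest(recommendations, huge_input)
print(f"Cheapest suitable model: {best}")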

Now You Try

You have the base logic. Now expand the "Landscape."

  • Add Google Gemini: Update the llm_landscape dictionary to include gemini-1.5-pro. It has a massive context window of 2,000,000 tokens (yes, two million) and costs roughly $3.50 per 1M tokens.
  • Add Output Costs: Currently, we only calculated the cost of input (the prompt). Models also charge for output (what they write back), and it's usually more expensive. Add an output_cost_per_1m key to every model in the dictionary.
  • The "Smart" Toggle: Create a new function called get_smartest_model(). It should ignore cost and just return the model with the highest price (assuming price correlates with intelligence/capability).

Challenge Project

In this challenge, you will write Pseudocode.

Pseudocode is code that doesn't actually run, but explains the logic for humans to read. We want to compare the "Old Way" (Day 7 functions) with the "New Way" (LLM API calls).

Requirements:
  • Define a standard Python function calculate_tax(amount) that uses math.
  • Define a pseudocode function call_llm(prompt, model_name) that represents how we will use AI tomorrow.
  • Show how you would use call_llm to perform the same tax calculation task by asking the model in plain English.
  • Print the imaginary result.

Example Input/Output format:

# 1. The Old Way (Deterministic)
def calculate_tax(amount):
    return amount * 0.2

# 2. The New Way (Probabilistic AI)
def call_llm(prompt, model_name):
    # This represents the magic API call we learn tomorrow
    print(f"Sending to {model_name}...")
    return "Simulated AI Response"

# 3. Usage
prompt = "Calculate tax on $100 assuming a 20% rate. Return only the number."
result = call_llm(prompt, "gpt-4o")
print(result)

Hints:

* Remember, the LLM takes a string and returns a string.

* Even if you want a number back, the LLM gives you text (e.g., "20"). You might need to cast it to an int or float later, as sketched below.
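
For instance, here is a tiny sketch of that cast; the response string is made up, since we have no real API call yet.

# The model replies with text, even when you asked for a number
simulated_response = "20.0"  # imaginary output from call_llm

try:
    tax = float(simulated_response)
    print(f"Tax as a number: {tax}")
except ValueError:
    print("The model did not return a clean number.")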

What You Learned

Today was about the Mental Model. You didn't just learn about AI; you learned how to organize AI models as data.

* LLMs are Functions: Text In -> Text Out.

* The Landscape: OpenAI (GPT), Anthropic (Claude), Google (Gemini), Meta (Llama).

* Tokens: The currency of AI (1 token ≈ 0.75 words).

* Context Window: The memory limit of the model.

Why This Matters:

Tomorrow, we stop simulating. You will get your API Key from OpenAI. You will install the Python library. You will write a script that sends text to a remote server and gets an intelligent response back.

If you didn't understand tokens and costs today, you might accidentally spend $20 in 5 minutes tomorrow. But now, you are ready.

Tomorrow: OpenAI API - Your first real AI function call.