Multi-Provider & Router Patterns
What You'll Build Today
Welcome to Day 27! Today, we are going to build a Smart Router.
Up until now, we have treated Large Language Models (LLMs) like a single tool. You have a prompt, you send it to GPT-4 (or similar), and you get an answer. But in the real world, relying on a single, massive model for everything is a bad strategy. It is like driving a Ferrari to pick up your mail at the end of the driveway—it works, but it is expensive, loud, and unnecessary.
Today, we will build a system that acts as a traffic controller. It will analyze a user's request and decide: "Is this a simple question?" or "Is this a complex reasoning task?" Based on that decision, it will route the query to the most appropriate AI model.
Here is what you will learn and why:
* Router Logic: You will learn how to write code that dynamically chooses which AI model to use. This is needed to balance cost and performance.
* Provider Abstraction: You will learn how to write a single function that can talk to different API styles. This is needed so your main application code doesn't become a mess of specific API calls.
* Latency vs. Cost vs. Quality: You will learn how to weigh trade-offs. This is needed because sometimes you need an answer fast (latency), sometimes you need it cheap (cost), and sometimes you need it perfect (quality).
* Fallback Mechanisms: You will learn how to automatically switch to a backup if your main AI crashes. This is needed to keep your application running 24/7.
Let's dive in.
The Problem
Imagine you have built a customer support chatbot for a pizza shop. You are using the most powerful model available, let's call it "Omni-GPT," which costs $0.03 per request and takes about 3 seconds to reply.
Here is a typical conversation log:
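```
User: Hello
Omni-GPT (3 seconds later, $0.03): Hello! How can I help you today?
User: Do you deliver?
Omni-GPT (3 seconds later, $0.03): Yes.
```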
Do you see the problem?
You just paid premium prices and forced the user to wait 3 seconds just to say "Hello" and "Yes." You are burning money on simple tasks.
Here is what the code usually looks like when beginners start. It is rigid and wasteful.
```python
import time

# Simulating the expensive, powerful model
def call_expensive_model(prompt):
    print(f"Connecting to Expensive Omni-GPT for: '{prompt}'...")
    time.sleep(3)  # Simulating network latency and processing
    return "This is a very smart and expensive answer."

# The application logic
user_queries = [
    "Hi there",
    "What is 2 + 2?",
    "Explain the geopolitical implications of the 19th century industrial revolution."
]

for query in user_queries:
    # We treat every query exactly the same
    response = call_expensive_model(query)
    print(f"Response: {response}\n")
```
The Pain Points:
* Every query is treated exactly the same, so "Hi there" costs as much as a deep analytical question.
* The user waits a full 3 seconds even for a one-word greeting.
* You are paying premium per-request prices for answers a far cheaper model could handle.
* If Omni-GPT has an outage, your whole chatbot goes down with it.
There has to be a way to send the easy stuff to a cheap, fast intern, and only send the hard stuff to the expensive expert.
Let's Build It
We are going to build a Smart Router system. To make this runnable without requiring you to have three different credit cards and API keys for OpenAI, Anthropic, and Groq, we will simulate the "Providers."
The logic and architecture we build here are exactly what you will use with real APIs.
Step 1: Creating the Mock Providers
First, we need to simulate the environment. We will create two "Fake" LLMs.
Copy and run this code:
```python
import time
import random

class MockProvider:
    def __init__(self, name, cost_per_token, avg_latency):
        self.name = name
        self.cost_per_token = cost_per_token
        self.avg_latency = avg_latency

    def generate(self, prompt):
        # Simulate the time it takes to get a response
        print(f" [{self.name}] Processing request...")
        time.sleep(self.avg_latency)

        # Return a mock response based on the provider personality
        if self.name == "FastBot":
            return f"FastBot says: Here is a quick, simple answer to '{prompt}'"
        else:
            return f"SmartBot says: Here is a deeply reasoned, complex answer to '{prompt}'"

# Initialize our providers
fast_llm = MockProvider(name="FastBot", cost_per_token=0.0001, avg_latency=0.5)
smart_llm = MockProvider(name="SmartBot", cost_per_token=0.03, avg_latency=3.0)

print("Providers initialized.")
```
Why this matters: In a real app, these classes would hold the specific requests.post code or SDK calls (like openai.chat.completions.create) for each specific vendor.
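To make that concrete, here is a minimal sketch of what a real provider wrapper could look like, assuming an OpenAI-style chat completions endpoint; the model name and environment-variable handling are placeholders you would adapt to your own setup:

```python
import os
import requests

class OpenAIStyleProvider:
    """Rough sketch of a real provider wrapper (not production code)."""

    def __init__(self, model="gpt-4o-mini"):
        self.model = model
        # Assumes your API key is exported as an environment variable
        self.api_key = os.environ["OPENAI_API_KEY"]

    def generate(self, prompt):
        # Same .generate() interface as MockProvider, so the router
        # never needs to know which vendor it is talking to.
        resp = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"model": self.model, "messages": [{"role": "user", "content": prompt}]},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
```

Because it exposes the same generate method, you could swap it in wherever the router currently uses fast_llm or smart_llm.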
Step 2: The Router Logic (The Classifier)
Now we need a brain. The router needs to look at a prompt and decide: "Is this hard?"
In production, you might actually use a very tiny, cheap LLM to classify the user's intent. For our purposes today, we will write a simple rule-based classifier. We will assume that if a prompt is long or contains specific "trigger words" (like analyze, explain, code), it is complex.
```python
def classify_complexity(prompt):
    # Logic: If the prompt is short and simple, use FastBot.
    # If it's long or asks for complex tasks, use SmartBot.
    complex_keywords = ['explain', 'analyze', 'code', 'why', 'difference', 'history']

    # Check 1: Is the prompt very short? (Likely a greeting or simple fact)
    if len(prompt.split()) < 5:
        return "SIMPLE"

    # Check 2: Does it contain complex keywords?
    prompt_lower = prompt.lower()
    for word in complex_keywords:
        if word in prompt_lower:
            return "COMPLEX"

    # Default to simple if no flags are raised
    return "SIMPLE"

# Test the classifier
test_prompts = [
    "Hello there",
    "What is 2+2?",
    "Explain the difference between Python and Java",
    "Code a snake game for me"
]

print("--- Testing Classifier ---")
for p in test_prompts:
    category = classify_complexity(p)
    print(f"Prompt: '{p}' -> Category: {category}")
```
Why this matters: This function is the traffic cop. By running this logic before calling an API, we save money. A simple Python if statement costs $0.00. An unnecessary GPT-4 call costs real money.
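If you later replace the rules with that tiny classifier LLM, the overall shape might look something like the sketch below; classify_with_llm and the prompt wording are just illustrative, and classifier_llm stands in for whatever cheap model you choose:

```python
def classify_with_llm(prompt, classifier_llm):
    # Ask a small, cheap model to label the query instead of using keywords.
    instruction = (
        "Label the following user query as SIMPLE or COMPLEX. "
        "Reply with one word only.\n\n"
        f"Query: {prompt}"
    )
    label = classifier_llm.generate(instruction).strip().upper()
    return "COMPLEX" if "COMPLEX" in label else "SIMPLE"

# With a real classifier model behind .generate(), you would call:
# classify_with_llm(user_prompt, tiny_model)
```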
Step 3: The Smart Router Function
Now we combine Step 1 and Step 2. We will write a function that takes the user's prompt, classifies it, and then dispatches it to the correct provider.
```python
def smart_router(prompt):
    print(f"\nIncoming Query: '{prompt}'")

    # 1. Classify the intent
    complexity = classify_complexity(prompt)
    print(f" Analysis: This query is {complexity}")

    # 2. Route to the appropriate model
    if complexity == "SIMPLE":
        print(" Routing to: FastBot (Save money!)")
        response = fast_llm.generate(prompt)
    else:
        print(" Routing to: SmartBot (Need brain power!)")
        response = smart_llm.generate(prompt)

    return response

# Let's run it!
print("--- Starting Smart Router ---")
response_1 = smart_router("Hi")
print(f"FINAL OUTPUT: {response_1}")

response_2 = smart_router("Analyze the economic impact of AI")
print(f"FINAL OUTPUT: {response_2}")
```
Output Analysis:
When you run this, notice the speed difference. "Hi" should return almost instantly via FastBot. The "Analyze" query will pause for 3 seconds (simulated) before returning via SmartBot. You have successfully optimized for both latency and quality!
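For reference, the console output should look roughly like this (with the simulated pauses in between):

```
--- Starting Smart Router ---

Incoming Query: 'Hi'
 Analysis: This query is SIMPLE
 Routing to: FastBot (Save money!)
 [FastBot] Processing request...
FINAL OUTPUT: FastBot says: Here is a quick, simple answer to 'Hi'

Incoming Query: 'Analyze the economic impact of AI'
 Analysis: This query is COMPLEX
 Routing to: SmartBot (Need brain power!)
 [SmartBot] Processing request...
FINAL OUTPUT: SmartBot says: Here is a deeply reasoned, complex answer to 'Analyze the economic impact of AI'
```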
Step 4: Adding Cost Tracking
To prove the value of this system, let's track how much money we "spent."
```python
total_cost = 0.0

def smart_router_with_billing(prompt):
    global total_cost
    complexity = classify_complexity(prompt)

    if complexity == "SIMPLE":
        selected_model = fast_llm
    else:
        selected_model = smart_llm

    # Simulate generating response
    response = selected_model.generate(prompt)

    # Calculate simulated cost (assuming 1 token per word for simplicity)
    # In reality, you'd count actual tokens used
    tokens_used = len(prompt.split()) + len(response.split())
    transaction_cost = tokens_used * selected_model.cost_per_token
    total_cost += transaction_cost

    print(f" [BILLING] This call cost: ${transaction_cost:.5f}")
    return response

# Run a batch
queries = [
    "Hi",
    "What time is it?",
    "Explain quantum physics in detail",
    "Bye"
]

print("\n--- Processing Batch ---")
for q in queries:
    smart_router_with_billing(q)

print(f"\nTotal Session Cost: ${total_cost:.5f}")
```
Why this matters: If we had sent every query to SmartBot, each of the three simple ones would have cost roughly 300x more ($0.03 per token instead of $0.0001). In a startup processing millions of requests, this logic is the difference between profitability and bankruptcy.
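If you want to see the gap yourself, here is a quick back-of-the-envelope comparison that reuses the same objects and the same one-token-per-word assumption (the 10-word average response length is just an estimate of the canned mock replies):

```python
# Compare "send everything to SmartBot" against the router's choices.
queries = ["Hi", "What time is it?", "Explain quantum physics in detail", "Bye"]
avg_response_words = 10  # rough guess at the length of the mock replies

all_smartbot = sum(
    (len(q.split()) + avg_response_words) * smart_llm.cost_per_token for q in queries
)
with_routing = sum(
    (len(q.split()) + avg_response_words)
    * (smart_llm if classify_complexity(q) == "COMPLEX" else fast_llm).cost_per_token
    for q in queries
)

print(f"Everything to SmartBot: ${all_smartbot:.4f}")
print(f"With smart routing:     ${with_routing:.4f}")
```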
Now You Try
You have a working router. Now, let's make it robust.
1. Add a third provider, StandardBot (Cost: 0.005, Latency: 1.5s). Update the classify_complexity function to return "MEDIUM" for prompts that are between 10 and 20 words long. Update the router to handle this third case.
2. Modify smart_router to accept a user_tier argument. If user_tier == "premium", always route them to SmartBot, regardless of complexity. Premium users pay more, so they get the best model every time.
3. Whenever the router picks SmartBot, print a warning message saying "This might take a while..." before calling it.
Challenge Project: The Automatic Fallback
In production, APIs fail. OpenAI might have an outage, or your credit card might be declined. A robust system doesn't crash; it falls back to a backup.
The Goal: Build a function reliable_generation(prompt) that tries the primary model, and if it fails, automatically calls the backup model.
Requirements:
1. Update MockProvider to accept a failure_rate argument (e.g., 0.5 means it fails 50% of the time).
2. Modify the generate method to randomly raise an Exception("API Error") based on that failure rate.
3. Create a Primary model (SmartBot) that is unstable (50% failure rate) and a Backup model (FastBot) that is stable (0% failure rate).
4. Write the reliable_generation function using try and except blocks.
When the primary fails, the output should look something like this:
```
Attempting generation with Primary...
ERROR: Primary API Connection Failed.
Switching to Backup Provider...
Success: Backup Provider returned response.
```
Hint:
Your Python structure will look like this:
```python
try:
    # Try the risky, high-quality thing
    return primary_model.generate(prompt)
except Exception as e:
    # If it blows up, handle it here
    print(f"Error: {e}")
    return backup_model.generate(prompt)
```
What You Learned
Today you moved from being a "Prompt Sender" to a "System Architect." You learned:
* Router Pattern: How to dynamically select tools based on the job at hand.
* Cost/Latency Trade-offs: Recognizing that the "best" model isn't always the right choice for every task.
* Abstraction: Hiding the messy details of specific models behind a clean function.
* Reliability: (If you did the challenge) How to ensure your app survives when an API goes down.
Why This Matters: When you build real applications, you won't just be pasting prompts into ChatGPT. You will be building systems that orchestrate multiple models, databases, and tools. The logic you wrote today, routing traffic based on complexity, is the same pattern that enterprise AI products like Copilot and ChatGPT (the product) use under the hood.
Tomorrow: We dive into the most magical concept in modern NLP: Embeddings. This is the foundation of Search, Memory, and RAG (Retrieval Augmented Generation). Get ready to turn text into numbers!