Multi-Provider & Router Patterns
What You'll Build Today
Welcome to Day 27! Today, we are going to build a Smart Router.
Up until now, we have treated Large Language Models (LLMs) like a single tool. You have a prompt, you send it to GPT-4 (or similar), and you get an answer. But in the real world, relying on a single, massive model for everything is a bad strategy. It is like driving a Ferrari to pick up your mail at the end of the driveway—it works, but it is expensive, loud, and unnecessary.
Today, we will build a system that acts as a traffic controller. It will analyze a user's request and decide: "Is this a simple question?" or "Is this a complex reasoning task?" Based on that decision, it will route the query to the most appropriate AI model.
Here is what you will learn and why:
* Router Logic: You will learn how to write code that dynamically chooses which AI model to use. This is needed to balance cost and performance.
* Provider Abstraction: You will learn how to write a single function that can talk to different API styles. This is needed so your main application code doesn't become a mess of specific API calls.
* Latency vs. Cost vs. Quality: You will learn how to weigh trade-offs. This is needed because sometimes you need an answer fast (latency), sometimes you need it cheap (cost), and sometimes you need it perfect (quality).
* Fallback Mechanisms: You will learn how to automatically switch to a backup if your main AI crashes. This is needed to keep your application running 24/7.
Let's dive in.
The Problem
Imagine you have built a customer support chatbot for a pizza shop. You are using the most powerful model available, let's call it "Omni-GPT," which costs $0.03 per request and takes about 3 seconds to reply.
Here is a typical conversation log:
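```
User: Hello
Omni-GPT (3 seconds later, $0.03): Hello! How can I help you today?
User: Do you deliver?
Omni-GPT (3 seconds later, $0.03): Yes.
```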
Do you see the problem?
You just paid premium prices and forced the user to wait 3 seconds just to say "Hello" and "Yes." You are burning money on simple tasks.
Here is what the code usually looks like when beginners start. It is rigid and wasteful.
```python
import time

# Simulating the expensive, powerful model
def call_expensive_model(prompt):
    print(f"Connecting to Expensive Omni-GPT for: '{prompt}'...")
    time.sleep(3)  # Simulating network latency and processing
    return "This is a very smart and expensive answer."

# The application logic
user_queries = [
    "Hi there",
    "What is 2 + 2?",
    "Explain the geopolitical implications of the 19th century industrial revolution."
]

for query in user_queries:
    # We treat every query exactly the same
    response = call_expensive_model(query)
    print(f"Response: {response}\n")
```
The Pain Points:
* Every query is treated exactly the same, so "Hi there" costs as much as a deep analytical question.
* The user waits a full 3 seconds even for a one-word greeting.
* You are paying premium per-request prices for answers a far cheaper model could handle.
* If Omni-GPT has an outage, your whole chatbot goes down with it.
There has to be a way to send the easy stuff to a cheap, fast intern, and only send the hard stuff to the expensive expert.
Let's Build It
We are going to build a Smart Router system. To make this runnable without requiring you to have three different credit cards and API keys for OpenAI, Anthropic, and Groq, we will simulate the "Providers."
The logic and architecture we build here are exactly what you will use with real APIs.
Step 1: Creating the Mock Providers
First, we need to simulate the environment. We will create two "Fake" LLMs.
Copy and run this code:
```python
import time
import random

class MockProvider:
    def __init__(self, name, cost_per_token, avg_latency):
        self.name = name
        self.cost_per_token = cost_per_token
        self.avg_latency = avg_latency

    def generate(self, prompt):
        # Simulate the time it takes to get a response
        print(f" [{self.name}] Processing request...")
        time.sleep(self.avg_latency)

        # Return a mock response based on the provider personality
        if self.name == "FastBot":
            return f"FastBot says: Here is a quick, simple answer to '{prompt}'"
        else:
            return f"SmartBot says: Here is a deeply reasoned, complex answer to '{prompt}'"

# Initialize our providers
fast_llm = MockProvider(name="FastBot", cost_per_token=0.0001, avg_latency=0.5)
smart_llm = MockProvider(name="SmartBot", cost_per_token=0.03, avg_latency=3.0)

print("Providers initialized.")
```
Why this matters: In a real app, these classes would hold the specific requests.post code or SDK calls (like openai.chat.completions.create) for each specific vendor.
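To make that concrete, here is a minimal sketch of what a real provider wrapper could look like, assuming an OpenAI-style chat completions endpoint; the model name and environment-variable handling are placeholders you would adapt to your own setup:

```python
import os
import requests

class OpenAIStyleProvider:
    """Rough sketch of a real provider wrapper (not production code)."""

    def __init__(self, model="gpt-4o-mini"):
        self.model = model
        # Assumes your API key is exported as an environment variable
        self.api_key = os.environ["OPENAI_API_KEY"]

    def generate(self, prompt):
        # Same .generate() interface as MockProvider, so the router
        # never needs to know which vendor it is talking to.
        resp = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"model": self.model, "messages": [{"role": "user", "content": prompt}]},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
```

Because it exposes the same generate method, you could swap it in wherever the router currently uses fast_llm or smart_llm.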
Step 2: The Router Logic (The Classifier)
Now we need a brain. The router needs to look at a prompt and decide: "Is this hard?"
In production, you might actually use a very tiny, cheap LLM to classify the user's intent. For our purposes today, we will write a simple rule-based classifier. We will assume that if a prompt is long or contains specific "trigger words" (like analyze, explain, code), it is complex.
```python
def classify_complexity(prompt):
    # Logic: If the prompt is short and simple, use FastBot.
    # If it's long or asks for complex tasks, use SmartBot.
    complex_keywords = ['explain', 'analyze', 'code', 'why', 'difference', 'history']

    # Check 1: Is the prompt very short? (Likely a greeting or simple fact)
    if len(prompt.split()) < 5:
        return "SIMPLE"

    # Check 2: Does it contain complex keywords?
    prompt_lower = prompt.lower()
    for word in complex_keywords:
        if word in prompt_lower:
            return "COMPLEX"

    # Default to simple if no flags are raised
    return "SIMPLE"

# Test the classifier
test_prompts = [
    "Hello there",
    "What is 2+2?",
    "Explain the difference between Python and Java",
    "Code a snake game for me"
]

print("--- Testing Classifier ---")
for p in test_prompts:
    category = classify_complexity(p)
    print(f"Prompt: '{p}' -> Category: {category}")
```
Why this matters: This function is the traffic cop. By running this logic before calling an API, we save money. A simple Python if statement costs $0.00. An unnecessary GPT-4 call costs real money.
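If you later replace the rules with that tiny classifier LLM, the overall shape might look something like the sketch below; classify_with_llm and the prompt wording are just illustrative, and classifier_llm stands in for whatever cheap model you choose:

```python
def classify_with_llm(prompt, classifier_llm):
    # Ask a small, cheap model to label the query instead of using keywords.
    instruction = (
        "Label the following user query as SIMPLE or COMPLEX. "
        "Reply with one word only.\n\n"
        f"Query: {prompt}"
    )
    label = classifier_llm.generate(instruction).strip().upper()
    return "COMPLEX" if "COMPLEX" in label else "SIMPLE"

# With a real classifier model behind .generate(), you would call:
# classify_with_llm(user_prompt, tiny_model)
```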
Step 3: The Smart Router Function
Now we combine Step 1 and Step 2. We will write a function that takes the user's prompt, classifies it, and then dispatches it to the correct provider.
```python
def smart_router(prompt):
    print(f"\nIncoming Query: '{prompt}'")

    # 1. Classify the intent
    complexity = classify_complexity(prompt)
    print(f" Analysis: This query is {complexity}")

    # 2. Route to the appropriate model
    if complexity == "SIMPLE":
        print(" Routing to: FastBot (Save money!)")
        response = fast_llm.generate(prompt)
    else:
        print(" Routing to: SmartBot (Need brain power!)")
        response = smart_llm.generate(prompt)

    return response

# Let's run it!
print("--- Starting Smart Router ---")
response_1 = smart_router("Hi")
print(f"FINAL OUTPUT: {response_1}")

response_2 = smart_router("Analyze the economic impact of AI")
print(f"FINAL OUTPUT: {response_2}")
```
Output Analysis:
When you run this, notice the speed difference. "Hi" should return almost instantly via FastBot. The "Analyze" query will pause for 3 seconds (simulated) before returning via SmartBot. You have successfully optimized for both latency and quality!
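For reference, the console output should look roughly like this (with the simulated pauses in between):

```
--- Starting Smart Router ---

Incoming Query: 'Hi'
 Analysis: This query is SIMPLE
 Routing to: FastBot (Save money!)
 [FastBot] Processing request...
FINAL OUTPUT: FastBot says: Here is a quick, simple answer to 'Hi'

Incoming Query: 'Analyze the economic impact of AI'
 Analysis: This query is COMPLEX
 Routing to: SmartBot (Need brain power!)
 [SmartBot] Processing request...
FINAL OUTPUT: SmartBot says: Here is a deeply reasoned, complex answer to 'Analyze the economic impact of AI'
```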
Step 4: Adding Cost Tracking
To prove the value of this system, let's track how much money we "spent."
```python
total_cost = 0.0

def smart_router_with_billing(prompt):
    global total_cost
    complexity = classify_complexity(prompt)

    if complexity == "SIMPLE":
        selected_model = fast_llm
    else:
        selected_model = smart_llm

    # Simulate generating response
    response = selected_model.generate(prompt)

    # Calculate simulated cost (assuming 1 token per word for simplicity)
    # In reality, you'd count actual tokens used
    tokens_used = len(prompt.split()) + len(response.split())
    transaction_cost = tokens_used * selected_model.cost_per_token
    total_cost += transaction_cost

    print(f" [BILLING] This call cost: ${transaction_cost:.5f}")
    return response

# Run a batch
queries = [
    "Hi",
    "What time is it?",
    "Explain quantum physics in detail",
    "Bye"
]

print("\n--- Processing Batch ---")
for q in queries:
    smart_router_with_billing(q)

print(f"\nTotal Session Cost: ${total_cost:.5f}")
```
Why this matters: If we had sent every query to SmartBot, each of the three simple ones would have cost roughly 300x more ($0.03 per token instead of $0.0001). In a startup processing millions of requests, this logic is the difference between profitability and bankruptcy.
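If you want to see the gap yourself, here is a quick back-of-the-envelope comparison that reuses the same objects and the same one-token-per-word assumption (the 10-word average response length is just an estimate of the canned mock replies):

```python
# Compare "send everything to SmartBot" against the router's choices.
queries = ["Hi", "What time is it?", "Explain quantum physics in detail", "Bye"]
avg_response_words = 10  # rough guess at the length of the mock replies

all_smartbot = sum(
    (len(q.split()) + avg_response_words) * smart_llm.cost_per_token for q in queries
)
with_routing = sum(
    (len(q.split()) + avg_response_words)
    * (smart_llm if classify_complexity(q) == "COMPLEX" else fast_llm).cost_per_token
    for q in queries
)

print(f"Everything to SmartBot: ${all_smartbot:.4f}")
print(f"With smart routing:     ${with_routing:.4f}")
```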
Now You Try
You have a working router. Now, let's make it robust.
1. Add a third provider, StandardBot (Cost: 0.005, Latency: 1.5s). Update the classify_complexity function to return "MEDIUM" for prompts that are between 10 and 20 words long. Update the router to handle this third case.
2. Modify smart_router to accept a user_tier argument. If user_tier == "premium", always route them to SmartBot, regardless of complexity. Premium users pay more, so they get the best model every time.
3. Whenever the router picks SmartBot, print a warning message saying "This might take a while..." before calling it.
Challenge Project: The Automatic Fallback
In production, APIs fail. OpenAI might have an outage, or your credit card might be declined. A robust system doesn't crash; it falls back to a backup.
The Goal: Build a function reliable_generation(prompt) that tries the primary model, and if it fails, automatically calls the backup model.
Requirements:
1. Update MockProvider to accept a failure_rate argument (e.g., 0.5 means it fails 50% of the time).
2. Modify the generate method to randomly raise an Exception("API Error") based on that failure rate.
3. Create a Primary model (SmartBot) that is unstable (50% failure rate) and a Backup model (FastBot) that is stable (0% failure rate).
4. Write the reliable_generation function using try and except blocks.
When the primary fails, the output should look something like this:
```
Attempting generation with Primary...
ERROR: Primary API Connection Failed.
Switching to Backup Provider...
Success: Backup Provider returned response.
```
Hint:
Your Python structure will look like this:
```python
try:
    # Try the risky, high-quality thing
    return primary_model.generate(prompt)
except Exception as e:
    # If it blows up, handle it here
    print(f"Error: {e}")
    return backup_model.generate(prompt)
```
What You Learned
Today you moved from being a "Prompt Sender" to a "System Architect." You learned:
* Router Pattern: How to dynamically select tools based on the job at hand.
* Cost/Latency Trade-offs: Recognizing that the "best" model isn't always the right choice for every task.
* Abstraction: Hiding the messy details of specific models behind a clean function.
* Reliability: (If you did the challenge) How to ensure your app survives when an API goes down.
Why This Matters: When you build real applications, you won't just be pasting prompts into ChatGPT. You will be building systems that orchestrate multiple models, databases, and tools. The logic you wrote today, routing traffic based on complexity, is the same pattern that enterprise AI products like Copilot and ChatGPT (the product) use under the hood.
Tomorrow: We dive into the most magical concept in modern NLP: Embeddings. This is the foundation of Search, Memory, and RAG (Retrieval Augmented Generation). Get ready to turn text into numbers!