FastAPI Production Backend
What You'll Build Today
Welcome to Day 72. Up until now, we have mostly treated our Python scripts as standalone islands. You run a script, it does some AI magic, and it prints the result or shows it in a local Streamlit interface.
But imagine you want to share your AI logic with a mobile app team, a website team, and an internal Slack bot. You cannot copy-paste your Python logic into all those places. You need a central "brain" that lives on a server, accepts requests from anywhere, and sends back answers.
Today, you will build a production-grade FastAPI Backend. This is the standard in modern AI engineering.
Here is what you will learn and why:
* FastAPI Framework: You will move beyond simple scripts to build a robust web server that acts as the engine for your AI.
* Asynchronous Endpoints (async/await): You will learn how to handle multiple users at once. If User A sends a heavy prompt, User B shouldn't have to wait for it to finish before they can say "Hello."
* Pydantic Models: You will learn to enforce strict rules on data entering your system. No more crashing because someone sent a number instead of text.
* CORS & Middleware: You will learn the security settings required to let your frontend talk to your backend without the browser blocking the connection.
* Background Tasks: You will learn how to trigger "fire and forget" actions (like saving to a database) that don't slow down the user's response time.
The Problem
Let's look at why we can't just stick with what we have been doing.
Imagine you have written a Python script using a standard web framework (like an older version of Flask) or just a basic script to handle LLM requests. LLMs are slow. Generating a paragraph might take 5 to 10 seconds.
If you write standard, synchronous Python code, the computer processes one line at a time. If a user makes a request that takes 10 seconds, the computer stops everything else to wait for that 10 seconds to pass.
Here is a simulation of a "Blocking" server. Read this code, but you don't need to run it—just understand the frustration it represents.
# THE PROBLEM: A blocking server simulation
import time
def handle_user_request(user_id, prompt):
print(f"Received request from User {user_id}...")
# Simulate a slow LLM call (blocking operation)
# The entire program freezes here for 5 seconds
time.sleep(5)
return f"Response for {user_id}"
# Scenario:
# User 1 sends a request.
# User 2 sends a request 0.1 seconds later.
print("--- SERVER START ---")
# User 1 arrives
start = time.time()
print(handle_user_request(1, "Write a poem"))
# User 2 has to wait for User 1 to finish completely!
print(handle_user_request(2, "Hello"))
end = time.time()
print(f"Total time taken: {end - start:.2f} seconds")
The Pain:
In the code above, the total time will be roughly 10 seconds. User 2, who just wanted to say "Hello" (a fast operation), had to wait 5 seconds for User 1's poem to finish.
If you deploy this to production with 100 users, User 100 might wait several minutes just to get a simple response. This is unacceptable for modern applications.
There has to be a way to tell the server: "Hey, while you are waiting for the LLM to think about User 1's poem, go ahead and process User 2's request."
That way is Asynchronous Programming, and FastAPI makes it incredibly easy.
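To preview where we are headed, here is a minimal sketch of the same two-user scenario rewritten with asyncio. This is illustrative only (the function mirrors the blocking example above); the key point is that the two 5-second waits overlap, so the total is roughly 5 seconds instead of 10.

```python
# THE PREVIEW: the same simulation, but non-blocking with asyncio
import asyncio
import time

async def handle_user_request(user_id, prompt):
    print(f"Received request from User {user_id}...")
    # asyncio.sleep pauses only THIS task; other tasks keep running
    await asyncio.sleep(5)
    return f"Response for {user_id}"

async def main():
    start = time.time()
    # Both users are handled concurrently instead of one after the other
    results = await asyncio.gather(
        handle_user_request(1, "Write a poem"),
        handle_user_request(2, "Hello"),
    )
    print(results)
    print(f"Total time taken: {time.time() - start:.2f} seconds")  # ~5s, not ~10s

asyncio.run(main())
```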
Let's Build It
We are going to build a backend API that accepts text prompts, processes them (simulated), and returns responses. We will do this properly, with validation and documentation.
Step 1: Installation and "Hello World"
First, we need to install the framework and the server runner.
```bash
pip install fastapi uvicorn
```
Now, let's create our first basic API. We will use uvicorn to run it. Uvicorn is a lightning-fast ASGI server implementation.
Create a file named backend_v1.py:
from fastapi import FastAPI
import uvicorn
# Initialize the app
app = FastAPI(
title="GenAI Bootcamp API",
description="A production-ready backend for our AI applications",
version="1.0.0"
)
# Define a "route" or "endpoint"
# This is accessible via HTTP GET at the root "/"
@app.get("/")
async def root():
return {"message": "System Online", "status": "OK"}
# This block allows us to run the script directly with python backend_v1.py
if __name__ == "__main__":
# host="0.0.0.0" makes it accessible on the local network
# port=8000 is standard for API development
uvicorn.run(app, host="0.0.0.0", port=8000)
Run this script. You will see logs indicating the server started.
Open your web browser and go to http://localhost:8000. You should see the JSON response: {"message": "System Online", "status": "OK"}.
The Magic Feature:
FastAPI automatically generates documentation for you.
Go to http://localhost:8000/docs. You will see a professional UI (Swagger UI) where you can test your endpoints. This saves hours of writing documentation.
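You can also hit the endpoint from code instead of the browser. Here is a minimal sketch using the requests library (an assumption on my part: it is not part of FastAPI, so install it with pip install requests if you don't have it), with backend_v1.py running:

```python
import requests

# Call the root endpoint of the running backend_v1.py server
resp = requests.get("http://localhost:8000/")
print(resp.status_code)  # 200
print(resp.json())       # {'message': 'System Online', 'status': 'OK'}
```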
Step 2: Structured Data with Pydantic
Passing data around as loose dictionaries or raw strings is dangerous. What if the user forgets to send the prompt? What if they send a list instead of a string?
We use Pydantic to define the "shape" of our data. If the incoming data doesn't match the shape, FastAPI rejects it automatically with a clear error message.
Create backend_v2.py:
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from datetime import datetime
app = FastAPI()
# 1. Define the Input Schema
# The user MUST send a JSON object with a "prompt" (string)
# "model_name" is optional, defaults to "gpt-3.5"
class ChatRequest(BaseModel):
prompt: str
model_name: str = "gpt-3.5"
# 2. Define the Output Schema
# We promise to return this structure
class ChatResponse(BaseModel):
response: str
timestamp: str
tokens_used: int
@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest):
# We can access data using dot notation because of Pydantic
user_prompt = request.prompt
# Simulate AI logic
fake_ai_response = f"I received your prompt: '{user_prompt}' using model {request.model_name}"
# Return data matching the ChatResponse schema
return ChatResponse(
response=fake_ai_response,
timestamp=datetime.now().isoformat(),
tokens_used=len(user_prompt.split())
)
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
Run this. Go to http://localhost:8000/docs.
Click on the POST /chat endpoint, click "Try it out," and execute it. You'll see the structured response.
Try deleting the prompt from the JSON in the docs and executing. You will get a 422 Validation Error. This protects your code from bad data.
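If you prefer to see the validation behavior from code rather than the docs UI, here is a hedged sketch (assumes backend_v2.py is running and the requests library is installed):

```python
import requests

BASE = "http://localhost:8000"

# Valid request: matches the ChatRequest schema, so our endpoint runs
good = requests.post(f"{BASE}/chat", json={"prompt": "Explain transformers"})
print(good.status_code)  # 200
print(good.json())       # structured ChatResponse

# Invalid request: "prompt" is missing, so FastAPI rejects it before our code runs
bad = requests.post(f"{BASE}/chat", json={"model_name": "gpt-4"})
print(bad.status_code)   # 422
print(bad.json())        # detailed validation error describing the missing field
```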
Step 3: Async and Concurrency
Now let's solve the blocking problem. We will simulate a slow LLM using asyncio.sleep. This tells Python: "Pause this specific task, but feel free to do other work while waiting."
Create backend_v3.py:
import asyncio
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
import time
app = FastAPI()
class ChatRequest(BaseModel):
prompt: str
@app.post("/generate")
async def generate_text(request: ChatRequest):
print(f"Start processing: {request.prompt}")
# await asyncio.sleep(5) is non-blocking.
# It releases the server to handle other requests while waiting.
# Unlike time.sleep(5) which freezes the server.
await asyncio.sleep(5)
print(f"Finished processing: {request.prompt}")
return {"result": f"Generated content for: {request.prompt}"}
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
Why this matters:
If you run this server, you could have 100 users hit the /generate endpoint at the exact same second. The server would accept all of them immediately, wait 5 seconds, and then reply to all of them at roughly the same time.
If we used time.sleep(5) inside this async endpoint, it would block the event loop: requests would be handled one after another, and the 100th user would wait roughly 500 seconds.
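If you want to verify this yourself on a smaller scale, here is a rough client-side test (assumes backend_v3.py is running and the requests library is installed). The client uses threads purely to fire the requests at the same moment; the concurrency being demonstrated is on the server:

```python
# Fire 5 requests at /generate simultaneously and time how long they take in total
import time
from concurrent.futures import ThreadPoolExecutor
import requests

def call_generate(i):
    r = requests.post(
        "http://localhost:8000/generate",
        json={"prompt": f"Request {i}"},
    )
    return r.json()

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(call_generate, range(5)))

print(f"5 requests finished in {time.time() - start:.2f} seconds")  # ~5s, not ~25s
```

Swap the server's await asyncio.sleep(5) for time.sleep(5) and run this again; the total jumps to roughly 25 seconds.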
Step 4: Background Tasks
Sometimes you want to do something after you send the response. For example, logging the conversation to a database or sending an email alert. You don't want the user to wait for these administrative tasks.
FastAPI has BackgroundTasks for this.
Create backend_v4.py:
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel
import uvicorn
import time
app = FastAPI()
class ChatRequest(BaseModel):
prompt: str
# A function to simulate a slow database write
def write_to_log_file(prompt: str):
time.sleep(2) # Simulating a 2-second database save
with open("chat_logs.txt", "a") as f:
f.write(f"Logged: {prompt}\n")
print(f"Background task complete: Logged '{prompt}'")
@app.post("/chat_quick")
async def chat_quick(request: ChatRequest, background_tasks: BackgroundTasks):
# 1. Prepare the response immediately
response_text = f"AI response to: {request.prompt}"
# 2. Schedule the slow task to run AFTER the response is sent
background_tasks.add_task(write_to_log_file, request.prompt)
# 3. Return immediately. The user doesn't wait for the file write.
return {"response": response_text}
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
Run this. When you hit the endpoint, you get the response instantly. Two seconds later, you will see the print message in your terminal. This makes your API feel incredibly snappy.
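Two details worth knowing, shown in the hedged sketch below: add_task forwards extra positional and keyword arguments to your task function, and the task function may itself be async if the follow-up work is I/O-bound. The endpoint and function names here (/chat_notify, notify_admin) are illustrative, not part of the lesson's files.

```python
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel
import asyncio
import uvicorn

app = FastAPI()

class ChatRequest(BaseModel):
    prompt: str

# Background tasks can be async functions too
async def notify_admin(prompt: str, channel: str = "ops"):
    await asyncio.sleep(1)  # e.g., an async call to a webhook or database
    print(f"Notified #{channel} about: {prompt}")

@app.post("/chat_notify")
async def chat_notify(request: ChatRequest, background_tasks: BackgroundTasks):
    # Extra arguments after the function are passed straight to it
    background_tasks.add_task(notify_admin, request.prompt, channel="alerts")
    return {"response": f"AI response to: {request.prompt}"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```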
Step 5: Middleware (CORS) & Structured Logging
Finally, we assemble the production-ready application.
CORS (Cross-Origin Resource Sharing): By default, browsers block a frontend running on port 8501 (Streamlit) from talking to a backend on port 8000 (FastAPI) for security. We must explicitly allow it.
Structured Logging: In production, we don't just print(). We need logs that include timestamps and request details.
Here is the final, complete code for main.py:
import logging
import asyncio
import time
from fastapi import FastAPI, Request, BackgroundTasks, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import uvicorn
from datetime import datetime
# 1. Configure Logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)
app = FastAPI(title="Production AI Backend")
# 2. CORS Configuration
# Allow Streamlit (usually port 8501) to talk to this backend
origins = [
"http://localhost:8501",
"http://127.0.0.1:8501",
]
app.add_middleware(
CORSMiddleware,
allow_origins=origins,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# 3. Data Models
class ChatInput(BaseModel):
user_id: str
prompt: str
class ChatOutput(BaseModel):
response: str
processing_time: float
# 4. Middleware for Timing
# This runs for EVERY request to measure performance
@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
start_time = time.time()
response = await call_next(request)
process_time = time.time() - start_time
response.headers["X-Process-Time"] = str(process_time)
logger.info(f"Path: {request.url.path} - Time: {process_time:.4f}s")
return response
# 5. Background Task Function
def log_interaction(user_id: str, prompt: str):
# Simulate DB write
logger.info(f"DATABASE: Saving interaction for user {user_id}")
# 6. Endpoints
@app.get("/health")
async def health_check():
"""Used by monitoring tools to check if server is alive"""
return {"status": "healthy", "timestamp": datetime.now()}
@app.post("/chat", response_model=ChatOutput)
async def chat_endpoint(input_data: ChatInput, background_tasks: BackgroundTasks):
logger.info(f"Received prompt from {input_data.user_id}")
# Simulate AI processing (Async)
start_proc = time.time()
await asyncio.sleep(2)
ai_response = f"Processed: {input_data.prompt}"
duration = time.time() - start_proc
# Schedule logging
background_tasks.add_task(log_interaction, input_data.user_id, input_data.prompt)
return ChatOutput(
response=ai_response,
processing_time=duration
)
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
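Once main.py is running, a quick way to confirm both the endpoint and the timing middleware is a small client script (assumes the requests library is installed):

```python
import requests

# Call the production endpoint and inspect the header added by our timing middleware
resp = requests.post(
    "http://localhost:8000/chat",
    json={"user_id": "demo_user", "prompt": "Hello production"},
)
print(resp.json())                         # {"response": ..., "processing_time": ...}
print(resp.headers.get("X-Process-Time"))  # slightly larger than processing_time
```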
Now You Try
You have a running backend. Now, extend it.
Add a Parameter: Modify ChatInput to accept a temperature (float). If the user provides a temperature higher than 1.0, raise an HTTPException (import it from fastapi) saying "Temperature too high." The general HTTPException pattern is sketched just after this list.
New Endpoint: Create a POST endpoint /summarize. It should accept a long string and return a shorter version (you can just slice the string text[:50] + "..." for now).
Authentication Simulation: Add a check in the /chat endpoint. If user_id is "admin", return a special greeting. If it is anything else, treat it normally.
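HTTPException has not appeared in any code yet, so here is the general pattern in a deliberately generic, hedged sketch (the endpoint, model, and field names are placeholders, not the exercise solution):

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class ExampleInput(BaseModel):
    value: float

@app.post("/validate")
async def validate(data: ExampleInput):
    if data.value > 1.0:
        # FastAPI converts this into a proper HTTP error response (here: 400)
        raise HTTPException(status_code=400, detail="Value too high")
    return {"ok": True}
```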
Challenge Project: The Decoupled Architecture
This is the most important architectural pattern you will learn. You are going to separate the Frontend from the Backend completely.
The Goal: Run two separate terminal windows. One runs FastAPI (the brain). The other runs Streamlit (the face). They communicate via HTTP requests.
Requirements:
Backend (Terminal 1): Run the main.py we built in Step 5.
Frontend (Terminal 2): Create a new file frontend_app.py.
* It uses st.text_input to get a prompt.
* It uses the Python requests library to send a POST request to http://localhost:8000/chat.
* It displays the response and the processing_time returned by the API.
* It should handle errors (e.g., if the backend is offline, show a nice error message, not a crash trace).
Example Output (Streamlit):
User: Hello AI
[Submit Button]
... thinking ...
AI: Processed: Hello AI
(Server took 2.001s)
Hints:
* You will need import requests in your Streamlit app.
* The URL for requests will be http://127.0.0.1:8000/chat.
* Remember to send the data as JSON: requests.post(url, json={"user_id": "me", "prompt": "hello"}) (a sketch tying these hints together follows this list).
* Check response.status_code to ensure the request succeeded (200 OK).
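The trickiest part is usually the error handling, so here is a hedged sketch of just the request helper, leaving the Streamlit widgets to you. The function name call_backend is an illustrative choice, not something from the lesson:

```python
import requests

API_URL = "http://127.0.0.1:8000/chat"

def call_backend(user_id: str, prompt: str):
    """Send the prompt to the FastAPI backend; return (data, error_message)."""
    try:
        resp = requests.post(
            API_URL,
            json={"user_id": user_id, "prompt": prompt},
            timeout=30,
        )
        if resp.status_code == 200:
            return resp.json(), None  # {"response": ..., "processing_time": ...}
        return None, f"Backend returned status {resp.status_code}"
    except requests.exceptions.ConnectionError:
        return None, "Backend is offline. Start main.py in another terminal."

# In your Streamlit app:
# data, error = call_backend("me", "hello")
# st.error(error) if error else st.write(data["response"])
```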
What You Learned
Today you graduated from writing scripts to building Distributed Systems.
* FastAPI is the industry standard for Python AI backends.
* Pydantic ensures your data is clean before it ever hits your logic.
* Async/Await allows your server to handle many users simultaneously without blocking.
* Decoupling (separating frontend and backend) allows you to scale, secure, and manage your AI application properly.
Why This Matters:
When you build a real product, you might swap Streamlit for a React website or an iOS app later. Because you built a proper API today, you won't have to rewrite a single line of your AI logic. The new frontend will just connect to your existing /chat endpoint.
Tomorrow: Now that we have a working API, how do we ship it to a cloud server? We can't just copy files manually. Tomorrow, we learn Docker—the standard for packaging applications.