Day 72 of 80

FastAPI Production Backend

Phase 8: Deployment & UI

What You'll Build Today

Welcome to Day 72. Up until now, we have mostly treated our Python scripts as standalone islands. You run a script, it does some AI magic, and it prints the result or shows it in a local Streamlit interface.

But imagine you want to share your AI logic with a mobile app team, a website team, and an internal Slack bot. You cannot copy-paste your Python logic into all those places. You need a central "brain" that lives on a server, accepts requests from anywhere, and sends back answers.

Today, you will build a production-grade FastAPI Backend. This is the standard in modern AI engineering.

Here is what you will learn and why:

* FastAPI Framework: You will move beyond simple scripts to build a robust web server that acts as the engine for your AI.

* Asynchronous Endpoints (async/await): You will learn how to handle multiple users at once. If User A sends a heavy prompt, User B shouldn't have to wait for it to finish before they can say "Hello."

* Pydantic Models: You will learn to enforce strict rules on data entering your system. No more crashing because someone sent a number instead of text.

* CORS & Middleware: You will learn the security settings required to let your frontend talk to your backend without the browser blocking the connection.

* Background Tasks: You will learn how to trigger "fire and forget" actions (like saving to a database) that don't slow down the user's response time.

The Problem

Let's look at why we can't just stick with what we have been doing.

Imagine you have written a Python script using a standard web framework (like an older version of Flask) or just a basic script to handle LLM requests. LLMs are slow. Generating a paragraph might take 5 to 10 seconds.

If you write standard, synchronous Python code, the computer processes one line at a time. If a user makes a request that takes 10 seconds, the computer stops everything else to wait for that 10 seconds to pass.

Here is a simulation of a "Blocking" server. Read this code, but you don't need to run it—just understand the frustration it represents.

```python
# THE PROBLEM: A blocking server simulation
import time

def handle_user_request(user_id, prompt):
    print(f"Received request from User {user_id}...")
    # Simulate a slow LLM call (blocking operation)
    # The entire program freezes here for 5 seconds
    time.sleep(5)
    return f"Response for {user_id}"

# Scenario:
# User 1 sends a request.
# User 2 sends a request 0.1 seconds later.
print("--- SERVER START ---")

# User 1 arrives
start = time.time()
print(handle_user_request(1, "Write a poem"))

# User 2 has to wait for User 1 to finish completely!
print(handle_user_request(2, "Hello"))

end = time.time()
print(f"Total time taken: {end - start:.2f} seconds")
```

The Pain:

In the code above, the total time will be roughly 10 seconds. User 2, who just wanted to say "Hello" (a fast operation), had to wait 5 seconds for User 1's poem to finish.

If you deploy this to production with 100 users, User 100 might wait several minutes just to get a simple response. This is unacceptable for modern applications.

There has to be a way to tell the server: "Hey, while you are waiting for the LLM to think about User 1's poem, go ahead and process User 2's request."

That way is Asynchronous Programming, and FastAPI makes it incredibly easy.
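
Before we bring FastAPI into the picture, here is a tiny standalone sketch of the same scenario using plain asyncio. It is illustrative only and not part of today's server code: the two simulated 5-second LLM calls overlap instead of stacking.

```python
# Illustrative only: two simulated 5-second "LLM calls" running concurrently with asyncio.
import asyncio
import time

async def fake_llm_call(user_id: int) -> str:
    await asyncio.sleep(5)  # non-blocking wait; other tasks keep running
    return f"Response for {user_id}"

async def main():
    start = time.time()
    # Both "users" are served at the same time instead of one after the other
    results = await asyncio.gather(fake_llm_call(1), fake_llm_call(2))
    print(results)
    print(f"Total time taken: {time.time() - start:.2f} seconds")  # ~5s, not 10s

asyncio.run(main())
```

Both calls finish after roughly 5 seconds total, which is exactly the behavior we want from our server.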

Let's Build It

We are going to build a backend API that accepts text prompts, processes them (simulated), and returns responses. We will do this properly, with validation and documentation.

Step 1: Installation and "Hello World"

First, we need to install the framework and the server runner.

```bash
pip install fastapi uvicorn
```

Now, let's create our first basic API. We will use uvicorn to run it. Uvicorn is a lightning-fast ASGI server that actually serves your FastAPI application.

Create a file named backend_v1.py:

```python
from fastapi import FastAPI
import uvicorn

# Initialize the app
app = FastAPI(
    title="GenAI Bootcamp API",
    description="A production-ready backend for our AI applications",
    version="1.0.0"
)

# Define a "route" or "endpoint"
# This is accessible via HTTP GET at the root "/"
@app.get("/")
async def root():
    return {"message": "System Online", "status": "OK"}

# This block allows us to run the script directly with: python backend_v1.py
if __name__ == "__main__":
    # host="0.0.0.0" makes it accessible on the local network
    # port=8000 is standard for API development
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Run this script. You will see logs indicating the server started.

Open your web browser and go to http://localhost:8000. You should see the JSON response: {"message": "System Online", "status": "OK"}.
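
If you prefer to check the endpoint from code rather than the browser, a minimal client sketch (assuming you have the requests library installed, pip install requests) looks like this:

```python
# Quick check of the root endpoint without a browser (assumes: pip install requests)
import requests

resp = requests.get("http://localhost:8000/")
print(resp.status_code)  # 200
print(resp.json())       # {'message': 'System Online', 'status': 'OK'}
```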

The Magic Feature:

FastAPI automatically generates documentation for you.

Go to http://localhost:8000/docs. You will see a professional UI (Swagger UI) where you can test your endpoints. This saves hours of writing documentation.

Step 2: Structured Data with Pydantic

Passing data around as loose dictionaries or raw strings is dangerous. What if the user forgets to send the prompt? What if they send a list instead of a string?

We use Pydantic to define the "shape" of our data. If the incoming data doesn't match the shape, FastAPI rejects it automatically with a clear error message.

Create backend_v2.py:

```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn
from datetime import datetime

app = FastAPI()

# 1. Define the Input Schema
# The user MUST send a JSON object with a "prompt" (string)
# "model_name" is optional, defaults to "gpt-3.5"
class ChatRequest(BaseModel):
    prompt: str
    model_name: str = "gpt-3.5"

# 2. Define the Output Schema
# We promise to return this structure
class ChatResponse(BaseModel):
    response: str
    timestamp: str
    tokens_used: int

@app.post("/chat", response_model=ChatResponse)
async def chat_endpoint(request: ChatRequest):
    # We can access data using dot notation because of Pydantic
    user_prompt = request.prompt

    # Simulate AI logic
    fake_ai_response = f"I received your prompt: '{user_prompt}' using model {request.model_name}"

    # Return data matching the ChatResponse schema
    return ChatResponse(
        response=fake_ai_response,
        timestamp=datetime.now().isoformat(),
        tokens_used=len(user_prompt.split())
    )

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Run this. Go to http://localhost:8000/docs.

Click on the POST /chat endpoint, click "Try it out," and execute it. You'll see the structured response.

Try deleting the prompt from the JSON in the docs and executing. You will get a 422 Validation Error. This protects your code from bad data.
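
You can trigger the same success and failure cases from a small script. This sketch assumes backend_v2.py is running and that the requests library is installed:

```python
# Client sketch for backend_v2.py (assumes the server is running and requests is installed)
import requests

url = "http://localhost:8000/chat"

# Valid request: matches the ChatRequest schema
good = requests.post(url, json={"prompt": "Explain CORS in one sentence"})
print(good.status_code)      # 200
print(good.json())           # structured ChatResponse body

# Invalid request: "prompt" is missing, so FastAPI rejects it before our code runs
bad = requests.post(url, json={"model_name": "gpt-4"})
print(bad.status_code)       # 422
print(bad.json()["detail"])  # tells you exactly which field is missing
```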

Step 3: Async and Concurrency

Now let's solve the blocking problem. We will simulate a slow LLM using asyncio.sleep. This tells Python: "Pause this specific task, but feel free to do other work while waiting."

Create backend_v3.py:

```python
import asyncio
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

app = FastAPI()

class ChatRequest(BaseModel):
    prompt: str

@app.post("/generate")
async def generate_text(request: ChatRequest):
    print(f"Start processing: {request.prompt}")

    # await asyncio.sleep(5) is non-blocking.
    # It releases the server to handle other requests while waiting,
    # unlike time.sleep(5), which freezes the server.
    await asyncio.sleep(5)

    print(f"Finished processing: {request.prompt}")
    return {"result": f"Generated content for: {request.prompt}"}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Why this matters:

If you run this server, you could have 100 users hit the /generate endpoint at the exact same second. The server would accept all of them immediately, wait 5 seconds, and then reply to all of them roughly at the same time.

If we had used blocking code like time.sleep(5) (as in the simulation from "The Problem"), requests would queue up one after another, and the 100th user could wait on the order of 500 seconds.
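
You can verify this yourself. The sketch below (assuming the requests library is installed and backend_v3.py is running) fires five requests at once from a thread pool and times the whole batch:

```python
# Concurrency check for backend_v3.py (assumes the server is running and requests is installed).
# Five slow requests fired at once should finish in roughly 5 seconds total, not 25.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

def call_generate(i: int) -> dict:
    resp = requests.post(
        "http://localhost:8000/generate",
        json={"prompt": f"Request {i}"},
    )
    return resp.json()

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(call_generate, range(5)))

for r in results:
    print(r)
print(f"Batch finished in {time.time() - start:.2f} seconds")
```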

Step 4: Background Tasks

Sometimes you want to do something after you send the response. For example, logging the conversation to a database or sending an email alert. You don't want the user to wait for these administrative tasks.

FastAPI has BackgroundTasks for this.

Create backend_v4.py:

```python
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel
import uvicorn
import time

app = FastAPI()

class ChatRequest(BaseModel):
    prompt: str

# A function to simulate a slow database write
def write_to_log_file(prompt: str):
    time.sleep(2)  # Simulating a 2-second database save
    with open("chat_logs.txt", "a") as f:
        f.write(f"Logged: {prompt}\n")
    print(f"Background task complete: Logged '{prompt}'")

@app.post("/chat_quick")
async def chat_quick(request: ChatRequest, background_tasks: BackgroundTasks):
    # 1. Prepare the response immediately
    response_text = f"AI response to: {request.prompt}"

    # 2. Schedule the slow task to run AFTER the response is sent
    background_tasks.add_task(write_to_log_file, request.prompt)

    # 3. Return immediately. The user doesn't wait for the file write.
    return {"response": response_text}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Run this. When you hit the endpoint, you get the response instantly. Two seconds later, you will see the print message in your terminal. This makes your API feel incredibly snappy.
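
To see the effect in numbers, time the call from a small client. This sketch assumes the requests library is installed and that you run it from the same directory as the server so it can read chat_logs.txt:

```python
# Timing check for backend_v4.py (assumes the server is running, requests is installed,
# and this script runs in the same directory as the server so chat_logs.txt is visible).
import time

import requests

start = time.time()
resp = requests.post("http://localhost:8000/chat_quick", json={"prompt": "Hello"})
print(resp.json())
print(f"Client waited {time.time() - start:.2f} seconds")  # well under 2 seconds

time.sleep(3)  # give the background task time to finish its 2-second "database save"
with open("chat_logs.txt") as f:
    print(f.read())  # the "Logged: Hello" line appears after the response was sent
```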

Step 5: Middleware (CORS) & Structured Logging

Finally, we assemble the production-ready application.

CORS (Cross-Origin Resource Sharing): By default, browsers block a frontend running on port 8501 (Streamlit) from talking to a backend on port 8000 (FastAPI) for security. We must explicitly allow it.

Structured Logging: In production, we don't just print(). We need logs that include timestamps and request details.

Here is the final, complete code for main.py:

```python
import logging
import asyncio
import time
from fastapi import FastAPI, Request, BackgroundTasks, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import uvicorn
from datetime import datetime

# 1. Configure Logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

app = FastAPI(title="Production AI Backend")

# 2. CORS Configuration
# Allow Streamlit (usually port 8501) to talk to this backend
origins = [
    "http://localhost:8501",
    "http://127.0.0.1:8501",
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# 3. Data Models
class ChatInput(BaseModel):
    user_id: str
    prompt: str

class ChatOutput(BaseModel):
    response: str
    processing_time: float

# 4. Middleware for Timing
# This runs for EVERY request to measure performance
@app.middleware("http")
async def add_process_time_header(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    response.headers["X-Process-Time"] = str(process_time)
    logger.info(f"Path: {request.url.path} - Time: {process_time:.4f}s")
    return response

# 5. Background Task Function
def log_interaction(user_id: str, prompt: str):
    # Simulate DB write
    logger.info(f"DATABASE: Saving interaction for user {user_id}")

# 6. Endpoints
@app.get("/health")
async def health_check():
    """Used by monitoring tools to check if the server is alive"""
    return {"status": "healthy", "timestamp": datetime.now()}

@app.post("/chat", response_model=ChatOutput)
async def chat_endpoint(input_data: ChatInput, background_tasks: BackgroundTasks):
    logger.info(f"Received prompt from {input_data.user_id}")

    # Simulate AI processing (Async)
    start_proc = time.time()
    await asyncio.sleep(2)
    ai_response = f"Processed: {input_data.prompt}"
    duration = time.time() - start_proc

    # Schedule logging to run after the response is sent
    background_tasks.add_task(log_interaction, input_data.user_id, input_data.prompt)

    return ChatOutput(
        response=ai_response,
        processing_time=duration
    )

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
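
To confirm the timing middleware is doing its job, a quick client check (again assuming the requests library is installed) can inspect the X-Process-Time header on any response:

```python
# Middleware check for main.py (assumes the server is running and requests is installed).
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    json={"user_id": "tester", "prompt": "ping"},
)
print(resp.json())                         # ChatOutput: response + processing_time
print(resp.headers.get("X-Process-Time"))  # header added by the timing middleware
```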

Now You Try

You have a running backend. Now, extend it.

* Add a Parameter: Modify ChatInput to accept a temperature (float). If the user provides a temperature higher than 1.0, raise an HTTPException (import it from fastapi) saying "Temperature too high."

* New Endpoint: Create a POST endpoint /summarize. It should accept a long string and return a shorter version (you can just slice the string, text[:50] + "...", for now).

* Authentication Simulation: Add a check in the /chat endpoint. If user_id is "admin", return a special greeting. If it is anything else, treat it normally.

Challenge Project: The Decoupled Architecture

This is the most important architectural pattern you will learn. You are going to separate the Frontend from the Backend completely.

The Goal: Run two separate terminal windows. One runs FastAPI (the brain). The other runs Streamlit (the face). They communicate via HTTP requests.

Requirements:

* Backend (Terminal 1): Run the main.py we built in Step 5.

* Frontend (Terminal 2): Create a new file frontend_app.py. It should:

  * Use st.text_input to get a prompt.

  * Use the Python requests library to send a POST request to http://localhost:8000/chat.

  * Display the response and the processing_time returned by the API.

  * Handle errors (e.g., if the backend is offline, show a nice error message, not a crash trace).

Example Output (Streamlit):

User: Hello AI

[Submit Button]

... thinking ...

AI: Processed: Hello AI

(Server took 2.001s)

Hints:

* You will need import requests in your Streamlit app.

* The URL for requests will be http://127.0.0.1:8000/chat.

* Remember to send the data as JSON: requests.post(url, json={"user_id": "me", "prompt": "hello"}).

* Check response.status_code to ensure the request succeeded (200 OK). A sketch of this request-and-error-handling pattern follows these hints.
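
If the error-handling requirement feels abstract, here is one possible shape for the backend call inside your frontend_app.py (a sketch only, not the full Streamlit app; the st.* calls are left as comments for you to fill in):

```python
# One possible shape for the backend call in frontend_app.py (sketch only).
import requests

API_URL = "http://127.0.0.1:8000/chat"

try:
    resp = requests.post(
        API_URL,
        json={"user_id": "me", "prompt": "hello"},
        timeout=30,
    )
    if resp.status_code == 200:
        data = resp.json()
        # In the real app: st.write(data["response"]) and show data["processing_time"]
        print(data["response"], data["processing_time"])
    else:
        # Backend reachable but returned an error (e.g., a 422 validation failure)
        print(f"Backend error: {resp.status_code}")
except requests.exceptions.ConnectionError:
    # Backend offline: show a friendly message instead of a crash trace
    print("Could not reach the backend. Is main.py running in the other terminal?")
```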

What You Learned

Today you graduated from writing scripts to building Distributed Systems.

* FastAPI is the industry standard for Python AI backends.

* Pydantic ensures your data is clean before it ever hits your logic.

* Async/Await allows your server to handle many users simultaneously without blocking.

* Decoupling (separating frontend and backend) allows you to scale, secure, and manage your AI application properly.

Why This Matters:

When you build a real product, you might swap Streamlit for a React website or an iOS app later. Because you built a proper API today, you won't have to rewrite a single line of your AI logic. The new frontend will just connect to your existing /chat endpoint.

Tomorrow: Now that we have a working API, how do we ship it to a cloud server? We can't just copy files manually. Tomorrow, we learn Docker, the standard for packaging applications.