Capstone: Building the MVP
What You'll Build Today
Welcome to Day 77! This is it. You have spent weeks learning Python syntax, understanding how Large Language Models (LLMs) work, mastering vector databases, and experimenting with prompt engineering. Yesterday, you planned your Capstone. Today, we break ground.
You are going to build the Minimum Viable Product (MVP) of your final application.
It is tempting to want to build the "perfect" version right away—a sleek interface, support for 50 file types, and complex multi-agent reasoning. But if you try to do that all at once, you will get stuck. Today is about establishing the "Walking Skeleton": a thin implementation of your system that connects the front end to the back end and actually runs.
Here is what we are focusing on:
* Modular Architecture: Why you must separate your visual interface (Streamlit) from your logic (RAG pipeline).
* Session State Management: Why your chatbot "forgets" conversation history and how to fix it.
* Iterative Integration: How to build one piece, test it, and then attach the next piece, rather than writing 500 lines of code and hoping it works.
Let's turn that design document into working software.
---
The Problem
Imagine you are excited to build your "Financial Report Analyzer." You sit down, open a blank Python file, and start coding. You want it to do everything: upload a PDF, split the text, embed it, store it, run a chat interface, and handle errors.
You end up writing a single, massive script like this:
```python
# monolithic_nightmare.py
import streamlit as st
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI
import os

# Trying to do everything in the global scope
st.title("My App")

uploaded_file = st.file_uploader("Upload PDF")

if uploaded_file:
    # PROBLEM 1: This runs every single time you click a button.
    # It re-processes the PDF constantly, wasting money and time.
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getbuffer())
    loader = PyPDFLoader("temp.pdf")
    pages = loader.load_and_split()

    # PROBLEM 2: Hard to debug. If embedding fails, the whole app crashes.
    embeddings = OpenAIEmbeddings()
    vectorstore = FAISS.from_documents(pages, embeddings)
    retriever = vectorstore.as_retriever()

    user_input = st.text_input("Ask a question")

    if user_input:
        # PROBLEM 3: Logic is mixed with UI.
        # You can't test the logic without running the web server.
        llm = ChatOpenAI()
        docs = retriever.invoke(user_input)
        context = "\n".join([d.page_content for d in docs])
        response = llm.invoke(f"Context: {context} Question: {user_input}")
        st.write(response.content)
```
The Pain: Every widget interaction reruns the entire script, so the PDF is re-loaded and re-embedded constantly, wasting time and money. If any single step fails, the whole app crashes. And because the logic is tangled into the UI, you cannot test it without running the web server.

There is a better way. We separate concerns. We build the engine first, then the car body.
---
Let's Build It
We are going to build a clean, modular RAG (Retrieval-Augmented Generation) system. We will split our project into two distinct files:
* backend.py: Handles the logic (PDF loading, Vector Store, LLM).
* app.py: Handles the user interface (Streamlit).

Step 1: Build the Backend Class
First, we create a class to manage our data. This allows us to keep the vector store in memory without reloading it constantly.
Create a file named backend.py.
```python
# backend.py
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


class RAGPipeline:
    def __init__(self):
        # Initialize core components.
        # We don't load data yet, just set up the tools.
        self.embeddings = OpenAIEmbeddings()
        self.llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
        self.vectorstore = None
        self.retriever = None
        self.chain = None

    def process_pdf(self, file_path):
        """
        Loads a PDF, splits it, and creates a vector store.
        Returns True if successful.
        """
        try:
            # 1. Load
            loader = PyPDFLoader(file_path)
            docs = loader.load()

            # 2. Split
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=1000,
                chunk_overlap=200
            )
            splits = text_splitter.split_documents(docs)

            # 3. Store
            self.vectorstore = FAISS.from_documents(splits, self.embeddings)
            self.retriever = self.vectorstore.as_retriever()

            # 4. Build Chain
            self._build_chain()
            return True
        except Exception as e:
            print(f"Error processing PDF: {e}")
            return False

    def _build_chain(self):
        """Internal method to construct the RAG chain."""
        template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
        prompt = ChatPromptTemplate.from_template(template)

        def format_docs(docs):
            return "\n\n".join(doc.page_content for doc in docs)

        self.chain = (
            {"context": self.retriever | format_docs, "question": RunnablePassthrough()}
            | prompt
            | self.llm
            | StrOutputParser()
        )

    def ask(self, query):
        """
        Public method to ask a question.
        """
        if not self.chain:
            return "Please upload a document first."
        return self.chain.invoke(query)


# Simple test block to verify the backend works WITHOUT the UI
if __name__ == "__main__":
    # Create a dummy PDF for testing if needed, or point to a real one
    print("Backend test initialized. Point to a real PDF path to test.")
    # rag = RAGPipeline()
    # rag.process_pdf("sample.pdf")
    # print(rag.ask("Summarize this document."))
```
Why this matters: We have encapsulated all the complex logic. We can test this file independently. If this breaks, we know the issue is in the AI logic, not the web button.
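To prove that point, here is a minimal smoke test you could run with plain `python` and no Streamlit at all. It is a sketch: it assumes your `OPENAI_API_KEY` is set and that a small `sample.pdf` (a hypothetical filename) sits in the project folder.

```python
# test_backend.py -- quick smoke test of the backend, no UI involved.
# Assumes OPENAI_API_KEY is available and a small sample.pdf exists locally.
from backend import RAGPipeline

rag = RAGPipeline()

# Asking before any document is processed should fail gracefully, not crash.
assert rag.ask("Anything?") == "Please upload a document first."

# Happy path: ingest a real PDF, then query it.
assert rag.process_pdf("sample.pdf"), "PDF processing failed"
print(rag.ask("Summarize this document in one sentence."))
```

If this script misbehaves, you know the problem lives in the AI logic, long before any buttons or uploads enter the picture.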
Step 2: Set Up the Streamlit Interface
Now we build the UI. We will use st.session_state to keep our RAGPipeline alive between user interactions so we don't reload the PDF every time.
Create a file named app.py in the same folder.
```python
# app.py
import streamlit as st
import os
import tempfile
from backend import RAGPipeline  # Import our class

st.set_page_config(page_title="Capstone MVP", layout="wide")
st.title("📄 Document AI Assistant")

# --- SESSION STATE SETUP ---
# This ensures the backend object persists across reruns
if "rag_pipeline" not in st.session_state:
    st.session_state.rag_pipeline = RAGPipeline()

if "messages" not in st.session_state:
    st.session_state.messages = []

# --- SIDEBAR: CONFIGURATION ---
with st.sidebar:
    st.header("Setup")
    uploaded_file = st.file_uploader("Upload a PDF", type="pdf")

    if uploaded_file:
        # Save uploaded file to a temporary file.
        # Streamlit uploads are bytes; LangChain needs a file path.
        with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
            tmp_file.write(uploaded_file.getvalue())
            tmp_path = tmp_file.name

        if st.button("Process Document"):
            with st.spinner("Processing..."):
                success = st.session_state.rag_pipeline.process_pdf(tmp_path)

            if success:
                st.success("Document processed! You can now chat.")
                # Clean up temp file
                os.remove(tmp_path)
            else:
                st.error("Failed to process document.")

# --- MAIN CHAT INTERFACE ---
# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Handle user input
if prompt := st.chat_input("Ask a question about your document..."):
    # 1. Display user message immediately
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # 2. Get response from backend
    with st.chat_message("assistant"):
        message_placeholder = st.empty()

        # Call the backend
        with st.spinner("Thinking..."):
            response = st.session_state.rag_pipeline.ask(prompt)

        message_placeholder.markdown(response)

    # 3. Save assistant response to history
    st.session_state.messages.append({"role": "assistant", "content": response})
```
Step 3: Run and Verify
Open your terminal and run:
```bash
streamlit run app.py
```
What you should see:
* A clean interface with a sidebar.
* When you upload a PDF and click "Process," it takes a moment (loading/embedding), then says "Success."
* When you ask a question, the answer appears.
* Crucially: if you ask a second question, it answers instantly without re-processing the PDF. That is the power of Session State (the short demo below shows why).
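If the rerun behavior still feels abstract, this tiny standalone script (a hypothetical `rerun_demo.py`, not part of the capstone app) makes it visible: a plain variable resets on every interaction, while a value stored in `st.session_state` survives.

```python
# rerun_demo.py -- illustrates why st.session_state matters.
import streamlit as st

if "clicks" not in st.session_state:
    st.session_state.clicks = 0  # initialized once, survives every rerun

plain_counter = 0  # re-created (and reset to 0) on every rerun

if st.button("Click me"):
    st.session_state.clicks += 1
    plain_counter += 1

st.write(f"session_state counter: {st.session_state.clicks}")
st.write(f"plain variable counter: {plain_counter}")  # never exceeds 1
```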
Step 4: Iterative Improvement (Adding Source Citations)
Right now, the bot answers, but we don't know where the answer came from. Let's modify `backend.py` to return sources.

Update the `_build_chain` and `ask` methods in `backend.py`:
```python
# In backend.py -- replace the existing methods with these.
def _build_chain(self):
    template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
    prompt = ChatPromptTemplate.from_template(template)

    def format_docs(docs):
        return "\n\n".join(doc.page_content for doc in docs)

    # The answer sub-chain formats the retrieved documents into the prompt...
    answer_chain = (
        RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
        | prompt
        | self.llm
        | StrOutputParser()
    )

    # ...while the outer chain keeps the raw documents so we can cite them.
    self.chain = (
        {"context": self.retriever, "question": RunnablePassthrough()}
        | RunnablePassthrough.assign(answer=answer_chain)
    )

def ask(self, query):
    if not self.chain:
        return {"answer": "Please upload a document first.", "sources": []}
    result = self.chain.invoke(query)
    return {
        "answer": result["answer"],
        "sources": [doc.page_content[:100] + "..." for doc in result["context"]]
    }
```
Now, update `app.py` to handle this dictionary response:
```python
# Inside the chat handling block in app.py (replaces the old "Thinking..." section)
with st.spinner("Thinking..."):
    result_dict = st.session_state.rag_pipeline.ask(prompt)
    response_text = result_dict["answer"]
    sources = result_dict["sources"]

# Display answer
message_placeholder.markdown(response_text)

# Display sources in an expander
with st.expander("View Sources"):
    for s in sources:
        st.info(s)

# Update history with just the text answer
st.session_state.messages.append({"role": "assistant", "content": response_text})
```
By making small changes to the backend and then the frontend, we keep control of the code.
---
Now You Try
You have a working RAG skeleton. Now apply this to your specific capstone idea.
Customize the System Prompt:
Go into `backend.py`. Change the template string. Give your AI a persona. Is it a "Strict Legal Assistant"? A "Friendly Tutor"? Add specific instructions like "If you don't know, say 'I am not sure'."
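For instance, a tutor-flavored template might look like this (a sketch; adjust the wording to your own project):

```python
# In _build_chain() -- one possible persona-flavored template (illustrative only)
template = """You are a friendly, patient tutor.
Answer the question based only on the following context.
If the context does not contain the answer, say "I am not sure."

Context:
{context}

Question: {question}
"""
```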
Add a "Clear Chat" Button:
In the sidebar of
app.py, add a button that clears st.session_state.messages. This helps when you want to start a fresh topic without refreshing the whole page (which would lose the processed PDF).
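A minimal version might look like this (assuming you keep the button inside the existing sidebar block):

```python
# In the sidebar of app.py -- a minimal "Clear Chat" button (sketch)
if st.button("Clear Chat"):
    st.session_state.messages = []  # wipe chat history, keep the processed PDF
    st.rerun()                      # redraw the page with an empty chat
```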
Handle Different File Types:
If your capstone involves text files or CSVs, modify the `process_pdf` method in `backend.py` (rename it to `process_file`) to check the file extension and use the appropriate LangChain loader (e.g., TextLoader or CSVLoader).
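One way that check could look (a sketch; swap in whichever loaders your data actually needs):

```python
# In backend.py -- a sketch of process_file with a simple extension check.
from langchain_community.document_loaders import PyPDFLoader, TextLoader, CSVLoader

def process_file(self, file_path):
    if file_path.lower().endswith(".pdf"):
        loader = PyPDFLoader(file_path)
    elif file_path.lower().endswith(".csv"):
        loader = CSVLoader(file_path)
    else:
        loader = TextLoader(file_path, encoding="utf-8")

    docs = loader.load()
    # From here, splitting / storing / chain-building are identical to process_pdf.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    splits = text_splitter.split_documents(docs)
    self.vectorstore = FAISS.from_documents(splits, self.embeddings)
    self.retriever = self.vectorstore.as_retriever()
    self._build_chain()
    return True
```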
---
Challenge Project: The End-to-End Capstone Alpha
Your goal today is not to finish your capstone, but to finish the Alpha version.
Requirements:
* Data Ingestion: The app must accept the specific data type relevant to your project (PDF, URL, Text, JSON).
* Storage: It must successfully create a vector store from that data.
* Retrieval: It must retrieve relevant context based on a user query.
* Generation: It must produce a coherent answer using an LLM.
* Persistence: The chat history must persist as long as the browser tab is open (using Session State).
* Safety: It must not crash if the user asks a question before uploading a file (see the sketch after this list).
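The backend's `ask` method already returns a polite fallback instead of crashing. If you also want to block input on the frontend, one possible approach is to disable the chat box until a document is ready, using a hypothetical `doc_ready` flag you set right after `process_pdf` succeeds:

```python
# In app.py -- optional frontend guard (sketch).
# Set st.session_state.doc_ready = True in the sidebar after processing succeeds.
if "doc_ready" not in st.session_state:
    st.session_state.doc_ready = False

prompt = st.chat_input(
    "Ask a question about your document...",
    disabled=not st.session_state.doc_ready,  # greyed out until a PDF is processed
)
if prompt:
    ...  # existing chat handling block
```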
Example Scenario:
If you are building a "Recipe Helper":
* Input: User uploads a PDF cookbook.
* Action: User asks, "How do I make lasagna?"
* Output: The bot replies with the recipe from *that specific book*, not general internet knowledge.
Hint: If you get stuck on specific errors (like API connection issues), check your .env file loading. Ensure load_dotenv() is called at the very top of backend.py.
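In practice, that means something like this (assuming the `python-dotenv` package is installed and a `.env` file sits next to your code):

```python
# Very top of backend.py -- load OPENAI_API_KEY (and friends) from .env
# Example .env line:  OPENAI_API_KEY=your-key-here
from dotenv import load_dotenv

load_dotenv()
```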
---
What You Learned
Today you moved from planning to execution. You learned:
* Separation of Concerns: Keeping logic (`backend.py`) separate from presentation (`app.py`) makes debugging infinitely easier.
* State Management: Using `st.session_state` to prevent your application from "forgetting" data every time the user interacts with it.
* The MVP Mindset: Building the simplest thing that works end-to-end before trying to make it perfect.
Why This Matters: In the real world, AI applications are rarely single scripts. They are systems composed of APIs, databases, and frontends. The architecture you built today—separating the "Brain" (RAG Pipeline) from the "Face" (Streamlit)—is the exact same architecture used in enterprise applications, just on a smaller scale.
Tomorrow: We focus on Polish and Presentation. You have a working skeleton; tomorrow we add error handling, better formatting, and prepare the narrative for your final demo.