Phase 5 of 9

RAG Systems

Days 36-50 (15 days)

The Most Important Pattern in Production AI

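RAG (Retrieval-Augmented Generation) is how you make AI know YOUR data: relevant passages are retrieved from your documents and handed to the model along with the question. It's one of the most requested features: "Can I chat with my documents?"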

This is the longest phase because RAG is the most employable skill. By the end, you'll build a complete customer support bot that answers questions from your knowledge base.
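To make the idea concrete, here is a minimal sketch of the retrieve-then-generate loop. It is a toy under stated assumptions, not the course's implementation: word overlap stands in for real vector search, the knowledge base is made up, and the names `tokenize`, `retrieve`, and `build_prompt` are hypothetical helpers chosen for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents that share the most words with the question.

    Word overlap is only a stand-in for the vector search covered later.
    """
    question_tokens = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that tells the model to answer from the context only."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base, invented for this example.
knowledge_base = [
    "To reset your password, open Settings and choose Reset Password.",
    "Refunds are issued within 5 business days.",
    "Our office is closed on public holidays.",
]

question = "How do I reset my password?"
prompt = build_prompt(question, retrieve(question, knowledge_base, k=1))
print(prompt)  # this string is what would be sent to the language model
```

The design point is that the model never needs to have seen your documents during training: the relevant text travels inside the prompt, which is also why answers can cite their sources.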

What Problems Will You Solve?

Hallucination

AI confidently invents product features that don't exist

Chunking

A 100-page PDF is too big to send in one prompt: it exceeds the model's token limit

Vector Search

Searching 10,000 documents with Python loops takes 30 seconds

Hybrid Search

Semantic search misses exact matches like product IDs

Re-Ranking

7 of 10 "relevant" chunks are actually irrelevant

Evaluation

"I think it works" isn't good enough: you need metrics
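As a taste of the chunking problem above, the following is a minimal sketch of fixed-size chunking with overlap. It assumes characters as a rough stand-in for tokens (a real pipeline would usually count tokens with the model's tokenizer), and `chunk_text` with its `chunk_size` and `overlap` parameters is a hypothetical helper, not a library function.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reached the end of the text
    return chunks


# Tiny example so the overlap is easy to see.
print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

With `chunk_size=4` and `overlap=1`, each chunk starts on the last character of the previous one. Larger overlap preserves more context across boundaries but stores and embeds more duplicate text, which is the tradeoff Day 38 explores.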

Daily Schedule

Day 36

RAG Architecture Overview

What RAG is, hallucination problem, pipeline design

Day 37

Document Loading & Parsing

PDFs, HTML, text extraction, metadata

Day 38

Chunking Strategies

Fixed-size, semantic, overlap, tradeoffs

Day 39

Vector Databases: ChromaDB

Local vector storage, collections, queries

Day 40

Vector Databases: Pinecone

Cloud vector DB, scaling, multi-tenancy

Day 41

Building Chat with PDF

Complete RAG pipeline, source citations

Day 42

Retrieval Strategies

Similarity search, maximal marginal relevance (MMR), diversity

Day 43

Hybrid Search

BM25 + semantic, reciprocal rank fusion

Day 44

Re-Ranking for Quality

Cross-encoders, Cohere Rerank, two-stage retrieval

Day 45

Query Enhancement & HyDE

Query rewriting, typo handling, multi-query, hypothetical document embeddings (HyDE)

Day 46

RAG Evaluation

Ragas, precision, recall, faithfulness

Day 47

Conversational RAG

History-aware retrieval, question condensation

Day 48

Advanced RAG Patterns

Compression, multi-modal, GraphRAG concepts

Days 49-50

Project: Customer Support Bot

Full production RAG with knowledge base
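To preview one technique from the schedule, here is a minimal sketch of reciprocal rank fusion, the method Day 43 uses to merge BM25 and semantic results. Each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The document IDs and the two input rankings are invented for illustration, `reciprocal_rank_fusion` is a hypothetical helper name, and `k = 60` is a commonly used default rather than a value taken from this course.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    A document earns 1 / (k + rank) from each list that contains it,
    where rank starts at 1 for the top result.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a query that contains a product ID.
keyword_results = ["sku-42", "faq-1", "faq-7"]   # e.g. from BM25
semantic_results = ["faq-1", "faq-9", "sku-42"]  # e.g. from vector search

print(reciprocal_rank_fusion([keyword_results, semantic_results]))
```

Because the fusion uses only rank positions, it never has to compare BM25 scores with cosine similarities, which live on different scales. Documents that both retrievers rank highly rise to the top, while an exact product ID match found only by keyword search still survives into the merged list.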
