Day 14: Streaming

Real-time responses for better user experience

What You'll Build Today

You're building a ChatGPT-style interface where responses appear word by word in real time instead of all at once after a long wait. This makes your app feel fast, responsive, and professional.

Today's Project: A streaming chat interface that shows AI responses as they're generated.

The Problem

Without streaming, users stare at a loading spinner for 10-30 seconds while the AI generates a response. This feels slow and broken, even though the AI is working.

The Pain:
// Non-streaming (bad UX)
User: "Explain quantum computing"
[... waits 15 seconds seeing nothing ...]
AI: [entire 500-word response appears at once]

// User thinks: "Is this broken? Should I refresh?"

// Streaming (good UX)
User: "Explain quantum computing"
AI: "Quantum computing is a..." [keeps typing]
    "...revolutionary technology that..." [keeps typing]

// User thinks: "It's working! I can start reading!"

Streaming makes your app feel dramatically faster even though the total generation time is the same: the user sees the first words within a second or two instead of staring at a spinner for the full duration.
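
You can measure the difference yourself by timing the first chunk separately from the full response. A minimal sketch, assuming the `openai` client set up in Step 1 below (the function name is just illustrative):

// Compares time-to-first-token with total generation time
async function measurePerceivedSpeed(prompt) {
    const start = Date.now();
    let firstTokenAt = null;

    const stream = await openai.chat.completions.create({
        model: "gpt-4",
        messages: [{ role: "user", content: prompt }],
        stream: true
    });

    for await (const chunk of stream) {
        // Record the moment the first piece of content arrives
        if (firstTokenAt === null && chunk.choices[0]?.delta?.content) {
            firstTokenAt = Date.now();
        }
    }

    console.log(`First token: ${firstTokenAt - start}ms`);
    console.log(`Full response: ${Date.now() - start}ms`);
}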

Let's Build It

Step 1: Basic Streaming with OpenAI

Enable streaming by setting stream: true and handling chunks:

import OpenAI from 'openai';

const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
});

async function streamResponse(prompt) {
    const stream = await openai.chat.completions.create({
        model: "gpt-4",
        messages: [{ role: "user", content: prompt }],
        stream: true  // Enable streaming!
    });

    // Process chunks as they arrive
    for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        process.stdout.write(content);  // Print immediately
    }

    console.log(); // New line at end
}

// Try it
await streamResponse("Write a short story about a robot");
// You'll see: "Once upon a time..." appear word by word!

Step 2: Streaming with Event Handlers

Create a cleaner interface with callbacks:

async function streamWithCallbacks(prompt, onChunk, onComplete) {
    let fullResponse = '';

    const stream = await openai.chat.completions.create({
        model: "gpt-4",
        messages: [{ role: "user", content: prompt }],
        stream: true
    });

    for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';

        if (content) {
            fullResponse += content;
            onChunk(content);  // Call this for each chunk
        }
    }

    onComplete(fullResponse);  // Call this when done
}

// Use it
await streamWithCallbacks(
    "Explain JavaScript promises",
    (chunk) => {
        process.stdout.write(chunk);  // Stream to console
    },
    (full) => {
        console.log('\n\nDone! Total length:', full.length);
    }
);

Step 3: Streaming to a Web Interface

Create an Express endpoint that streams to the browser:

import express from 'express';
import OpenAI from 'openai';

// Same client setup as in Step 1
const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY
});

const app = express();
app.use(express.json());

app.post('/api/chat', async (req, res) => {
    const { message } = req.body;

    // Set headers for Server-Sent Events (SSE)
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');
    res.flushHeaders();  // Send headers now so the browser opens the stream immediately

    const stream = await openai.chat.completions.create({
        model: "gpt-4",
        messages: [{ role: "user", content: message }],
        stream: true
    });

    for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content || '';
        if (content) {
            // Send chunk to browser as an SSE message: "data: <payload>\n\n"
            res.write(`data: ${JSON.stringify({ content })}\n\n`);
        }
    }

    res.write('data: [DONE]\n\n');
    res.end();
});

app.listen(3000, () => {
    console.log('Server running on http://localhost:3000');
});

Frontend code to receive the stream (we read the response body manually with fetch because the built-in EventSource only supports GET requests):

// client.js
async function sendMessage(message) {
    const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';  // Holds any partial SSE line between reads

    while (true) {
        const { value, done } = await reader.read();
        if (done) break;

        // { stream: true } keeps multi-byte characters intact across chunks
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop();  // The last element may be an incomplete line

        for (const line of lines) {
            if (line.startsWith('data: ')) {
                const data = line.slice(6);
                if (data === '[DONE]') return;

                const parsed = JSON.parse(data);
                // Append to UI
                document.getElementById('output').textContent += parsed.content;
            }
        }
    }
}

// Use it
sendMessage("Tell me a joke");

Step 4: Building a Complete Chat UI

Create a full streaming chat interface:

// chat.html
<!DOCTYPE html>
<html>
<head>
    <style>
        #messages {
            height: 400px;
            overflow-y: auto;
            border: 1px solid #ccc;
            padding: 10px;
            margin-bottom: 10px;
        }
        .message {
            margin: 10px 0;
            padding: 8px;
            border-radius: 4px;
        }
        .user { background: #e3f2fd; }
        .assistant { background: #f5f5f5; }
        .streaming { opacity: 0.8; }
    </style>
</head>
<body>
    <div id="messages"></div>
    <input id="input" type="text" placeholder="Type a message...">
    <button onclick="sendMessage()">Send</button>

    <script>
        async function sendMessage() {
            const input = document.getElementById('input');
            const message = input.value;
            if (!message) return;

            // Show user message
            addMessage('user', message);
            input.value = '';

            // Create assistant message for streaming
            const assistantMsg = addMessage('assistant', '', true);

            // Stream response
            const response = await fetch('/api/chat', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ message })
            });

            const reader = response.body.getReader();
            const decoder = new TextDecoder();
            let buffer = '';  // Holds any partial SSE line between reads
            let fullText = '';

            while (true) {
                const { value, done } = await reader.read();
                if (done) break;

                buffer += decoder.decode(value, { stream: true });
                const lines = buffer.split('\n');
                buffer = lines.pop();  // The last element may be incomplete

                for (const line of lines) {
                    if (line.startsWith('data: ')) {
                        const data = line.slice(6);
                        if (data === '[DONE]') {
                            assistantMsg.classList.remove('streaming');
                            return;
                        }

                        const parsed = JSON.parse(data);
                        fullText += parsed.content;
                        assistantMsg.textContent = fullText;

                        // Auto-scroll
                        assistantMsg.scrollIntoView({ behavior: 'smooth' });
                    }
                }
            }

            // If the stream ends without a [DONE] marker, still clear the indicator
            assistantMsg.classList.remove('streaming');
        }

        function addMessage(role, content, streaming = false) {
            const messagesDiv = document.getElementById('messages');
            const msgDiv = document.createElement('div');
            msgDiv.className = `message ${role} ${streaming ? 'streaming' : ''}`;
            msgDiv.textContent = content;
            messagesDiv.appendChild(msgDiv);
            return msgDiv;
        }

        // Send on Enter key
        document.getElementById('input').addEventListener('keypress', (e) => {
            if (e.key === 'Enter') sendMessage();
        });
    </script>
</body>
</html>

Step 5: Error Handling in Streams

Handle errors gracefully during streaming:

async function streamWithErrorHandling(prompt, onChunk, onComplete, onError) {
    try {
        const stream = await openai.chat.completions.create({
            model: "gpt-4",
            messages: [{ role: "user", content: prompt }],
            stream: true
        });

        let fullResponse = '';

        for await (const chunk of stream) {
            const content = chunk.choices[0]?.delta?.content || '';

            if (content) {
                fullResponse += content;
                onChunk(content);
            }

            // Check for finish reason
            if (chunk.choices[0]?.finish_reason === 'length') {
                onError(new Error('Response truncated - max tokens reached'));
                return;
            }
        }

        onComplete(fullResponse);

    } catch (error) {
        onError(error);
    }
}

// Use it
await streamWithErrorHandling(
    "Tell me about AI",
    (chunk) => process.stdout.write(chunk),
    (full) => console.log('\n✓ Complete'),
    (error) => console.error('\n✗ Error:', error.message)
);

Now You Try

Exercise 1: Typing Speed Control

Add artificial delay between chunks to simulate human typing speed. Make it configurable (slow, medium, fast).
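
One way to approach it: wrap the onChunk callback from Step 2. A minimal sketch (the preset values and names here are arbitrary, not a reference solution):

// Hypothetical speed presets in milliseconds per chunk - tune to taste
const SPEEDS = { slow: 120, medium: 60, fast: 20 };

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Wraps an onChunk callback so each chunk renders after a delay.
// Chunks are chained onto a promise queue to preserve their order.
// Note: onComplete may fire before the delayed chunks finish rendering.
function withTypingSpeed(onChunk, speed = 'medium') {
    let queue = Promise.resolve();
    return (chunk) => {
        queue = queue.then(async () => {
            await sleep(SPEEDS[speed]);
            onChunk(chunk);
        });
    };
}

// Plug it into Step 2's interface:
// streamWithCallbacks(prompt, withTypingSpeed(render, 'slow'), onDone);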

Exercise 2: Streaming Stats

Display real-time statistics while streaming: tokens per second, total tokens, estimated time remaining.
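
A possible starting point: count chunks and elapsed time as you stream. Chunks only approximate tokens (exact counts would need a tokenizer library), and "time remaining" requires guessing the final response length:

// Rough throughput tracker - chunk counts approximate token counts
function makeStatsTracker() {
    const start = Date.now();
    let chunks = 0;

    return {
        record() { chunks++; },
        report() {
            const seconds = (Date.now() - start) / 1000;
            const rate = (chunks / seconds).toFixed(1);
            return `${chunks} chunks | ${seconds.toFixed(1)}s | ~${rate} chunks/s`;
        }
    };
}

// In your chunk loop: stats.record(); statusEl.textContent = stats.report();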

Exercise 3: Cancel Streaming

Add a "Stop generating" button that cancels the stream mid-response. Clean up properly.
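
A hint for the browser side: pass an AbortController's signal to fetch and call abort() from the button. A minimal sketch; on the server you'd also want to listen for req.on('close') and stop the upstream OpenAI request so you stop paying for tokens (check your SDK version for its cancellation API):

// Browser-side cancellation sketch
let controller = null;

async function startStreaming(message) {
    controller = new AbortController();
    try {
        const response = await fetch('/api/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ message }),
            signal: controller.signal  // Aborting rejects the fetch
        });
        // ... read the stream exactly as in Step 3 ...
    } catch (error) {
        if (error.name === 'AbortError') {
            console.log('Stopped by user');  // Not a real failure
        } else {
            throw error;
        }
    } finally {
        controller = null;
    }
}

// Wire this to the "Stop generating" button
function stopStreaming() {
    if (controller) controller.abort();
}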

Challenge Project

Build: Multi-Chat Streaming Dashboard

Create an advanced chat interface with these features:

  • Multiple chat threads (like ChatGPT sidebar)
  • Streaming responses with typing indicators
  • Save/load conversation history
  • Copy code blocks with syntax highlighting
  • Regenerate responses
  • Export chat as markdown or PDF
  • Show token usage per message
  • Bonus: Stream to multiple models simultaneously and compare (a rough sketch follows this list)
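
For the bonus item, here's a rough sketch of fanning one prompt out to several models in parallel and collecting each stream. It reuses the `openai` client from Step 1, and the model names are placeholders - substitute whatever you have access to:

// Streams the same prompt to several models concurrently
async function compareModels(prompt, models = ['gpt-4', 'gpt-3.5-turbo']) {
    const results = {};

    await Promise.all(models.map(async (model) => {
        results[model] = '';
        const stream = await openai.chat.completions.create({
            model,
            messages: [{ role: "user", content: prompt }],
            stream: true
        });
        for await (const chunk of stream) {
            results[model] += chunk.choices[0]?.delta?.content || '';
        }
    }));

    return results;  // e.g. { 'gpt-4': '...', 'gpt-3.5-turbo': '...' }
}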

What You Learned

Key Insight: Streaming isn't just a technical feature - it's a UX improvement that makes your AI app feel professional and responsive. Always use it for user-facing applications!