Web Search & Live Data Tools
What You'll Build Today
Up until now, the AI models we have built are like brilliant encyclopedias locked in a basement. They know everything about the world up to their "training cutoff" date, but they have absolutely no idea what happened five minutes ago. They cannot tell you the current price of Bitcoin, the score of last night's game, or the latest feature released in Python.
Today, we are going to break them out of the basement. We are giving your AI access to the internet.
We will build a Live News Summarizer Agent. You will give it a topic (like "Artificial Intelligence regulation" or "SpaceX latest launch"), and it will:
* Search the web for the latest articles on that topic.
* Read the results and summarize the key points.
* Cite its sources with real, clickable URLs.
Here are the concepts you will master:
* Search APIs (Tavily): Why scraping Google manually is hard and how specialized APIs make it easy for AI.
* Tool Binding: How to teach an LLM that it has a "tool" (like a search engine) and when to use it.
* Real-time Context: Injecting live data into the prompt context window dynamically.
* Sourcing & Citations: Techniques to force the AI to prove its work by linking to where it found the information.
Let's turn the lights on.
The Problem
To understand why we need tools, let's look at what happens when we ask a standard Large Language Model (LLM) about a recent event.
Imagine you want to know the current weather or a stock price. These change every second. An LLM is a static file trained months or years ago.
Here is code that represents the "pain point." If you run this, the result will be disappointing.
```python
import os
from langchain_openai import ChatOpenAI

# Setup your OpenAI key
os.environ["OPENAI_API_KEY"] = "sk-..."  # Your key here

# Initialize the model
llm = ChatOpenAI(model="gpt-4o")

# Ask a question about something happening RIGHT NOW or very recently
query = "What is the current stock price of Apple (AAPL) right this second?"
response = llm.invoke(query)

print("--- The AI's Attempt ---")
print(response.content)
```
The Frustrating Output:
Depending on the model, you will get one of two bad responses: either a polite refusal ("I don't have access to real-time data, but as of my last update...") or, worse, a confidently stated price that is months out of date.
This is the fundamental limitation of "frozen" models. They are reasoning engines, not search engines. To fix this, we don't need a smarter model; we need to give the model a tool.
Let's Build It
We will use a tool called Tavily. While you could try to scrape Google Search results, Google fights bots aggressively. Tavily is a search engine built specifically for AI agents. It returns clean text, not messy HTML code, making it perfect for our needs.
Step 1: Setup and Basic Search
First, you need to get a free API key from Tavily (tavily.com) and install the libraries.
Terminal:
```bash
pip install langchain langchain-openai langchain-community tavily-python
```
Now, let's just see what Tavily does. We won't even use the LLM yet. We will just ask Python to search the web.
```python
import os
from langchain_community.tools.tavily_search import TavilySearchResults

# 1. Set up keys
os.environ["OPENAI_API_KEY"] = "your_openai_key_here"
os.environ["TAVILY_API_KEY"] = "your_tavily_key_here"

# 2. Initialize the search tool
# max_results=3 means we want the top 3 results
search_tool = TavilySearchResults(max_results=3)

# 3. Run a search manually to see what happens
print("Searching for recent news...")
results = search_tool.invoke("Winner of the 2024 Super Bowl")

# 4. Inspect the raw data
for result in results:
    print(f"\nSource: {result['url']}")
    print(f"Content snippet: {result['content'][:200]}...")  # First 200 chars
```
Why this matters:
Run this code. You will see a list of dictionaries. Each dictionary contains a url and content. This is the raw material. We have successfully fetched live data. Now we need an AI to read it for us.
Step 2: Binding the Tool to the LLM
We cannot just paste the search results into the prompt manually every time. We want the LLM to decide when to search.
We use a concept called binding. We tell the LLM: "Here is a list of functions you can call if you get stuck."
```python
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults

# Initialize tool
search_tool = TavilySearchResults(max_results=3)
tools = [search_tool]

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o")

# Bind the tool to the LLM
llm_with_tools = llm.bind_tools(tools)

# Ask a question that REQUIRES the tool
query = "What is the current weather in Tokyo?"
response = llm_with_tools.invoke(query)

print(f"Content: {response.content}")
print(f"Tool Calls: {response.tool_calls}")
```
Analyze the Output:
The response.content might be empty. Why? Because the LLM didn't answer the question yet. Instead, look at response.tool_calls.
It likely says something like: name='tavily_search_results_json', args={'query': 'current weather in Tokyo'}.
The LLM is saying: "I don't know the answer, but I see you gave me a search tool. Please run this search for me."
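To see what that back-and-forth looks like in plain Python, here is a minimal sketch with a stubbed "LLM" and a stubbed "search tool" standing in for the real ones (no API keys or network needed — the functions and their return shapes are simplified assumptions, not LangChain's actual objects):

```python
# Plain-Python sketch of the tool-calling loop (stubbed, runs offline).

def fake_llm(question, tool_result=None):
    # First pass: no search result yet, so the "LLM" requests a search
    if tool_result is None:
        return {"content": "", "tool_calls": [{"name": "search", "args": {"query": question}}]}
    # Second pass: answer using the tool result we fed back in
    return {"content": f"Based on the search: {tool_result}", "tool_calls": []}

def fake_search(query):
    # Stand-in for Tavily: returns a canned snippet
    return "Tokyo: 18 C, partly cloudy (stubbed result)"

# The loop an executor automates: ask -> run tool -> feed result back -> answer
question = "What is the current weather in Tokyo?"
response = fake_llm(question)
while response["tool_calls"]:
    call = response["tool_calls"][0]
    result = fake_search(call["args"]["query"])
    response = fake_llm(question, tool_result=result)

print(response["content"])
```

This is exactly the loop we are about to hand off to LangChain's Agent Executor in the next step.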
Step 3: Creating the Agent Executor
Handling that back-and-forth (LLM asks for search -> We run search -> We give results back -> LLM answers) is tedious to code manually.
LangChain provides an Agent Executor to handle this loop automatically. It acts as the manager that listens to the LLM's requests and executes the tools.
```python
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

# 1. Setup tools
search_tool = TavilySearchResults(max_results=5)  # Get 5 results for a better summary
tools = [search_tool]

# 2. Setup LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)  # Temp 0 for factual accuracy

# 3. Create the prompt
# We instruct the AI to be a news assistant
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful news assistant. You verify facts using the search tool. Always cite your sources with URLs."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # Required: this is where the agent's "thought process" goes
])

# 4. Construct the agent
agent = create_tool_calling_agent(llm, tools, prompt)

# 5. Create the executor (the runtime)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# 6. Run it!
print("--- Agent Running ---")
response = agent_executor.invoke({
    "input": "What are the latest updates on the release of GPT-5? Summarize the rumors."
})

print("\n--- Final Response ---")
print(response["output"])
```
Why this matters:
The verbose=True flag is crucial here. Watch your terminal. You will see the "thought process":
1. Agent receives input.
2. Agent decides to call tavily_search_results_json.
3. Tavily returns data.
4. Agent reads data.
5. Agent generates the final summary with links.
Step 4: Formatting and Citing Sources
The previous step works, but sometimes the AI gets lazy with citations. Let's refine the prompt to force a specific structure. This is "Prompt Engineering" applied to Agents.
```python
# Modified prompt for strict formatting
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a rigorous research assistant.
1. Search for the latest information on the user's topic.
2. Summarize the key points in bullet points.
3. You MUST provide a 'Sources' section at the bottom with clickable URLs.
4. If you cannot find information, admit it. Do not make things up."""),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=False)  # verbose=False for clean output

topic = "Current mission status of Voyager 1"
print(f"Researching: {topic}...")
response = agent_executor.invoke({"input": topic})
print(response["output"])
```
The Result:
You should now see a clean, professional summary followed by a list of URLs. This is a functional application you could actually use to catch up on news.
Now You Try
You have a working web-search agent. Now, extend its capabilities.
The Comparison Shopper:
Change the prompt to make the agent a "Shopping Assistant." Ask it to find the current price of a specific gadget (e.g., "Sony WH-1000XM5 headphones") at 3 different major retailers and output a table comparing prices. Hint: You might need to ask it specifically to search for 'Best Buy price', 'Amazon price', etc.
The Fact Checker:
Feed the agent a statement that might be false (e.g., "The Eiffel Tower was demolished in 2024"). Configure the system prompt to be a "Debunker." It should search for the claim and output: "TRUE", "FALSE", or "UNVERIFIED" along with evidence.
The Weekend Planner:
Ask the agent: "Find 3 music or art events happening in [Your City] this weekend." The challenge here is the concept of "time." The agent needs to know what "this weekend" means. Does Tavily figure it out, or do you need to inject today's date into the prompt? Hint: You can import datetime and pass datetime.date.today() into the prompt string.
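One way to attack the "time" problem is a small helper that prepends today's date to the user's request before it reaches the agent. A sketch, assuming build_dated_query is your own hypothetical helper (it is not part of LangChain or Tavily):

```python
import datetime

def build_dated_query(user_request: str) -> str:
    # Prepend today's date so the agent can resolve relative phrases
    # like "this weekend" or "tomorrow"
    today = datetime.date.today()
    return f"Today is {today.strftime('%A')}, {today.isoformat()}. {user_request}"

query = build_dated_query("Find 3 music or art events happening in Tokyo this weekend.")
print(query)
# Pass this string as the "input" to your agent_executor
```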
Challenge Project: The "Deep Dive" Research Assistant
Your goal is to build a robust CLI (Command Line Interface) tool that takes a user's question and produces a mini-report.
Requirements:
* Ask the user for a topic via input().
* Ask the user if they want a "Quick Summary" (1 search) or a "Deep Dive" (force the agent to perform at least 2 distinct searches on different aspects of the topic).
* Print the output nicely formatted.
* Crucial Requirement: The output must verify the date. If the user asks for "news," check that the articles are actually from the current year/month.
Example Input:
> Topic: "The impact of AI on healthcare"
> Mode: Deep Dive
Example Output:
> Research Report: AI in Healthcare
>
> Key Trends:
> * Diagnostic imaging improvements...
> * Drug discovery acceleration...
>
> Recent Developments (2024-2025):
> * Google's Med-PaLM updates...
>
> Sources:
> 1. [Nature Medicine Article](http://...)
> 2. [TechCrunch Report](http://...)
Hints:
* For the "Deep Dive," you might need a loop. Or, you can prompt the agent: "Search for the general topic first, then perform a second search specifically for 'criticisms' or 'risks' of this topic."
* Use standard Python print("\n" + "=" * 30) to make the output look like a real report.
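To get you started, here is one possible skeleton for the prompt-building half of the tool. Everything here is an assumption about how you might structure it — build_research_prompt and the mode names are hypothetical, and the real version would feed the returned string into your agent_executor where noted:

```python
import datetime

def build_research_prompt(topic: str, mode: str) -> str:
    # Inject today's date so the "verify the date" requirement is checkable
    today = datetime.date.today().isoformat()
    if mode.lower().startswith("d"):  # "Deep Dive"
        return (f"Today is {today}. Research '{topic}'. First search for a general "
                f"overview, then perform a second search specifically for criticisms "
                f"or risks. Only cite articles from the current year. "
                f"Provide a 'Sources' section with URLs.")
    return f"Today is {today}. Give a quick, sourced summary of '{topic}'."

# In the real CLI: topic = input("Topic: "); mode = input("Mode (Quick/Deep): ")
topic, mode = "The impact of AI on healthcare", "Deep Dive"
print("=" * 30)
print(build_research_prompt(topic, mode))
# Real tool: print(agent_executor.invoke({"input": build_research_prompt(topic, mode)})["output"])
```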
What You Learned
Today you bridged the gap between the frozen brain of the LLM and the dynamic world of the internet.
* Tavily API: You learned that searching for AI requires clean data, not just HTML.
* AgentExecutor: You built a system that can reason ("I need info"), act (search), and observe (read results).
* Grounding: You learned that AI is more trustworthy when it cites sources.
Why This Matters: Real-world enterprise AI is almost never just a chatbot. It's a system that retrieves internal documents, searches the web for competitor info, or looks up database records. The pattern you used today—Tool Binding + Agent Execution—is the exact same architecture used in complex enterprise systems.
Tomorrow: Searching is cool, but what if the AI could actually do things? Tomorrow, we will build Code Execution Agents. We will give the AI a Python environment where it can write code, run it, debug it, and generate charts and graphs for you.