AI agents are programs that can reason, plan, and take actions autonomously. Unlike simple chatbots that just respond to prompts, agents can use tools, maintain memory, and execute multi-step workflows. In this guide, we’ll build one from scratch.
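Conceptually, an agent is just a loop: the model picks an action, the runtime executes it, and the observation feeds back in until the model is ready to answer. A minimal sketch of that loop, with hypothetical stand-ins (`fake_llm` and a tiny tool registry) rather than a real model:

```python
def run_agent(llm, tools: dict, query: str, max_iterations: int = 5) -> str:
    """Minimal agent loop: reason -> act -> observe, until a final answer."""
    history = [("user", query)]
    for _ in range(max_iterations):
        action = llm(history)                  # model decides the next step
        if action["type"] == "final":
            return action["text"]              # done: return the answer
        tool = tools[action["tool"]]           # otherwise, run the chosen tool
        observation = tool(action["input"])
        history.append(("tool", observation))  # feed the result back in
    return "Stopped: iteration limit reached"

# Toy stand-ins just to show the control flow
def fake_llm(history):
    if any(role == "tool" for role, _ in history):
        return {"type": "final", "text": f"Result: {history[-1][1]}"}
    return {"type": "tool", "tool": "add", "input": (2, 2)}

print(run_agent(fake_llm, {"add": lambda args: sum(args)}, "What is 2 + 2?"))
# → Result: 4
```

Everything that follows is this loop with production concerns layered on: real models, real tools, memory, and guardrails.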
> “The best way to predict the future is to invent it.” — Alan Kay
Before we start, make sure you have:
- Comfort with Python `async`/`await`
- `pip` and virtual environments

| Tool | Version | Purpose |
|---|---|---|
| Python | 3.11+ | Runtime |
| LangChain | 0.1.x | Agent framework |
| OpenAI SDK | 1.x | LLM provider |
| FastAPI | 0.100+ | API serving |
| Redis | 7.x | Memory store |
First, create a new project and install dependencies:
```bash
mkdir ai-agent && cd ai-agent
python -m venv .venv
source .venv/bin/activate
pip install langchain langchain-openai openai fastapi uvicorn redis
```
Create the project structure:
```text
ai-agent/
├── agent/
│   ├── __init__.py
│   ├── core.py      # Agent logic
│   ├── tools.py     # Custom tools
│   └── memory.py    # Memory management
├── api/
│   └── server.py    # FastAPI endpoints
├── tests/
│   └── test_agent.py
├── .env
└── requirements.txt
```
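You can also pin the versions from the table above in `requirements.txt` (the exact pins below are illustrative; use whatever `pip freeze` gives you):

```text
langchain~=0.1
langchain-openai~=0.1
openai~=1.0
fastapi~=0.100
uvicorn
redis~=5.0
```

Note that `redis` here is the Python client; the Redis 7.x server itself is installed separately.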
Here’s the heart of our agent — the core module that ties together the LLM, tools, and memory:
```python
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ConversationBufferWindowMemory


class AIAgent:
    """An autonomous AI agent with tool use and memory."""

    def __init__(self, model: str = "gpt-4", temperature: float = 0.1):
        self.llm = ChatOpenAI(model=model, temperature=temperature)
        self.memory = ConversationBufferWindowMemory(
            memory_key="chat_history",
            return_messages=True,
            k=10,  # Keep last 10 exchanges
        )
        self.tools = self._load_tools()
        self.agent = self._build_agent()
        self.last_tools_used: list[str] = []

    def _load_tools(self) -> list:
        """Load all available tools for the agent."""
        from agent.tools import web_search, code_executor, file_reader
        return [web_search, code_executor, file_reader]

    def _build_agent(self) -> AgentExecutor:
        """Construct the agent with prompt, tools, and memory."""
        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a helpful AI assistant. Use tools when needed."),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
            MessagesPlaceholder("agent_scratchpad"),
        ])
        agent = create_openai_tools_agent(self.llm, self.tools, prompt)
        return AgentExecutor(
            agent=agent,
            tools=self.tools,
            memory=self.memory,
            verbose=True,
            max_iterations=5,
            return_intermediate_steps=True,  # lets us report which tools ran
        )

    async def run(self, query: str) -> str:
        """Execute the agent with a user query."""
        result = await self.agent.ainvoke({"input": query})
        # Record which tools fired, for the API layer and tests
        self.last_tools_used = [action.tool for action, _ in result["intermediate_steps"]]
        return result["output"]
```
One detail in this implementation deserves special attention:

> **Warning:** Always set `max_iterations` when building agents. Without it, a confused agent can enter an infinite loop, burning through your API credits.
Tools are what give agents their power. Here’s how to create a web search tool:
```python
from langchain.tools import tool
import httpx


@tool
async def web_search(query: str) -> str:
    """Search the web for current information.

    Args:
        query: The search query string

    Returns:
        Search results as formatted text
    """
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.search.example.com/search",
            params={"q": query, "limit": 5},
        )
    results = response.json()
    formatted = [
        f"**{r['title']}**\n{r['snippet']}\n{r['url']}"
        for r in results["items"]
    ]
    return "\n\n".join(formatted)
```

Note the `async def`: since the body awaits an `httpx.AsyncClient`, the tool itself must be a coroutine.
And a code execution tool. A subprocess with a timeout is basic containment, not a true sandbox; run untrusted code in a container or dedicated runtime in production:
```python
import subprocess
import tempfile

INTERPRETERS = {"python": "python", "javascript": "node"}


@tool
def code_executor(code: str, language: str = "python") -> str:
    """Execute code in a subprocess with a timeout.

    Args:
        code: The code to execute
        language: Programming language (python, javascript)

    Returns:
        Execution output or error message
    """
    if language not in INTERPRETERS:
        return f"Unsupported language: {language}"
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=f".{language}", delete=True
    ) as f:
        f.write(code)
        f.flush()
        try:
            result = subprocess.run(
                [INTERPRETERS[language], f.name],
                capture_output=True,
                text=True,
                timeout=30,  # Kill after 30 seconds
            )
        except subprocess.TimeoutExpired:
            return "Error: execution timed out after 30 seconds"
    if result.returncode != 0:
        return f"Error:\n{result.stderr}"
    return result.stdout
```

Two fixes worth calling out: the interpreter is chosen per language (the original always ran `python`, even for JavaScript), and `subprocess.TimeoutExpired` is caught so a hung script returns an error string instead of crashing the agent.
Here’s how the agent processes a request — visualized as a flowchart:
```mermaid
flowchart TD
    A([User Query]) --> B[LLM Reasoning\nGPT-4 + Chat History]
    B --> C{Use a tool?}
    C -- No --> D([Return Response])
    C -- Yes --> E[Tool Execution\nsearch, code, file read]
    E --> F[Tool Result]
    F --> B
```
Here’s the high-level architecture of our agent system:
```mermaid
graph LR
    U([User\nAPI / CLI]) --> AC
    subgraph AC[Agent Core]
        PB[Prompt Builder]
        TR[Tool Router]
        OP[Output Parser]
        IG[Iteration Guard]
    end
    AC --> LLM[LLM\nGPT-4 / Claude]
    AC --> Tools
    subgraph Tools
        WS[Web Search]
        CE[Code Executor]
        FR[File Reader]
    end
    AC -.-> Mem[(Memory\nRedis / Buffer)]
```
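The `memory.py` module isn't shown above, but the Redis side of the memory box could look like the sketch below. The `RedisChatMemory` name and `chat:<session>` key scheme are illustrative; the client is injected so anything exposing `rpush`/`ltrim`/`lrange` works:

```python
import json


class RedisChatMemory:
    """Per-session chat history persisted in a Redis list.

    `client` is any object with rpush/ltrim/lrange,
    e.g. redis.Redis.from_url("redis://localhost:6379", decode_responses=True).
    """

    def __init__(self, client, session_id: str, max_turns: int = 10):
        self.client = client
        self.key = f"chat:{session_id}"
        self.max_turns = max_turns

    def append(self, role: str, content: str) -> None:
        """Append one message and trim to the most recent window."""
        self.client.rpush(self.key, json.dumps({"role": role, "content": content}))
        # Keep only the last N exchanges (2 messages per exchange),
        # mirroring ConversationBufferWindowMemory's k=10 window
        self.client.ltrim(self.key, -2 * self.max_turns, -1)

    def history(self) -> list[dict]:
        """Return the stored messages, oldest first."""
        return [json.loads(m) for m in self.client.lrange(self.key, 0, -1)]
```

Injecting the client also makes this trivially testable with a fake in place of a live Redis server.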
Wrap the agent in a FastAPI server:
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from agent.core import AIAgent

app = FastAPI(title="AI Agent API")
agent = AIAgent()


class QueryRequest(BaseModel):
    query: str
    session_id: str | None = None


class QueryResponse(BaseModel):
    response: str
    tools_used: list[str]


@app.post("/chat", response_model=QueryResponse)
async def chat(request: QueryRequest):
    try:
        result = await agent.run(request.query)
        return QueryResponse(
            response=result,
            tools_used=agent.last_tools_used,
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
Run it:
```bash
uvicorn api.server:app --reload --port 8000
```
Test with curl:
```bash
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the current price of Bitcoin?"}'
```
Here’s how different LLM backends compare for agent tasks:
| Model | Tool Accuracy | Latency (p50) | Cost/1K tokens |
|---|---|---|---|
| GPT-4 | 95% | 2.1s | $0.03 |
| GPT-3.5 Turbo | 78% | 0.8s | $0.002 |
| Claude 3 Opus | 93% | 2.5s | $0.015 |
| Llama 3 70B | 71% | 1.2s | $0.001 |
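The cost column compounds quickly, because an agent re-sends its context on every reasoning iteration. A rough back-of-envelope (a fixed 1,500-token context and 3 iterations are assumptions for illustration; output tokens and context growth are ignored):

```python
def estimate_cost(cost_per_1k: float, context_tokens: int, iterations: int) -> float:
    """Total prompt-token cost across all agent iterations (simplified)."""
    return sum(cost_per_1k * context_tokens / 1000 for _ in range(iterations))


# Using the table's prices: GPT-4 at $0.03/1K vs GPT-3.5 Turbo at $0.002/1K
gpt4 = estimate_cost(0.03, 1500, 3)    # 0.135 -> ~13.5 cents per run
gpt35 = estimate_cost(0.002, 1500, 3)  # 0.009 -> ~0.9 cents per run
print(f"GPT-4: ${gpt4:.3f}  GPT-3.5: ${gpt35:.3f}")
```

A 15x price gap per run is why many teams route simple queries to a cheaper model and reserve GPT-4 for multi-tool tasks.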
GPT-4 remains the gold standard for complex tool use, but Claude 3 is closing the gap fast — especially for code-related tasks.
Always test agents with deterministic inputs — set `temperature=0` (the async tests below also need the `pytest-asyncio` plugin):
```python
import pytest

from agent.core import AIAgent


@pytest.fixture
def agent():
    return AIAgent(model="gpt-3.5-turbo", temperature=0)


@pytest.mark.asyncio
async def test_basic_query(agent):
    result = await agent.run("What is 2 + 2?")
    assert "4" in result


@pytest.mark.asyncio
async def test_tool_use(agent):
    result = await agent.run("Search for the latest Python release")
    assert len(result) > 0
    assert agent.last_tools_used == ["web_search"]


@pytest.mark.asyncio
async def test_memory_persistence(agent):
    await agent.run("My name is Harshit")
    result = await agent.run("What is my name?")
    assert "Harshit" in result
```
Here are mistakes I’ve made so you don’t have to:
```python
import requests

# ❌ Bad: No error handling
@tool
def dangerous_tool(input: str) -> str:
    return requests.get(input).text


# ✅ Good: Defensive tool design
@tool
def safe_tool(input: str) -> str:
    """Fetch content from a URL safely."""
    try:
        response = requests.get(input, timeout=10)
        response.raise_for_status()
        return response.text[:5000]  # Limit output size
    except requests.RequestException as e:
        return f"Error fetching URL: {e}"
```
In the next post, we'll build on this foundation.
Found this useful? Subscribe to my newsletter for more AI engineering tutorials.