Building AI Agents with Python: A Practical Guide

AI agents are programs that can reason, plan, and take actions autonomously. Unlike simple chatbots that just respond to prompts, agents can use tools, maintain memory, and execute multi-step workflows. In this guide, we’ll build one from scratch.

Abstract visualization of an AI neural network with glowing nodes and connections

“The best way to predict the future is to invent it.” — Alan Kay

What You’ll Learn

What AI agents are and how they differ from chatbots
Setting up LangChain with OpenAI
Building a tool-using agent
Adding memory and context
Deployment considerations

Prerequisites

Before we start, make sure you have:

Python 3.11+ installed
An OpenAI API key
Basic understanding of async/await
Familiarity with pip and virtual environments

Tool	Version	Purpose
Python	3.11+	Runtime
LangChain	0.1.x	Agent framework
OpenAI SDK	1.x	LLM provider
FastAPI	0.100+	API serving
Redis	7.x	Memory store

Setting Up the Project

First, create a new project and install dependencies:

mkdir ai-agent && cd ai-agent
python -m venv .venv
source .venv/bin/activate
pip install langchain openai fastapi uvicorn redis

Create the project structure:

ai-agent/
├── agent/
│   ├── __init__.py
│   ├── core.py          # Agent logic
│   ├── tools.py         # Custom tools
│   └── memory.py        # Memory management
├── api/
│   └── server.py        # FastAPI endpoints
├── tests/
│   └── test_agent.py
├── .env
└── requirements.txt

Building the Agent Core

Here’s the heart of our agent — the core module that ties together the LLM, tools, and memory:

from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_openai import ChatOpenAI
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ConversationBufferWindowMemory

class AIAgent:
    """An autonomous AI agent with tool use and memory."""

    def __init__(self, model: str = "gpt-4", temperature: float = 0.1):
        self.llm = ChatOpenAI(model=model, temperature=temperature)
        self.memory = ConversationBufferWindowMemory(
            memory_key="chat_history",
            return_messages=True,
            k=10  # Keep last 10 exchanges
        )
        self.tools = self._load_tools()
        self.agent = self._build_agent()

    def _load_tools(self) -> list:
        """Load all available tools for the agent."""
        from agent.tools import web_search, code_executor, file_reader
        return [web_search, code_executor, file_reader]

    def _build_agent(self) -> AgentExecutor:
        """Construct the agent with prompt, tools, and memory."""
        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a helpful AI assistant. Use tools when needed."),
            MessagesPlaceholder("chat_history"),
            ("human", "{input}"),
            MessagesPlaceholder("agent_scratchpad"),
        ])

        agent = create_openai_tools_agent(self.llm, self.tools, prompt)

        return AgentExecutor(
            agent=agent,
            tools=self.tools,
            memory=self.memory,
            verbose=True,
            max_iterations=5,
        )

    async def run(self, query: str) -> str:
        """Execute the agent with a user query."""
        result = await self.agent.ainvoke({"input": query})
        return result["output"]

Key Design Decisions

Notice a few things about this implementation:

Temperature 0.1 — We want consistent, reliable outputs for tool use
Window memory (k=10) — Prevents context overflow while maintaining conversation flow
Max iterations = 5 — Safety limit to prevent infinite tool-calling loops

Warning: Always set max_iterations when building agents. Without it, a confused agent can enter an infinite loop, burning through your API credits.

Creating Custom Tools

Tools are what give agents their power. Here’s how to create a web search tool:

from langchain.tools import tool
import httpx

@tool
def web_search(query: str) -> str:
    """Search the web for current information.

    Args:
        query: The search query string

    Returns:
        Search results as formatted text
    """
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.search.example.com/search",
            params={"q": query, "limit": 5}
        )
        results = response.json()

    formatted = []
    for r in results["items"]:
        formatted.append(f"**{r['title']}**\n{r['snippet']}\n{r['url']}")

    return "\n\n".join(formatted)

And a code execution tool with sandboxing:

@tool
def code_executor(code: str, language: str = "python") -> str:
    """Execute code in a sandboxed environment.

    Args:
        code: The code to execute
        language: Programming language (python, javascript)

    Returns:
        Execution output or error message
    """
    import subprocess
    import tempfile

    with tempfile.NamedTemporaryFile(
        mode="w", suffix=f".{language}", delete=True
    ) as f:
        f.write(code)
        f.flush()

        result = subprocess.run(
            ["python", f.name],
            capture_output=True,
            text=True,
            timeout=30,  # Kill after 30 seconds
        )

    if result.returncode != 0:
        return f"Error:\n{result.stderr}"

    return result.stdout

The Agent Loop Explained

Here’s how the agent processes a request — visualized as a flowchart:

flowchart TD
    A([User Query]) --> B[LLM Reasoning\nGPT-4 + Chat History]
    B --> C{Use a tool?}
    C -- No --> D([Return Response])
    C -- Yes --> E[Tool Execution\nsearch, code, file read]
    E --> F[Tool Result]
    F --> B

Python code on a dark IDE with AI-related imports and function definitions

Agent Architecture Overview

Here’s the high-level architecture of our agent system:

graph LR
    U([User\nAPI / CLI]) --> AC

    subgraph AC[Agent Core]
        PB[Prompt Builder]
        TR[Tool Router]
        OP[Output Parser]
        IG[Iteration Guard]
    end

    AC --> LLM[LLM\nGPT-4 / Claude]
    AC --> Tools

    subgraph Tools
        WS[Web Search]
        CE[Code Executor]
        FR[File Reader]
    end

    AC -.-> Mem[(Memory\nRedis / Buffer)]

Adding the API Layer

Wrap the agent in a FastAPI server:

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from agent.core import AIAgent

app = FastAPI(title="AI Agent API")
agent = AIAgent()

class QueryRequest(BaseModel):
    query: str
    session_id: str | None = None

class QueryResponse(BaseModel):
    response: str
    tools_used: list[str]

@app.post("/chat", response_model=QueryResponse)
async def chat(request: QueryRequest):
    try:
        result = await agent.run(request.query)
        return QueryResponse(
            response=result,
            tools_used=agent.last_tools_used
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

Run it:

uvicorn api.server:app --reload --port 8000

Test with curl:

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the current price of Bitcoin?"}'

Performance Comparison

Here’s how different LLM backends compare for agent tasks:

Model	Tool Accuracy	Latency (p50)	Cost/1K tokens
GPT-4	95%	2.1s	$0.03
GPT-3.5 Turbo	78%	0.8s	$0.002
Claude 3 Opus	93%	2.5s	$0.015
Llama 3 70B	71%	1.2s	$0.001

GPT-4 remains the gold standard for complex tool use, but Claude 3 is closing the gap fast — especially for code-related tasks.

Testing Your Agent

Always test agents with deterministic inputs:

import pytest
from agent.core import AIAgent

@pytest.fixture
def agent():
    return AIAgent(model="gpt-3.5-turbo", temperature=0)

@pytest.mark.asyncio
async def test_basic_query(agent):
    result = await agent.run("What is 2 + 2?")
    assert "4" in result

@pytest.mark.asyncio
async def test_tool_use(agent):
    result = await agent.run("Search for the latest Python release")
    assert len(result) > 0
    assert agent.last_tools_used == ["web_search"]

@pytest.mark.asyncio
async def test_memory_persistence(agent):
    await agent.run("My name is Harshit")
    result = await agent.run("What is my name?")
    assert "Harshit" in result

Common Pitfalls

Here are mistakes I’ve made so you don’t have to:

Not setting token limits — Agents can generate massive prompts
Skipping tool descriptions — The LLM reads these to decide when to use tools
No error handling in tools — A crashing tool breaks the entire agent loop
Trusting agent output blindly — Always validate before executing actions

The Fix Pattern

# ❌ Bad: No error handling
@tool
def dangerous_tool(input: str) -> str:
    return requests.get(input).text

# ✅ Good: Defensive tool design
@tool
def safe_tool(input: str) -> str:
    """Fetch content from a URL safely."""
    try:
        response = requests.get(input, timeout=10)
        response.raise_for_status()
        return response.text[:5000]  # Limit output size
    except requests.RequestException as e:
        return f"Error fetching URL: {e}"

A futuristic server room with blue lighting representing AI infrastructure

What’s Next?

In the next post, we’ll cover:

RAG (Retrieval-Augmented Generation) — Giving agents access to your documents
Multi-agent systems — Agents that collaborate with each other
Production deployment — Docker, monitoring, and scaling

Found this useful? Subscribe to my newsletter for more AI engineering tutorials.