The Model Context Protocol shipped in November 2024 and is now the de-facto standard for connecting Claude, ChatGPT, Cursor, and every other agentic client to your data and tools. Every "MCP server tutorial" online stops at the hello-world tool call. add(a, b) returns a + b and the author calls it done.
That's the easy 10%. The other 90% is what separates a demo from production: auth, structured errors, resources versus tools, deploying as a remote server, and the tracing setup you need before a Claude agent loops on it 200 times in an afternoon. This tutorial covers the whole arc in Python, with code you can copy and a working reference server at the end.
TL;DR
- An MCP server is a process that exposes tools (functions an agent can call), resources (read-only context an agent can fetch), and prompts (templated workflows the user picks).
- For local clients, use stdio transport. For remote or multi-tenant servers, use Streamable HTTP, which superseded the older HTTP + SSE transport.
- The official Python SDK is mcp (pip install mcp). FastMCP is the high-level API. The low-level API gives you more control over the JSON-RPC layer.
- Three things every production server needs that tutorials skip: structured error responses, authentication for remote servers, and OpenTelemetry tracing.
- Agents call MCP servers in loops. Without tracing, debugging a misbehaving agent loop is a guessing game.
What MCP actually is
MCP is a JSON-RPC 2.0 protocol over either stdio (local processes) or Streamable HTTP (remote servers). The client (Claude Desktop, Cursor, an Agent SDK) connects, asks the server "what do you expose?", and gets back three lists:
- Tools. Functions the LLM can call with arguments. Side-effectful is fine.
- Resources. Addressable read-only context. URIs like file:///path or postgres://table/123.
- Prompts. Parameterized prompt templates the user (not the LLM) selects.
This three-way split matters because it maps to who controls each one. The model decides which tools to call. Resources are context that the client application (or the model, on clients that support it) chooses to attach. Users invoke prompts explicitly. Think slash commands in Claude Desktop.
Most tutorials only cover tools because tools are the most common. Production servers use all three.
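To make the discovery step concrete, here is a minimal sketch of the JSON-RPC 2.0 requests a client sends to obtain those three lists. The method names (tools/list, resources/list, prompts/list) come from the MCP specification. The helper name and the request IDs are illustrative assumptions, and the initialize exchange that a real client performs first is omitted.

```python
import json
from itertools import count

# Illustrative request-id generator; real clients choose their own ids.
_next_id = count(1)


def list_request(kind: str) -> dict:
    """Build the JSON-RPC 2.0 request that lists one kind of primitive."""
    if kind not in {"tools", "resources", "prompts"}:
        raise ValueError(f"unknown primitive kind: {kind}")
    return {"jsonrpc": "2.0", "id": next(_next_id), "method": f"{kind}/list"}


# One request per primitive, in the order a client might send them.
discovery = [list_request(kind) for kind in ("tools", "resources", "prompts")]

for message in discovery:
    print(json.dumps(message))
```

The SDK builds and answers these messages for you. Seeing them spelled out mostly helps when you read raw traffic in a debugger or proxy.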
Setup
You need Python 3.10+ and the official SDK:
```bash
pip install mcp anthropic
```
Optional but useful:
```bash
pip install httpx pydantic  # for HTTP-based tools
pip install opentelemetry-api opentelemetry-sdk  # for tracing
```
Create a project:
```bash
mkdir mcp-server && cd mcp-server
touch server.py
```
A minimal server with two tools
Here is a server that exposes two tools: a weather lookup and a database query. This is roughly where every tutorial stops. We will layer the production bits on top after.
```python
# server.py
from mcp.server.fastmcp import FastMCP
import httpx

mcp = FastMCP("demo-server")


@mcp.tool()
async def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: City name, e.g. "San Francisco"
    """
    async with httpx.AsyncClient() as client:
        r = await client.get(
            f"https://wttr.in/{city}",
            params={"format": "j1"},
            timeout=5.0,
        )
        r.raise_for_status()
        data = r.json()
    current = data["current_condition"][0]
    return f"{city}: {current['temp_C']}°C, {current['weatherDesc'][0]['value']}"


@mcp.tool()
async def query_users(min_signups: int = 10) -> list[dict]:
    """List users with at least min_signups completed signups.

    Args:
        min_signups: Minimum signups (default 10)
    """
    # In real code, hit your DB. Returning fixtures here.
    return [
        {"id": 1, "name": "alice", "signups": 14},
        {"id": 2, "name": "bob", "signups": 22},
    ]


if __name__ == "__main__":
    mcp.run()  # default: stdio transport
```
That is a complete MCP server. Docstrings become the descriptions the LLM sees. Type hints become the JSON schema. To test locally with Claude Desktop, add this to ~/Library/Application Support/Claude/claude_desktop_config.json:
```json
{
  "mcpServers": {
    "demo": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}
```
Restart Claude, ask "what's the weather in Tokyo?", and you will see Claude call your tool.
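The claim that type hints become the JSON schema is easier to trust once you see the mapping. The sketch below is a simplified, standard-library illustration of that idea and not the SDK's implementation: sketch_input_schema is a hypothetical helper, it handles only a few scalar types, and the schema the SDK really generates will carry more detail.

```python
import inspect
from typing import Any, Callable

# Simplified mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_input_schema(fn: Callable[..., Any]) -> dict:
    """Approximate the input schema that a tool's signature implies."""
    properties: dict = {}
    required: list = []
    for name, param in inspect.signature(fn).parameters.items():
        prop = {"type": _JSON_TYPES.get(param.annotation, "object")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default, so the caller must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


# Stand-ins with the same signatures as the two tools above.
async def get_weather(city: str) -> str:
    return city


async def query_users(min_signups: int = 10) -> list:
    return []


weather_schema = sketch_input_schema(get_weather)
users_schema = sketch_input_schema(query_users)
print(weather_schema)
print(users_schema)
```

The practical takeaway is that a missing or vague annotation degrades what the model sees, so annotate every tool argument.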
Adding a resource
Resources are addressable, read-only context. Use them for anything the LLM might want to read but shouldn't have to "ask for" with a tool call. Common cases: configuration files, document corpora, schemas, current state.
```python
@mcp.resource("schema://users")
async def users_schema() -> str:
    """The schema of the users table."""
    return """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        signups INTEGER DEFAULT 0,
        created_at TIMESTAMP DEFAULT now()
    );
    """
```
The client sees schema://users as an available resource. When the user (or model, on supporting clients) chooses to attach it, the function is called and the return value becomes context. The benefit over a tool: a resource is passive. The model doesn't have to decide to call it. You can attach it to every conversation and the model can use it without thinking.
Adding a prompt
Prompts are user-triggered workflows. They give your users one-click access to common multi-step operations.
```python
@mcp.prompt()
def signup_audit(date: str = "today") -> str:
    """Audit signups for a given date.

    Args:
        date: Date string (default "today")
    """
    return f"""Please audit signups for {date}. Steps:
1. Use query_users with min_signups=0 to get all signups for {date}
2. Identify any with suspicious patterns (rapid succession, similar names)
3. Cross-reference against the users schema for completeness
4. Report findings as a markdown table."""
```
In Claude Desktop this shows up as a slash command. The user picks it. The templated prompt is sent to the model.
Structured errors (the part tutorials skip)
When a tool fails, you have two ways to respond. The naive way:
```python
@mcp.tool()
async def buy_credits(amount: int) -> str:
    if amount > 1000:
        raise ValueError("Amount too large")
    # ...
```
The model gets back little more than a free-text error message, with nothing that says whether a retry could help. It often retries blindly or gives up. The production pattern is to return a structured result the model can reason about:
```python
@mcp.tool()
async def buy_credits(amount: int) -> dict:
    if amount > 1000:
        return {
            "error": "amount_too_large",
            "message": "Amount must be 1000 or less",
            "max_allowed": 1000,
            "retry": False,
        }
    # ... perform the purchase; new_balance comes from this elided logic
    return {"ok": True, "credits": amount, "balance": new_balance}
```
The model reads retry: False and stops trying. It reads max_allowed: 1000 and can suggest a smaller amount to the user. Structured errors are the single biggest reliability win in MCP server design and almost no tutorial covers them. If you change one thing after reading this, change your error returns.
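Once several tools return structured errors, the payload shape tends to drift unless it is built in one place. The following is a minimal sketch of that idea using only the standard library. The helper names (tool_error, tool_ok, buy_credits_result) and the invalid_amount code are hypothetical, and the validation is kept in a plain function so it can be tested without a running server.

```python
from typing import Any

MAX_CREDITS = 1000  # same limit as the buy_credits example


def tool_error(code: str, message: str, *, retry: bool = False, **details: Any) -> dict:
    """Build an error payload with the same keys on every failure path."""
    return {"error": code, "message": message, "retry": retry, **details}


def tool_ok(**data: Any) -> dict:
    """Build a success payload that is easy to tell apart from an error."""
    return {"ok": True, **data}


def buy_credits_result(amount: int, balance: int) -> dict:
    """Validate the purchase and return a structured result either way."""
    if amount <= 0:
        return tool_error("invalid_amount", "Amount must be positive")
    if amount > MAX_CREDITS:
        return tool_error(
            "amount_too_large",
            f"Amount must be {MAX_CREDITS} or less",
            max_allowed=MAX_CREDITS,
        )
    return tool_ok(credits=amount, balance=balance + amount)


print(buy_credits_result(5000, balance=200))
print(buy_credits_result(50, balance=200))
```

A tool decorated with @mcp.tool() can then call buy_credits_result and return its value unchanged, which keeps the error contract identical across tools.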
Going remote: Streamable HTTP transport
Stdio is fine for local clients where the user installs your server as a binary. For multi-user services (a public MCP server, one shared across a team, a hosted product offering) you want HTTP. The SDK supports it out of the box:
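```python
# server.py: same tools, different transport
# In the official SDK, host and port are server settings passed to the
# FastMCP constructor (before the tool decorators); run() only selects the transport.
mcp = FastMCP("demo-server", host="0.0.0.0", port=8000)

if __name__ == "__main__":
    mcp.run(transport="streamable-http")
```
That is it. You now have an HTTP MCP endpoint at http://your-host:8000/mcp. Cursor and Claude both support remote MCP servers via this protocol.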
Authentication
A public remote MCP server with no auth is a security incident waiting to happen. The MCP spec covers OAuth 2.1 in detail and the current revision is what you should follow. For an internal server, a header-based bearer token is acceptable.
A minimal bearer-token check:
```python
from mcp.server.fastmcp import FastMCP, Context
import os

mcp = FastMCP("demo-server")
EXPECTED_TOKEN = os.environ["MCP_TOKEN"]


@mcp.tool()
async def query_users(ctx: Context, min_signups: int = 10) -> dict:
    auth = ctx.request_context.request.headers.get("authorization", "")
    if auth != f"Bearer {EXPECTED_TOKEN}":
        return {"error": "unauthorized", "retry": False}
    # ... rest of implementation
```
For production, push auth up to a reverse proxy or use the SDK's OAuth integration. Putting it in every tool is repetitive and easy to forget on the next tool you add.
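If the check does stay in application code for a while, one way to cut the repetition is to keep the comparison in a single function. The sketch below is an assumption about how you might structure that, not part of the SDK: is_authorized is a hypothetical helper, and it swaps the plain string comparison for hmac.compare_digest so the check runs in constant time.

```python
import hmac

# Reusable structured error, matching the shape used in the tool above.
UNAUTHORIZED = {"error": "unauthorized", "retry": False}


def is_authorized(header_value: str, expected_token: str) -> bool:
    """Return True only when the header is 'Bearer <expected_token>'."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest takes the same time whether the first or the last
    # character differs, so response timing does not leak token prefixes.
    return hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8"))


print(is_authorized("Bearer s3cret", "s3cret"))
print(is_authorized("Bearer wrong", "s3cret"))
```

Each tool can then start with a single call to is_authorized on the Authorization header and return UNAUTHORIZED when it fails.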
Tracing: why agent loops break without it
Here is the failure mode every MCP server hits in production. A Claude agent calls your query_users tool 14 times in a row, takes wildly different actions each time, and the user reports the agent is "going in circles." Now you need to debug it. You have no visibility into:
- What prompt the agent was running
- What arguments it called your tool with on each iteration
- Whether your tool returned the same answer twice
- Where in the loop a retryable error escalated to a hard fail
Without tracing, every debug session starts from scratch. You ask the user to reproduce the issue. You add print statements. You read logs across three systems. With OpenTelemetry plus an MCP-aware observability backend, every tool call shows up as a span attached to the parent agent run, with arguments, response, latency, and errors. You read the trace tree and the failure mode is right there.
A minimal trace setup:
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer("mcp-server")


@mcp.tool()
async def query_users(min_signups: int = 10) -> dict:
    with tracer.start_as_current_span("query_users") as span:
        span.set_attribute("mcp.tool", "query_users")
        span.set_attribute("mcp.args.min_signups", min_signups)
        # _do_query stands for your own data-access helper (not shown).
        result = await _do_query(min_signups)
        span.set_attribute("mcp.result.row_count", len(result))
        return {"users": result}
```
In production, swap ConsoleSpanExporter for an OTLP exporter pointed at your observability backend. Respan supports MCP tracing via its MCP integration, which uses openinference-instrumentation-mcp to auto-capture tool invocations, resource access, and server-client communication as spans. Runnable Python and TypeScript examples live in the respan-example-projects repo. See LLM observability for the wider picture.
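One of the debugging questions listed earlier, whether a tool was called with the same arguments or returned the same answer twice, becomes easy to answer if each span also carries a short digest of the arguments and the result. The sketch below shows one way to compute such a digest with the standard library. The fingerprint and repeated_calls helpers and the call log are hypothetical, and this complements the OpenTelemetry setup rather than replacing it.

```python
import hashlib
import json
from collections import Counter
from typing import Any


def fingerprint(value: Any) -> str:
    """Stable short digest of a JSON-serializable value (key order ignored)."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def repeated_calls(calls: list) -> dict:
    """Count (tool name, args digest) pairs that occur more than once."""
    counts = Counter((tool, fingerprint(args)) for tool, args in calls)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical call log from a single agent run.
calls = [
    ("query_users", {"min_signups": 10}),
    ("query_users", {"min_signups": 0}),
    ("query_users", {"min_signups": 10}),
]
loops = repeated_calls(calls)
print(loops)
```

Inside a tool you could record the digest with span.set_attribute under an attribute name of your choosing, then group spans by it in your backend to spot a loop at a glance.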
Testing your server
The MCP Inspector is the official testing UI. Run it with npx; no separate install step is needed:
```bash
npx @modelcontextprotocol/inspector python server.py
```
It opens a web UI where you can see the tools, resources, and prompts your server exposes, call them with arguments, and inspect raw JSON-RPC traffic. Use it before connecting Claude or Cursor. It catches most bugs faster than live testing.
Common production gotchas
A short list of mistakes we have watched teams make:
- Returning unstructured strings instead of dicts. The model can't reason about an opaque blob. Return structured JSON with explicit error and retry flags.
- No timeouts on outbound calls. A slow API call inside a tool stalls the agent loop. httpx defaults to a 5-second timeout, but requests, for one, has none. Set one explicitly that fits your tool's time budget.
- Tool descriptions the model can't use. "Internal endpoint, see wiki" tells the model nothing. Describe the tool, when to use it, and what it returns.
- One giant tool that does ten things. Better to have five small tools the model can compose than one swiss-army tool with a complex argument schema.
- No rate limiting on remote servers. An agent loop with no rate limit will hit your server 50 times per minute. Limit per session.
- Skipping tracing. Covered above. Debugging without traces is the difference between a 10-minute fix and a multi-hour incident.
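For the rate-limiting item, a per-session sliding window is often enough to start with. The sketch below is a minimal in-memory version under stated assumptions: SessionRateLimiter is a hypothetical class, the session ID is whatever identifier your transport gives you, and a multi-process deployment would need shared storage instead of a dict.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SessionRateLimiter:
    """Sliding window: at most max_calls per window_seconds for each session."""

    def __init__(
        self,
        max_calls: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._calls = defaultdict(deque)  # session_id -> call timestamps

    def allow(self, session_id: str) -> bool:
        """Record the call and return True, or return False if over the limit."""
        now = self._clock()
        window = self._calls[session_id]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_calls:
            return False
        window.append(now)
        return True


limiter = SessionRateLimiter(max_calls=2, window_seconds=60)
print([limiter.allow("session-a") for _ in range(3)])
```

When allow returns False, returning a structured error with retry set to True and a hint about how long to wait gives the agent something to act on.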
What's next
- What is agentic RAG? When an agent uses MCP servers for retrieval, the design changes.
- LLM tracing. The broader picture of what to capture per call.
- Agent frameworks explained. How LangGraph, CrewAI, and OpenAI Agents handle MCP integrations.
- LLM observability. The full operational picture beyond MCP.
FAQ
Do I need TypeScript or can I just use Python?
Python is fine. The official mcp SDK is full-featured. TypeScript is more common in client-side examples but the server-side API parity is good.
Can I use MCP with OpenAI or GPT models?
Yes. The protocol is client-agnostic. ChatGPT, the OpenAI Agents SDK, and the OpenAI Apps SDK all support MCP servers. Most other major agent runtimes do too.
Should I use FastMCP or the low-level Server API?
FastMCP for anything that fits its decorators. Drop to the low-level Server when you need custom transport, complex initialization, or non-standard JSON-RPC handling.
How do I handle long-running tools?
For anything over a few seconds, return a job ID immediately and add a check_job_status tool. Agent loops have implicit time budgets. A 30-second tool call is a bad citizen.
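A minimal sketch of that job-ID pattern, using asyncio and an in-memory registry. Everything here is an assumption for illustration: start_job and slow_report are hypothetical names, the registry does not survive restarts, and in a real server both start_job and check_job_status would be exposed as tools.

```python
import asyncio
import uuid
from typing import Any, Coroutine

# In-memory job registry; a real server would likely persist this.
_jobs: dict = {}


def start_job(work: Coroutine[Any, Any, Any]) -> dict:
    """Schedule the work in the background and return a job ID immediately."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = asyncio.create_task(work)
    return {"job_id": job_id, "status": "running"}


def check_job_status(job_id: str) -> dict:
    """Report running, done (with the result), or failed (with a message)."""
    task = _jobs.get(job_id)
    if task is None:
        return {"error": "unknown_job", "retry": False}
    if not task.done():
        return {"job_id": job_id, "status": "running"}
    if task.cancelled():
        return {"job_id": job_id, "status": "failed", "message": "cancelled"}
    if task.exception() is not None:
        return {"job_id": job_id, "status": "failed", "message": str(task.exception())}
    return {"job_id": job_id, "status": "done", "result": task.result()}


async def slow_report() -> str:
    await asyncio.sleep(0.05)  # stands in for a long-running operation
    return "report ready"


async def demo() -> tuple:
    started = start_job(slow_report())
    first = check_job_status(started["job_id"])   # polled before the work finishes
    await asyncio.sleep(0.2)
    second = check_job_status(started["job_id"])  # polled after completion
    return first, second


first, second = asyncio.run(demo())
print(first["status"], second["status"])
```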
Is stdio or HTTP the right transport?
Stdio if your server is a binary the user installs locally (Claude Desktop's default model). HTTP if it is a hosted service multiple clients connect to.
How do I authenticate an MCP server?
For local stdio, the OS process boundary is the auth. For remote HTTP, OAuth 2.1 or bearer tokens. The MCP spec covers the OAuth dance in detail.
Where do I host a remote MCP server?
Any platform that supports long-lived HTTP connections. Cloudflare Workers, Fly.io, Render, AWS Lambda with response streaming. SSE-based transport needs a platform that doesn't aggressively close idle connections.