How Retell AI Scaled Monitoring from 1 to 1M+ Hourly LLM Calls


Company	Retell AI
Industry	Voice AI
Use case	Voice AI phone agents
Website	retellai.com

LLM Logs / Hour	Time Saved
1M+	90%

About Retell AI

Retell AI is a voice AI platform that enables businesses to build, test, and deploy human-like AI phone agents. Their agents handle everything from customer support to appointment scheduling, processing millions of phone calls daily. Each call generates dozens of LLM requests in real time.

The challenge

Running AI phone agents at scale creates unique observability challenges. Every phone call involves multiple LLM generations, from understanding caller intent, to generating responses, to detecting scams, to managing conversation flow. Retell AI needed to monitor millions of LLM calls across thousands of concurrent phone conversations, track each call end-to-end by linking every generation back to the call and agent that triggered it, export large volumes of logs by custom properties for training and evaluation, and get real-time warnings for failures like invalid JSON, stream timeouts, and fallback triggers.

Why Respan

"We evaluated several observability platforms, but Respan was the only one that could reliably support large-scale exports of our production logs. The combination of rich metadata indexing and batch exports means we can pull huge, filtered datasets (by language, agent, or use case) for evals and training without building a custom pipeline."

xxx, Retell AI

How Retell AI uses Respan

Async logging at massive scale

Retell AI processes 1M+ LLM logs every hour through Respan's async logging API. Every individual LLM call, whether it's an intent classifier, a response generator, or a scam detector, is logged as a standalone request, giving the team granular visibility into each step of their pipeline.

Async logging was the right fit for Retell AI's architecture: simple per-call logging without the overhead of tracing instrumentation, while still capturing the full picture through thread grouping.
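Here is a minimal sketch of the per-call async logging pattern. It assumes a hypothetical `send_batch` callable that stands in for the HTTP request to Respan's logging API; the real endpoint and payload schema are not shown. Each LLM generation is enqueued as a standalone record and the call returns at once. A background thread ships records in batches, which keeps logging off the latency-critical voice path.

```python
import queue
import threading
from typing import Any, Callable


class AsyncLogger:
    """Non-blocking logger: callers enqueue records, a worker thread ships them in batches.

    `send_batch` is a hypothetical stand-in for a call to a logging API.
    """

    def __init__(self, send_batch: Callable[[list[dict[str, Any]]], None],
                 batch_size: int = 100, flush_interval: float = 0.5) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, record: dict[str, Any]) -> None:
        # Called once per LLM generation; never blocks on the network.
        self._queue.put(record)

    def _run(self) -> None:
        batch: list[dict[str, Any]] = []
        # Keep going until close() was called AND everything queued has been taken.
        while not (self._stop.is_set() and self._queue.empty()):
            try:
                batch.append(self._queue.get(timeout=self._flush_interval))
            except queue.Empty:
                pass
            # Ship when the batch is full or the queue has gone quiet.
            if batch and (len(batch) >= self._batch_size or self._queue.empty()):
                self._send_batch(batch)
                batch = []
        if batch:
            self._send_batch(batch)

    def close(self) -> None:
        # Signal shutdown, then wait for the worker to drain remaining records.
        self._stop.set()
        self._worker.join()
```

In a test or a short script, `logger = AsyncLogger(send_batch=collected.extend)` followed by `logger.log({...})` per generation and a final `logger.close()` is enough. A production client would also need to decide what happens to records still queued if the process dies abruptly, since the worker is a daemon thread.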

Rich metadata and custom properties

Each log is tagged with structured metadata that maps directly into Retell AI's domain, using both native Respan (Keywords AI) fields and custom customer properties.

Property	Purpose
`customer_identifier` (native)	End-user or account identifier
`thread_identifier` (native)	Groups all LLM generations within a single phone call
`call_id` (custom property)	Unique identifier for each phone call
`agent_id` (custom property)	Which Retell AI agent handled the call
`language` (custom property)	Language of the conversation
`use_case` (custom property)	Scam detection, call analytics, node transition, turn-taking

Respan's flexible customer property index means all of these fields are fully searchable and filterable, with no waiting for indexing and no field limits.
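The table maps naturally onto one record per generation. The sketch below assumes that custom properties travel in a `metadata` dictionary alongside the native `customer_identifier` and `thread_identifier` fields. The `metadata` key, the `CallContext` class, and the `build_log_record` helper are illustrative and do not reflect a confirmed Respan schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CallContext:
    """Per-call context known to the voice pipeline (illustrative fields)."""
    customer_identifier: str
    call_id: str
    agent_id: str
    language: str


def build_log_record(ctx: CallContext, use_case: str, model: str,
                     prompt: str, completion: str) -> dict[str, Any]:
    """Assemble one standalone log record for a single LLM generation."""
    return {
        "model": model,
        "prompt": prompt,
        "completion": completion,
        # Native fields at the top level.
        "customer_identifier": ctx.customer_identifier,
        # One thread per phone call: reuse call_id as the thread identifier.
        "thread_identifier": ctx.call_id,
        # Custom properties nested under an assumed "metadata" key.
        "metadata": {
            "call_id": ctx.call_id,
            "agent_id": ctx.agent_id,
            "language": ctx.language,
            "use_case": use_case,
        },
    }
```

Building the `CallContext` once when the call starts means every generation in that call, whatever its use case, carries identical call-level tags.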

This makes it easy for Retell AI to slice traffic the way they operate: by use case (scam detection, call analytics response, node transition, turn-taking), by agent using `agent_id`, by a specific phone call using `call_id`, and by language to evaluate model quality across locales.
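Slicing by custom property amounts to an AND over key/value matches. The in-memory sketch below only illustrates those semantics. It assumes each record nests its custom properties under a `metadata` key (an illustrative layout) and does not represent Respan's query API.

```python
from typing import Any, Iterable


def filter_logs(records: Iterable[dict[str, Any]], **criteria: str) -> list[dict[str, Any]]:
    """Return records whose custom properties match every given key=value pair."""
    return [
        r for r in records
        # A record qualifies only if all requested properties match (AND semantics).
        if all(r.get("metadata", {}).get(k) == v for k, v in criteria.items())
    ]
```

For example, `filter_logs(logs, agent_id="a1", language="es")` narrows to one agent's Spanish traffic. Calling it with no criteria returns everything.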

Threads for full-call visibility

A single phone call can generate dozens of LLM requests. Without grouping, it's impossible to understand what actually happened during a conversation.

By mapping `thread_identifier` to each `call_id`, Retell AI can view the complete sequence of LLM generations for any phone call, seeing the full conversation flow from greeting to resolution in a single thread view.
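Reconstructing a call from its thread means grouping by `thread_identifier` and ordering by time. Here is a minimal sketch, assuming each record carries a sortable `timestamp` field (an illustrative name, for example epoch seconds):

```python
from collections import defaultdict
from typing import Any


def group_by_thread(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group generations by thread_identifier and order each thread by timestamp."""
    threads: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for r in records:
        threads[r["thread_identifier"]].append(r)
    # Sort each call's generations so the thread reads from greeting to resolution.
    return {tid: sorted(rs, key=lambda r: r["timestamp"]) for tid, rs in threads.items()}
```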

Batch exports for evals and model training

Retell AI regularly exports filtered log batches from Respan to feed their evaluation and model training pipelines. For example, they can export all logs for a specific language (set via custom property) to evaluate model performance for that language or fine-tune models on real production data.
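Here is a sketch of the export side. It assumes the records have already been pulled from Respan and that each one carries its custom properties under a `metadata` key (an illustrative layout; Respan's export API itself is not shown). The function keeps one language and writes fixed-size JSON Lines shards, a common input format for eval and fine-tuning jobs. The `export_jsonl_batches` name and the shard naming scheme are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterable


def export_jsonl_batches(records: Iterable[dict[str, Any]], out_dir: Path,
                         language: str, batch_size: int = 10_000) -> list[Path]:
    """Write records for one language into numbered JSONL shards; return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    buffer: list[str] = []

    def flush() -> None:
        # One shard per full buffer, e.g. es-00000.jsonl, es-00001.jsonl, ...
        path = out_dir / f"{language}-{len(paths):05d}.jsonl"
        path.write_text("\n".join(buffer) + "\n", encoding="utf-8")
        paths.append(path)
        buffer.clear()

    for r in records:
        # Keep only the requested language (the custom property filter).
        if r.get("metadata", {}).get("language") != language:
            continue
        buffer.append(json.dumps(r, ensure_ascii=False))
        if len(buffer) >= batch_size:
            flush()
    if buffer:
        flush()
    return paths
```

Because records are consumed from an iterable and flushed in fixed-size shards, memory stays bounded by `batch_size` even for very large exports.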

Dashboard + reliability monitoring

Retell AI has built a monitoring dashboard tailored to their voice AI operations, tracking metrics across agents, languages, use cases, and call outcomes in real time.
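Dashboard panels like these usually reduce to grouped aggregates. The following example assumes each record nests custom properties under `metadata` and carries a boolean `failed` flag (both illustrative names, not a Respan dashboard API). Under those assumptions, the error rate per combination of properties, such as agent and language, can be computed as:

```python
from collections import Counter
from typing import Any, Iterable


def error_rate_by(records: Iterable[dict[str, Any]], *keys: str) -> dict[tuple[str, ...], float]:
    """Share of failed generations for each combination of the given custom properties."""
    totals: Counter = Counter()
    failures: Counter = Counter()
    for r in records:
        group = tuple(r["metadata"][k] for k in keys)
        totals[group] += 1
        if r.get("failed"):
            failures[group] += 1
    # Counter returns 0 for groups that never failed.
    return {g: failures[g] / n for g, n in totals.items()}
```

Calling `error_rate_by(records, "agent_id", "language")` yields one rate per agent and language pair. Swapping in `"use_case"` gives the per-use-case view.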

Retell AI now processes 1M+ LLM logs per hour via async logging and gets full call visibility by grouping all generations per phone call with threads. They also report 90% time saved when exporting logs and preparing datasets for training and evals, and they use a unified view across agents, languages, use cases, and call outcomes to keep production reliable at scale. Warnings for fallbacks, retries, invalid JSON, and timeouts help them catch issues before they impact a large number of conversations.

At 1M+ logs per hour, even a small error rate means thousands of affected conversations. Retell AI monitors Respan warnings to catch:

- Fallback triggers, when a primary model fails and traffic routes to a backup
- Retry events on transient failures
- Invalid JSON from structured outputs
- Stream timeouts, when responses stall or disconnect
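To make the four failure modes concrete, here is a client-side sketch that flags them for a single generation. Respan raises these warnings on its own side. The `GenerationResult` fields and the `detect_warnings` helper are hypothetical and only show what each check looks for.

```python
import json
from dataclasses import dataclass


@dataclass
class GenerationResult:
    """Outcome of one LLM call as seen by the caller (hypothetical fields)."""
    primary_model: str
    served_model: str
    attempts: int
    expects_json: bool
    completion: str
    stream_finished: bool


def detect_warnings(res: GenerationResult) -> list[str]:
    """Flag fallback, retry, invalid JSON, and stream timeout for one generation."""
    warnings: list[str] = []
    if res.served_model != res.primary_model:
        warnings.append("fallback")        # traffic was routed to a backup model
    if res.attempts > 1:
        warnings.append("retry")           # the call was re-issued after a transient failure
    if res.expects_json:
        try:
            json.loads(res.completion)
        except json.JSONDecodeError:
            warnings.append("invalid_json")  # structured output did not parse
    if not res.stream_finished:
        warnings.append("stream_timeout")  # the stream stalled or disconnected
    return warnings
```

A truncated stream usually also produces unparseable JSON, so the sketch reports both warnings instead of collapsing them into one.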

Results

"Respan's customer support has been exceptional. For a fast-growing startup, having a team that responds immediately and helps us fix issues quickly, often right away, makes a huge difference in keeping production stable while we move fast."

Retell AI engineering team

Future plans

Retell AI plans to go deeper on exports and operational monitoring as they scale. Their next milestones include 10x faster exports for large, filtered log datasets used in evals and training, and more tailored dashboards to track agent performance, languages, and call outcomes in real time.