About Retell AI
Retell AI is a voice AI platform that lets businesses build, test, and deploy human-like AI phone agents. Their agents handle customer support, appointment scheduling, outbound calls, and more, processing millions of phone calls daily across dozens of languages. Every call generates dozens of LLM requests in real time, with sub-second latency budgets and no margin for dropped traffic.
Before Respan: a phone call you couldn't actually replay
A phone call on Retell is not one inference. Inside a single conversation, the stack runs an intent classifier on every user turn, a streaming response generator, a scam detector running in parallel, a node-transition model deciding where the conversation goes next, and a turn-taking model deciding when to speak.
That architecture created three hard problems the team dealt with every day before Respan.
Reconstructing a single conversation took forensic work. A customer would report that a call went sideways. Engineers had to pull logs from the intent service, the response service, the scam detector, and the orchestration layer, then line them up by timestamp to figure out what actually happened. The full conversation lived across multiple subsystems, and rebuilding it ate hours every time.
Exports for evals required a custom pipeline. Retell's models improve on real production data, segmented by language, agent, and use case. Pulling those filtered datasets out of a generic logging stack meant building and maintaining ETL the team didn't want to own.
Real-time signals got lost in the noise. Fallback triggers, retries, invalid JSON from structured outputs, stream timeouts. At voice scale, each of these can break a live conversation, and a daily log review is too late.
Why Respan over the alternatives
Retell evaluated several observability platforms before switching. Three things tipped the decision.
The only platform that handled production-scale exports. Retell's evals and training depend on pulling huge filtered datasets out of production. Respan was the only platform Retell tested that could do it reliably, with rich metadata indexing and batch exports built in. No custom ETL, no sampling, no waiting on indexing.
Async logging that fit the architecture. Voice agents already orchestrate their own pipelines and need every millisecond of latency budget. Respan's async logging API lets Retell log every LLM call as a standalone request, without the overhead of a heavy tracing SDK, and still reconstruct the full conversation through threads.
A team that ships with them. Respan's support response time is measured in minutes, not days. For a voice AI startup running production at high concurrency, that operational partnership is part of how the product stays online.
"We evaluated several observability platforms, but Respan was the only one that could reliably support large-scale exports of our production logs. The combination of rich metadata indexing and batch exports means we can pull huge, filtered datasets, by language, agent, or use case, for evals and training without building a custom pipeline."
Zexia Zhang, CTO of Retell AI
A day in the life of Retell on Respan
The clearest way to describe what changed is to walk through a routine moment.
A stream timeout alert fires. A handful of calls have stalled mid-response in the last few minutes. The on-call engineer opens Respan, filters by the use_case of the affected step, and pulls up the thread for one of the impacted call_ids. The full conversation is in front of them: greeting, intent, response, scam detection in parallel, the turn where the stream stalled. The metadata says one provider's streaming endpoint is slow on a specific model. The engineer flips that model to a fallback in the gateway, watches the retry rate fall on the dashboard, and clears the alert. No log triangulation. No partial reconstruction. No customer report needed to start the investigation.
That same thread, exported alongside others tagged with the same language, becomes part of next week's eval set.
How Retell uses Respan
Async logging at 1M+ requests per hour
Retell sends every LLM call (intent classification, response generation, scam detection, node transition, turn-taking) through Respan's async logging API as a standalone request. Granular per-call visibility, no added latency in the real-time voice path, no heavy tracing instrumentation to maintain.
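The fire-and-forget pattern described above can be sketched as follows. This is a minimal illustration, not Retell's or Respan's implementation: the `AsyncLogger` class, the `send` callable, and the drop-when-full policy are all assumptions made for the example. The point it shows is that the real-time path only enqueues a record, while a background thread does the network work.

```python
import queue
import threading


class AsyncLogger:
    """Fire-and-forget logger: callers enqueue a record and return immediately."""

    def __init__(self, send, max_pending=10_000):
        # `send` is a hypothetical transport callable (for example, a wrapper
        # around an HTTP POST to a logging endpoint). It is an assumption here.
        self._send = send
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, record: dict) -> bool:
        """Enqueue without blocking; returns False if the buffer is full."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            # Assumed policy for this sketch: count the overflow rather than
            # stall the voice path. A production system may choose differently.
            self.dropped += 1
            return False

    def _drain(self):
        # Background loop: ship records one at a time until the sentinel arrives.
        while True:
            record = self._queue.get()
            try:
                if record is None:
                    return
                self._send(record)
            except Exception:
                # Swallowed here for brevity; real code would retry or count.
                pass
            finally:
                self._queue.task_done()

    def close(self):
        """Flush pending records and stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

A caller would construct one logger per process, call `log(...)` after each LLM request, and call `close()` on shutdown.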
Custom properties for the way the business actually thinks
Each log is tagged with structured metadata that maps directly into Retell's domain.
| Property | Purpose |
|---|---|
| customer_identifier (native) | End-user or account identifier |
| thread_identifier (native) | Groups all LLM generations within a single phone call |
| call_id (custom property) | Unique identifier for each phone call |
| agent_id (custom property) | Which Retell AI agent handled the call |
| language (custom property) | Language of the conversation |
| use_case (custom property) | Scam detection, call analytics, node transition, turn-taking |
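To make the tagging concrete, here is a hedged sketch of how one log record might carry these fields. The field names come from the table; the `CallLogTags` class, the `to_payload` method, and the nesting of custom properties under a `metadata` key are assumptions for illustration and may not match Respan's actual request shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallLogTags:
    """Tags attached to a single LLM call (hypothetical helper)."""

    customer_identifier: str
    call_id: str
    agent_id: str
    language: str
    use_case: str

    def to_payload(self) -> dict:
        # Native fields sit at the top level; custom properties are grouped
        # under "metadata". That nesting is an assumption of this sketch.
        return {
            "customer_identifier": self.customer_identifier,
            # Reusing call_id as the thread key groups every generation
            # from one phone call into one thread.
            "thread_identifier": self.call_id,
            "metadata": {
                "call_id": self.call_id,
                "agent_id": self.agent_id,
                "language": self.language,
                "use_case": self.use_case,
            },
        }
```

Building the tags once per call and reusing them for each LLM request keeps every log from that call consistently labeled.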
Respan's flexible custom property index makes every field searchable and filterable, with no indexing wait and no field limits. That lets Retell slice traffic by use case, by agent, by call_id, or by language to evaluate model quality across locales before rolling out a model change.
Threads for full-call visibility
By mapping thread_identifier to each call_id, Retell pulls up the complete sequence of LLM generations for any phone call in Respan in a single thread view. What used to be a forensic exercise across multiple services is one click.
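Conceptually, a thread view is a group-by on thread_identifier followed by a sort in time. The sketch below shows that idea on plain dictionaries. It assumes each record carries a sortable `timestamp` field, which is not named in the source, and it says nothing about how Respan builds threads internally.

```python
from collections import defaultdict


def group_into_threads(logs):
    """Group log records by thread_identifier, ordered by time within each call."""
    threads = defaultdict(list)
    for log in logs:
        threads[log["thread_identifier"]].append(log)
    for entries in threads.values():
        # "timestamp" is an assumed field; any sortable value works.
        entries.sort(key=lambda entry: entry["timestamp"])
    return dict(threads)
```

Given logs from the intent classifier, the response generator, and the scam detector, `group_into_threads(logs)["call_42"]` would return that call's generations in the order they happened.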
Batch exports for evals and model training
Retell regularly exports filtered log batches from Respan to feed evaluation and training pipelines. Typical workflows include exporting every Spanish-language conversation from the last 30 days for a locale-specific eval, exporting every call where the scam detector fired to fine-tune the next version, and exporting every call where a fallback was triggered to review what the primary model missed. What used to require a custom ETL pipeline is now a filter and a download.
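The first workflow above, every Spanish-language conversation from the last 30 days, reduces to a filter on language and time. The sketch below applies that filter to records already in memory. The `select_for_eval` function, the `metadata` nesting, and the timezone-aware `timestamp` field are assumptions for illustration; in practice the filter would run on the platform side before download.

```python
from datetime import datetime, timedelta, timezone


def select_for_eval(logs, language, days, now=None):
    """Return logs in the given language from the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [
        log
        for log in logs
        # Both field locations are assumed, not taken from a real schema.
        if log["metadata"]["language"] == language and log["timestamp"] >= cutoff
    ]
```

The same shape covers the other workflows by swapping the predicate, for example matching on use_case instead of language.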
Real-time reliability monitoring
Retell uses Respan's monitoring layer to track fallback triggers, retry events, invalid JSON from structured outputs, and stream timeouts in real time. Each is wired into a Retell-tailored dashboard and into alerts that fire the moment something drifts, so issues get caught before they spread across thousands of concurrent calls.
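One common way to turn events such as stream timeouts into an alert is a sliding-window count. The sketch below is a generic version of that technique, not a description of Respan's alerting. The `RateAlert` class and its threshold and window parameters are hypothetical.

```python
from collections import deque


class RateAlert:
    """Signals when at least `threshold` events land within `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self._events = deque()

    def record(self, ts: float) -> bool:
        """Record an event at time `ts` (seconds); return True if the alert fires."""
        self._events.append(ts)
        # Evict events that have aged out of the window ending at `ts`.
        while self._events and self._events[0] <= ts - self.window:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

One instance per signal (fallbacks, retries, invalid JSON, stream timeouts) keeps the thresholds independent, so a noisy signal does not mask a quiet one.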
Custom dashboards for voice operations
Retell built a monitoring dashboard inside Respan tailored to voice operations, tracking quality, latency, and fallback behavior across agents, languages, use cases, and call outcomes in real time. One view that mirrors how the team thinks about the product, instead of stitched-together graphs from multiple tools.
Results
Retell product impact
- 1M+ LLM logs per hour ingested via async logging without sampling.
- Real-time visibility across agents, languages, use cases, and call outcomes from one dashboard.
- Eval and training datasets pulled directly from production traffic.
What Respan unlocks for Retell day to day
- ~90% time saved preparing filtered log exports for evals and training.
- A whole phone call viewable as one thread, instead of triangulated across services.
- Real-time alerts on fallbacks, retries, invalid JSON, and stream timeouts before they reach customers.
- A unified observability surface that grew with traffic instead of being replaced at every scale jump.
"Respan's customer support has been exceptional. For a fast-growing startup, having a team that responds immediately and helps us fix issues quickly, often right away, makes a huge difference in keeping production stable while we move fast."
The Retell AI team
What this means for Retell's customers
Retell's customers run high-stakes voice workflows: scheduling, support, outbound, payments. They care about call reliability, language coverage, and being able to look back at any conversation and explain what happened. Respan gives Retell the per-call thread view, the metadata-indexed search, and the export pipeline behind the answers their customers ask for, from "why did this call drop" to "show me every Spanish-language call where the agent escalated."
What's next
Retell AI plans to go deeper on exports and operational monitoring as they scale. Their next milestones include 10x faster exports for large, filtered log datasets used in evals and training, more tailored dashboards to track agent performance, languages, and call outcomes in real time, and tighter eval loops between production traces and the next generation of voice models.



