
Speech to text

POST https://api.respan.ai/api/audio/transcriptions
curl -X POST https://api.respan.ai/api/audio/transcriptions \
  -H "Authorization: Bearer sk_live_xxxxx" \
  -H "Content-Type: multipart/form-data" \
  -F file=@meeting_recording.wav \
  -F model="whisper-1"
{
  "text": "Good morning everyone, let's start the weekly team meeting.",
  "language": "en",
  "duration": 12.5,
  "words": [
    {
      "word": "Good",
      "start": 0,
      "end": 0.3
    },
    {
      "word": "morning",
      "start": 0.3,
      "end": 0.8
    },
    {
      "word": "everyone,",
      "start": 0.8,
      "end": 1.3
    },
    {
      "word": "let's",
      "start": 1.3,
      "end": 1.6
    },
    {
      "word": "start",
      "start": 1.6,
      "end": 2
    },
    {
      "word": "the",
      "start": 2,
      "end": 2.2
    },
    {
      "word": "weekly",
      "start": 2.2,
      "end": 2.7
    },
    {
      "word": "team",
      "start": 2.7,
      "end": 3
    },
    {
      "word": "meeting.",
      "start": 3,
      "end": 3.5
    }
  ],
  "segments": [
    {}
  ]
}
Transcribe audio to text through the Respan gateway with automatic logging.

Headers

Authorization (string, required)
Bearer token. Use Bearer YOUR_API_KEY.

X-Data-Respan-Params (string, optional)
Base64-encoded JSON object of Respan parameters. Legacy X-Data-Keywordsai-Params is still accepted.
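
A minimal sketch of building the X-Data-Respan-Params value in Python. The source only says the header holds a Base64-encoded JSON object, so the standard (not URL-safe) Base64 alphabet, the helper name encode_respan_params, and the example parameter values are all assumptions made for illustration.

```python
import base64
import json


def encode_respan_params(params: dict) -> str:
    """Serialize params to JSON, then Base64-encode the bytes for use as a header value."""
    raw = json.dumps(params, separators=(",", ":")).encode("utf-8")
    # Assumption: standard Base64 alphabet; the source does not say which variant is expected.
    return base64.b64encode(raw).decode("ascii")


# Hypothetical parameter values, for illustration only.
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "X-Data-Respan-Params": encode_respan_params(
        {"customer_identifier": "user_123", "metadata": {"team": "support"}}
    ),
}
```

Decoding the header value with base64.b64decode followed by json.loads should give back the original object, which is a quick way to check the encoding before sending a request.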

Request

This endpoint expects a multipart form containing a file.
file (file, required)
Audio file. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.

model (enum, required)
Model ID.
Allowed values:

language (string, optional)
Input audio language (ISO-639-1).

prompt (string, optional)
Optional text to guide the model's style.

response_format (enum, optional, defaults to json)
Output format.
Allowed values:

temperature (double, optional)
Sampling temperature (0-1).

timestamp_granularities (enum, optional)
Timestamp granularities. Requires verbose_json response format.
Allowed values:

customer_credentials (object, optional)
Per-customer LLM provider credentials.

disable_log (boolean, optional, defaults to false)
When true, omits input/output from the log. Metrics still recorded.

metadata (object, optional)
Custom key-value metadata.

customer_identifier (string, optional)
End user identifier.

thread_identifier (string, optional)
Conversation thread ID.
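
The request fields above can be assembled into a multipart form with the Python standard library alone. The sketch below checks the constraints stated on this page (supported file extensions, temperature between 0 and 1, timestamp_granularities only with verbose_json) and returns an unsent request. The function name build_transcription_request is hypothetical, and sending object-valued fields such as metadata as JSON strings is an assumption, not something the page confirms.

```python
import json
import urllib.request
import uuid
from pathlib import PurePath

API_URL = "https://api.respan.ai/api/audio/transcriptions"
SUPPORTED = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def _as_text(value):
    """Render a form value as text. Assumption: objects are sent as JSON strings."""
    if isinstance(value, (dict, list, bool)):
        return json.dumps(value)
    return str(value)


def build_transcription_request(api_key, filename, audio, model, **options):
    """Validate the documented constraints and return an unsent urllib Request."""
    ext = PurePath(filename).suffix.lstrip(".").lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported audio format: {ext!r}")
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    if ("timestamp_granularities" in options
            and options.get("response_format") != "verbose_json"):
        raise ValueError("timestamp_granularities requires response_format=verbose_json")

    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields: model plus any optional parameters.
    for name, value in {"model": model, **options}.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{_as_text(value)}\r\n").encode("utf-8")
        )
    # The audio file part, followed by the closing boundary.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
    )
    parts.append(audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder bytes stand in for real audio so the example runs offline.
req = build_transcription_request(
    "YOUR_API_KEY", "meeting_recording.wav", b"placeholder audio bytes",
    "whisper-1", language="en", temperature=0,
)
# Sending needs a real key and audio file, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Validating before sending keeps obviously invalid requests from reaching the gateway. An HTTP client library that builds multipart bodies for you would replace the manual boundary handling shown here.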

Response

Transcription result.
text (string)
Transcribed text.

language (string)
Detected language.

duration (double)
Audio duration in seconds.

words (list of objects)
Word-level timestamps (if requested).

segments (list of objects)
Segment-level timestamps (if requested).
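
As an illustration of working with the response fields, the sketch below picks out the words spoken inside a time window. The data is a trimmed copy of the sample response on this page, and the helper name words_between is hypothetical. Because words is only present when requested, the helper treats a missing list as empty.

```python
# Trimmed copy of the sample response: only the first three words are kept.
result = {
    "text": "Good morning everyone, let's start the weekly team meeting.",
    "language": "en",
    "duration": 12.5,
    "words": [
        {"word": "Good", "start": 0, "end": 0.3},
        {"word": "morning", "start": 0.3, "end": 0.8},
        {"word": "everyone,", "start": 0.8, "end": 1.3},
    ],
}


def words_between(result: dict, start: float, end: float) -> list:
    """Return the words whose timestamps fall entirely inside [start, end] seconds."""
    return [
        w["word"]
        for w in result.get("words") or []  # words may be absent if not requested
        if w["start"] >= start and w["end"] <= end
    ]


first_second = words_between(result, 0, 1.0)
print(first_second)  # ['Good', 'morning']
```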

Errors

401 Unauthorized Error
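
One possible way to surface the 401 case to callers, assuming the request is sent with Python's urllib. The helper names describe_http_error and send are hypothetical. The page documents only the 401 status, so every other status falls through to a generic message and the error body is not inspected.

```python
import urllib.error
import urllib.request


def describe_http_error(err: urllib.error.HTTPError) -> str:
    """Turn an HTTPError into a short, actionable message."""
    if err.code == 401:
        return ("401 Unauthorized: check that the Authorization header "
                "is 'Bearer YOUR_API_KEY' with a valid key.")
    # Only 401 is documented on this page; anything else is reported generically.
    return f"HTTP {err.code}: {err.reason}"


def send(req: urllib.request.Request) -> bytes:
    """Send a prepared request, converting HTTP errors into RuntimeError."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_http_error(err)) from err
```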