Audio Transcription

Transcribe audio into text. The audio can be supplied either as a multipart file upload or as a URL.

Endpoint

POST https://api.leaper.one/v1/audio/transcriptions

Models

Model	Pricing	Description
`rapid`	0.006 credits/min	Fast transcription, supports file_uri
`whisper-1`	0.006 credits/min	High accuracy with prompt support

Parameters

Parameter	Type	Required	Description
file	file	Either `file` or `file_uri`	Audio file to transcribe (multipart upload)
file_uri	string	Either `file` or `file_uri`	URL of the audio file to transcribe
model	string	No	Model to use: `"rapid"` or `"whisper-1"`. Default: `"rapid"`
response_format	string	No	Output format: `"text"`, `"json"`, `"verbose_json"`, `"srt"`, or `"vtt"`. Default: `"json"`
language	string	No	ISO 639-1 language code (e.g. `"en"`, `"zh"`). Improves accuracy if specified
prompt	string	No	Hint text to improve transcription quality
temperature	number	No	Sampling temperature between 0 and 1 (`whisper-1` only)
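
The constraints in the table can be checked on the client before a request is sent. The sketch below shows one way to do that in Python. `build_fields` and its `has_file` flag are hypothetical names, not part of the API. Rejecting a request that supplies both `file` and `file_uri` is an assumption, since the table only states that one of them is required.

```python
from typing import Dict, Optional

MODELS = {"rapid", "whisper-1"}
RESPONSE_FORMATS = {"text", "json", "verbose_json", "srt", "vtt"}


def build_fields(
    file_uri: Optional[str] = None,
    has_file: bool = False,
    model: str = "rapid",
    response_format: str = "json",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, str]:
    """Return the non-file form fields, raising ValueError on invalid input."""
    # Assumption: exactly one audio source must be given (never both, never neither).
    if has_file == (file_uri is not None):
        raise ValueError("provide exactly one of file or file_uri")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    if temperature is not None:
        # The table lists temperature as whisper-1 only, in the range 0 to 1.
        if model != "whisper-1":
            raise ValueError("temperature is only supported by whisper-1")
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")

    fields = {"model": model, "response_format": response_format}
    if file_uri is not None:
        fields["file_uri"] = file_uri
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        fields["temperature"] = str(temperature)
    return fields
```

The defaults mirror the documented ones (`rapid` and `json`), so a call with only `file_uri` yields the three fields the server needs.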

Request

Upload a file

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@recording.mp3 \
  -F model=rapid \
  -F response_format=json

Pass a URL (recommended for large files)

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file_uri=https://example.com/audio.mp3 \
  -F model=rapid \
  -F language=zh \
  -F response_format=verbose_json

Using `file_uri` means the audio never has to pass through your own network: the server fetches it directly from the URL. It is supported by both the `rapid` and `whisper-1` models.
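
The two request styles above (file upload and URL) can also be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: `encode_multipart` and `build_request` are hypothetical helpers, and the hand-written multipart body assumes the endpoint accepts the same `multipart/form-data` encoding that `curl -F` produces. The request is built but not sent.

```python
import mimetypes
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

ENDPOINT = "https://api.leaper.one/v1/audio/transcriptions"


def encode_multipart(
    fields: Dict[str, str],
    file: Optional[Tuple[str, bytes]] = None,
) -> Tuple[bytes, str]:
    """Encode text fields and an optional (filename, content) pair as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    if file is not None:
        filename, content = file
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(
    api_key: str,
    fields: Dict[str, str],
    file: Optional[Tuple[str, bytes]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the transcription endpoint."""
    body, content_type = encode_multipart(fields, file)
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )


# URL variant: only text fields are sent; the server fetches the audio itself.
url_request = build_request(
    "sk-your-api-key",
    {
        "file_uri": "https://example.com/audio.mp3",
        "model": "rapid",
        "language": "zh",
        "response_format": "verbose_json",
    },
)

# Upload variant: the audio bytes travel in the request body.
# Placeholder bytes stand in for open("recording.mp3", "rb").read().
upload_request = build_request(
    "sk-your-api-key",
    {"model": "rapid", "response_format": "json"},
    file=("recording.mp3", b"<audio bytes>"),
)

# Sending is left commented out so the sketch performs no network call:
# with urllib.request.urlopen(url_request) as response:
#     print(response.read().decode("utf-8"))
```

The URL variant keeps the request body at a few hundred bytes regardless of the audio length, which is the practical benefit described above.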

Using whisper-1 with prompt

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@recording.mp3 \
  -F model=whisper-1 \
  -F response_format=json \
  -F prompt="LEAPERone, GTC, NVIDIA"

Requesting subtitle output

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@recording.mp3 \
  -F model=whisper-1 \
  -F response_format=srt

Response

With the default `json` format, the response contains only the transcribed text:

{
  "text": "Hello, this is a sample transcription of the audio file."
}

When using verbose_json, the response also includes the task, language, duration, and timestamped segments:

{
  "task": "transcribe",
  "language": "en",
  "duration": 42.5,
  "text": "Hello, this is a sample transcription of the audio file.",
  "segments": [
    { "start": 0.0, "end": 2.4, "text": "Hello, this is a sample transcription." }
  ]
}
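
As an illustration of how a client might consume this structure, the Python sketch below parses the sample response and prints one line per segment. `describe_segments` is a hypothetical helper, and treating `start` and `end` as seconds is an assumption based on the sample values.

```python
import json
from typing import List

# The verbose_json sample from this page, embedded as a string.
SAMPLE = """
{
  "task": "transcribe",
  "language": "en",
  "duration": 42.5,
  "text": "Hello, this is a sample transcription of the audio file.",
  "segments": [
    { "start": 0.0, "end": 2.4, "text": "Hello, this is a sample transcription." }
  ]
}
"""


def describe_segments(payload: str) -> List[str]:
    """Return one human-readable line per segment in a verbose_json payload."""
    data = json.loads(payload)
    return [
        # Assumption: start and end are offsets in seconds.
        f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}"
        for seg in data.get("segments", [])
    ]


print("\n".join(describe_segments(SAMPLE)))
```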

When using srt, the response is plain text subtitle output:

1
00:00:00,000 --> 00:00:02,400
Hello, this is a sample transcription.

When using vtt, the response is returned as WebVTT:

WEBVTT

00:00:00.000 --> 00:00:02.400
Hello, this is a sample transcription.
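
The two subtitle formats differ mainly in the header line, the cue numbering, and the millisecond separator (comma in SRT, dot in WebVTT). The API returns both formats directly, so the Python sketch below is only an illustration of that difference: it renders `verbose_json`-style segments locally. The function names are hypothetical, and segment times are assumed to be in seconds.

```python
from typing import Dict, List


def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Render segments as SRT: numbered cues, comma before milliseconds."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)


def to_vtt(segments: List[Dict]) -> str:
    """Render segments as WebVTT: WEBVTT header, dot before milliseconds."""
    cues = []
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        cues.append(f"{start} --> {end}\n{seg['text']}\n")
    return "WEBVTT\n\n" + "\n".join(cues)


segments = [{"start": 0.0, "end": 2.4, "text": "Hello, this is a sample transcription."}]
print(to_srt(segments))
print(to_vtt(segments))
```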

Supported Audio Formats

mp3, mp4, mpeg, mpga, m4a, wav, webm, opus

File uploads are limited to 25 MB. When using file_uri, there is no size limit.
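
A client can use these limits to decide how to submit a file before making a request. The Python sketch below is one possible check. `choose_submission` is a hypothetical helper, judging the format by file extension is an assumption, and so is interpreting 25 MB as 25,000,000 bytes (the more conservative reading).

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "opus"}
# Assumption: 25 MB read as decimal megabytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def choose_submission(filename: str, size_bytes: int) -> str:
    """Return 'upload', 'use_file_uri', or 'unsupported' for a candidate file."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        return "unsupported"
    if size_bytes > MAX_UPLOAD_BYTES:
        # Too large for a direct upload; host the file and pass its URL instead.
        return "use_file_uri"
    return "upload"


print(choose_submission("recording.mp3", 1_000_000))
print(choose_submission("lecture.wav", 300_000_000))
```

For a real file, the size can be obtained with `os.path.getsize(path)` before calling the helper.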

Notes

Billing is based on audio duration, charged at the per-model rate listed above.
If no model is specified, rapid is used by default.
Streaming (stream=true) is not supported at this time.
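
As a worked example of the billing rule, the Python sketch below estimates the cost of a transcription from its duration. It assumes charges are proportional to duration with no rounding up to whole minutes, which this page does not confirm, and `estimate_credits` is a hypothetical helper.

```python
# Per-minute rates from the Models table.
RATE_PER_MINUTE = {"rapid": 0.006, "whisper-1": 0.006}


def estimate_credits(duration_seconds: float, model: str = "rapid") -> float:
    """Estimate the credits charged for audio of the given duration."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Assumption: billing is proportional, with no per-minute rounding.
    return duration_seconds / 60 * RATE_PER_MINUTE[model]


# The 42.5-second sample from the verbose_json response.
print(f"{estimate_credits(42.5):.5f} credits")
```

Under that assumption, the 42.5-second sample costs roughly 0.00425 credits, and a one-hour recording costs about 0.36 credits on either model.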
