LEAPERone Docs

Audio Transcription

POST /v1/audio/transcriptions

Transcribes audio into text. The audio can be supplied as a file upload or as a URL.

Endpoint

POST https://api.leaper.one/v1/audio/transcriptions

Models

Model      Pricing            Description
rapid      0.006 credits/min  Fast transcription, supports file_uri
whisper-1  0.006 credits/min  High accuracy with prompt support

Parameters

Parameter        Type    Required                 Description
file             file    Either file or file_uri  Audio file to transcribe (multipart upload)
file_uri         string  Either file or file_uri  URL of the audio file to transcribe
model            string  No                       Model to use: "rapid" or "whisper-1". Default: "rapid"
response_format  string  No                       Output format: "text", "json", "verbose_json", "srt", or "vtt". Default: "json"
language         string  No                       ISO 639-1 language code (e.g. "en", "zh"). Improves accuracy if specified
prompt           string  No                       Hint text to improve transcription quality
temperature      number  No                       Sampling temperature between 0 and 1 (whisper-1 only)
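
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: validate_params is a hypothetical helper, it reads "either file or file_uri" as exactly one of the two (an assumption), and it flags temperature on a non-whisper-1 model even though the server may simply ignore it.

```python
SUPPORTED_MODELS = {"rapid", "whisper-1"}
SUPPORTED_FORMATS = {"text", "json", "verbose_json", "srt", "vtt"}


def validate_params(params, has_file=False):
    """Return a list of problems found in a transcription request (empty if none)."""
    problems = []
    has_uri = bool(params.get("file_uri"))
    if not has_file and not has_uri:
        problems.append("provide either file or file_uri")
    if has_file and has_uri:
        # Assumption: "either" in the table means exactly one of the two.
        problems.append("provide only one of file or file_uri")
    model = params.get("model", "rapid")  # documented default
    if model not in SUPPORTED_MODELS:
        problems.append(f"unknown model: {model!r}")
    fmt = params.get("response_format", "json")  # documented default
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"unknown response_format: {fmt!r}")
    if "temperature" in params:
        if not 0 <= params["temperature"] <= 1:
            problems.append("temperature must be between 0 and 1")
        if model != "whisper-1":
            # Assumption: treated as a client-side warning; the table only
            # says the parameter is "whisper-1 only".
            problems.append("temperature applies to whisper-1 only")
    return problems
```

An empty list means the parameters are consistent with the table; anything else can be shown to the caller before spending a request.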

Request

Upload a file

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@recording.mp3 \
  -F model=rapid \
  -F response_format=json

Transcribe from a URL

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file_uri=https://example.com/audio.mp3 \
  -F model=rapid \
  -F language=zh \
  -F response_format=verbose_json

Using file_uri avoids uploading large files through your network, because the audio is fetched server-side. file_uri is supported with both the rapid and whisper-1 models.
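
The same file_uri request can be assembled from Python using only the standard library. This is a sketch under stated assumptions: build_uri_request is a hypothetical helper name, and the body is encoded as multipart/form-data because that is what curl -F sends; whether the endpoint accepts other encodings is not stated here. The request is built but not sent.

```python
import urllib.request
import uuid

API_URL = "https://api.leaper.one/v1/audio/transcriptions"


def build_uri_request(api_key, file_uri, **fields):
    """Build (but do not send) a multipart/form-data POST that passes file_uri."""
    boundary = uuid.uuid4().hex
    fields = {"file_uri": file_uri, **fields}
    parts = []
    for name, value in fields.items():
        # One form-data part per field, mirroring each curl -F flag.
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    body = ("".join(parts) + f"--{boundary}--\r\n").encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_uri_request(
    "sk-your-api-key",
    "https://example.com/audio.mp3",
    model="rapid",
    language="zh",
    response_format="verbose_json",
)
# To send it: urllib.request.urlopen(request), then read the response body.
```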

Using whisper-1 with prompt

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@recording.mp3 \
  -F model=whisper-1 \
  -F response_format=json \
  -F prompt="LEAPERone, GTC, NVIDIA"

Requesting subtitle output

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@recording.mp3 \
  -F model=whisper-1 \
  -F response_format=srt

Response

{
  "text": "Hello, this is a sample transcription of the audio file."
}

When using verbose_json, the response includes timestamps and segments:

{
  "task": "transcribe",
  "language": "en",
  "duration": 42.5,
  "text": "Hello, this is a sample transcription of the audio file.",
  "segments": [
    { "start": 0.0, "end": 2.4, "text": "Hello, this is a sample transcription." }
  ]
}
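
A verbose_json response can be walked segment by segment. The sketch below parses the sample payload shown above and renders one line per segment; segment_lines is a hypothetical helper, and it relies only on the fields that appear in the sample (start, end, text).

```python
import json

sample = """
{
  "task": "transcribe",
  "language": "en",
  "duration": 42.5,
  "text": "Hello, this is a sample transcription of the audio file.",
  "segments": [
    { "start": 0.0, "end": 2.4, "text": "Hello, this is a sample transcription." }
  ]
}
"""


def segment_lines(payload):
    """Return one human-readable line per segment in a verbose_json payload."""
    data = json.loads(payload)
    return [
        f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}"
        for seg in data.get("segments", [])
    ]


for line in segment_lines(sample):
    print(line)
```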

When using srt, the response is returned as plain-text SubRip (SRT) subtitles:

1
00:00:00,000 --> 00:00:02,400
Hello, this is a sample transcription.

When using vtt, the response is returned as WebVTT:

WEBVTT

00:00:00.000 --> 00:00:02.400
Hello, this is a sample transcription.
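
The two subtitle formats differ mainly in the millisecond separator (a comma in SRT, a period in WebVTT) and in the WEBVTT header. If a verbose_json response is already on hand, subtitles can be rendered locally from its segments. This is an illustrative sketch with hypothetical helper names; the server's own srt or vtt output may segment the text differently.

```python
def format_timestamp(seconds, separator=","):
    """Format seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def segments_to_srt(segments):
    """Render verbose_json-style segments as numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(segments_to_srt([
    {"start": 0.0, "end": 2.4, "text": "Hello, this is a sample transcription."}
]))
```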

Supported Audio Formats

mp3, mp4, mpeg, mpga, m4a, wav, webm, opus

File uploads are limited to 25 MB. When using file_uri, there is no size limit.
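
A local pre-check can catch an unsupported extension or an oversized file before the upload starts. The sketch below is illustrative only: check_upload is a hypothetical helper, it judges the format by file extension alone, and it interprets 25 MB as 25 * 1024 * 1024 bytes, which is an assumption since the exact byte count is not stated.

```python
import os

SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "opus"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: 25 MB read as 25 MiB


def check_upload(path, max_bytes=MAX_UPLOAD_BYTES):
    """Return None if the file looks uploadable, else a short reason string."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {ext or '(none)'}"
    if os.path.getsize(path) > max_bytes:
        # Oversized audio can still be transcribed by hosting it and
        # passing its URL as file_uri.
        return "file exceeds upload limit; consider file_uri instead"
    return None
```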

Notes

  • Billing is based on audio duration, charged at the per-model rate listed above.
  • If no model is specified, rapid is used by default.
  • Streaming (stream=true) is not supported at this time.
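
As a rough guide to the billing note above, the cost of a request can be estimated from the audio duration and the per-minute rate in the Models table. This sketch assumes strictly proportional billing with no rounding to whole minutes, which is not stated in this page; estimate_credits is a hypothetical helper.

```python
from decimal import Decimal

# Per-minute rates copied from the Models table.
RATE_PER_MINUTE = {
    "rapid": Decimal("0.006"),
    "whisper-1": Decimal("0.006"),
}


def estimate_credits(duration_seconds, model="rapid"):
    """Estimate credits for a clip, assuming proportional per-second billing."""
    seconds = Decimal(str(duration_seconds))
    return seconds * RATE_PER_MINUTE[model] / 60


# The 42.5-second clip from the verbose_json sample:
print(estimate_credits(42.5))
```

Decimal is used instead of float so that small per-request amounts add up without binary rounding drift.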