Audio Transcription

The Audio Transcription endpoint converts speech to text. You can upload an audio file with the file field or pass a publicly accessible URL with file_uri.

Models

Model	Pricing	Best for
`rapid` (default)	0.006 credits/min	Fast, general-purpose transcription
`whisper-1`	0.006 credits/min	High accuracy with prompt support

Quick Start

POST /v1/audio/transcriptions

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@meeting.mp3

Response

{
  "text": "Welcome to today's meeting. Let's start with the agenda..."
}

Using file_uri

Instead of uploading a file, you can pass a URL. This is recommended for large files or when your audio is already hosted online.

Transcribe from URL

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file_uri=https://example.com/podcast-episode.mp3 \
  -F language=en \
  -F response_format=verbose_json

file_uri accepts any publicly accessible audio URL. Formats include mp3, opus, m4a, and wav; see Supported Formats for the full list. There is no file size limit when you use a URL.
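As a rough sketch of how a script might choose between the two input modes, the helper below emits a file_uri field for http(s) inputs and a file upload field for anything else. The function name source_field and the LEAPER_API_KEY environment variable are hypothetical and not part of the API.

```shell
# Hypothetical helper: print the curl -F argument for a given input.
# URLs become file_uri=<url>; anything else is treated as a local path
# and becomes file=@<path>, which makes curl upload the file contents.
source_field() {
  case "$1" in
    http://*|https://*) printf 'file_uri=%s\n' "$1" ;;
    *)                  printf 'file=@%s\n' "$1" ;;
  esac
}

# Usage (not executed here; LEAPER_API_KEY is an assumed variable name):
#   curl -X POST https://api.leaper.one/v1/audio/transcriptions \
#     -H "Authorization: Bearer $LEAPER_API_KEY" \
#     -F "$(source_field "$1")"
```

The same script can then be pointed at either meeting.mp3 or https://example.com/podcast-episode.mp3 without changing the curl call.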

Choosing a Model

rapid (default)

Best for quick transcription without extra configuration. Because rapid is the default, you can omit the model parameter. It supports the language and prompt parameters.

Using rapid

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@meeting.mp3 \
  -F language=zh \
  -F response_format=json

whisper-1

OpenAI's Whisper model. It supports the prompt parameter, which improves recognition of specific terms, and the verbose_json response format, which returns word-level timestamps.

Using whisper-1 with prompt

curl -X POST https://api.leaper.one/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F file=@meeting.mp3 \
  -F model=whisper-1 \
  -F response_format=verbose_json \
  -F prompt="LEAPERone, API, transcription"
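When the terms come from a glossary, the prompt string can be assembled instead of typed by hand. The sketch below joins its arguments with a comma and a space, mirroring the prompt in the example above. join_terms is a hypothetical helper, and treating a comma-separated list as the right prompt shape is an assumption taken from that example only.

```shell
# Hypothetical helper: join all arguments with ", " to form a prompt string.
join_terms() {
  out=""
  for term in "$@"; do
    # Add the separator only when out already holds an earlier term.
    out="${out:+$out, }$term"
  done
  printf '%s\n' "$out"
}

# Usage (not executed here):
#   curl ... -F model=whisper-1 -F prompt="$(join_terms LEAPERone API transcription)"
```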

Supported Formats

Format	Extension
MP3	`.mp3`
MP4	`.mp4`
MPEG	`.mpeg`, `.mpga`
M4A	`.m4a`
WAV	`.wav`
WebM	`.webm`
Opus	`.opus`
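A client-side check before uploading can catch unsupported files early. This is a minimal sketch that assumes the table above is exhaustive and that a file's extension reflects its actual format; is_supported_audio is a hypothetical name.

```shell
# Hypothetical helper: succeed (exit status 0) if the file name ends in one
# of the extensions listed in the Supported Formats table.
is_supported_audio() {
  # Take everything after the last dot and lowercase it,
  # so "TALK.MP3" is treated like "talk.mp3".
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm|opus) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not executed here):
#   is_supported_audio "$path" || { echo "unsupported format: $path" >&2; exit 1; }
```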

Response Formats

Set response_format to control the output:

Value	Description
`json`	JSON object with a `text` field (default).
`text`	Plain text transcription.
`verbose_json`	JSON with timestamps, segments, and metadata.
`srt`	SubRip subtitle format.
`vtt`	WebVTT subtitle format.
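When saving the result to disk, it helps to match the output file extension to the requested format. The mapping below is a sketch: output_extension is a hypothetical helper, and the choice of .txt for text and .json for both JSON variants is a convention assumed here, not something the API prescribes.

```shell
# Hypothetical helper: map a response_format value to a file extension.
output_extension() {
  case "$1" in
    json|verbose_json) echo json ;;
    text)              echo txt ;;
    srt)               echo srt ;;
    vtt)               echo vtt ;;
    *) echo "unknown response_format: $1" >&2; return 1 ;;
  esac
}

# Usage (not executed here): request subtitles and write them to meeting.srt.
#   fmt=srt
#   curl -X POST https://api.leaper.one/v1/audio/transcriptions \
#     -H "Authorization: Bearer sk-your-api-key" \
#     -F file=@meeting.mp3 \
#     -F response_format="$fmt" \
#     -o "meeting.$(output_extension "$fmt")"
```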

Billing is based on audio duration. See the API Reference for per-model pricing.
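For a rough cost estimate, the duration can be multiplied by the per-minute rate. The sketch below uses the 0.006 credits/min figure from the Models table and assumes billing is strictly proportional to duration, with no rounding up to whole minutes; the actual rounding rule is not stated on this page. estimate_credits is a hypothetical helper.

```shell
# Hypothetical helper: estimate credits for a recording.
#   $1 = duration in seconds
#   $2 = rate in credits per minute (defaults to 0.006, from the Models table)
# Assumes proportional billing with no per-minute rounding.
estimate_credits() {
  LC_ALL=C awk -v secs="$1" -v rate="${2:-0.006}" \
    'BEGIN { printf "%.4f\n", secs / 60 * rate }'
}

# Example: a 45-minute recording is 2700 seconds.
#   estimate_credits 2700   # prints 0.2700
```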
