Text to Speech POST /v1/audio/speech
Generate speech audio from text. Supports preset voices and custom voice cloning.
POST https://api.leaper.one/v1/audio/speech
| Model | Pricing | Description |
| --- | --- | --- |
| moss-tts-nano | 0.005 credits / 1K chars | Multilingual TTS with voice cloning |
Request parameters (JSON body, preset voices):

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | No | Model to use. Default: "moss-tts-nano" |
| input | string | Yes | Text to synthesize (max 4096 characters) |
| voice | string | No | Voice preset name. Default: "nova" (Chinese). See Available Voices |
| response_format | string | No | Output format: "wav" or "pcm". Default: "wav" |
Request parameters (multipart/form-data, voice cloning):

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | No | Model to use. Default: "moss-tts-nano" |
| input | string | Yes | Text to synthesize (max 4096 characters) |
| voice | string | No | Voice preset name (used as base for cloning) |
| prompt_audio | file | No | Reference audio file for voice cloning (max 1MB). Supports mp3, wav, flac, m4a, ogg |
| response_format | string | No | Output format: "wav" or "pcm". Default: "wav" |
Example request, preset voice (curl):

curl -X POST https://api.leaper.one/v1/audio/speech \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moss-tts-nano",
    "input": "Hello, welcome to LEAPERone.",
    "voice": "alloy"
  }' \
  --output speech.wav
Example request, voice cloning (curl):

curl -X POST https://api.leaper.one/v1/audio/speech \
  -H "Authorization: Bearer sk-your-api-key" \
  -F model=moss-tts-nano \
  -F input="This text will be spoken in the cloned voice." \
  -F voice=alloy \
  -F prompt_audio=@reference.mp3 \
  --output cloned.wav
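Example request, preset voice (Python, OpenAI SDK):

from openai import OpenAI

# Point the OpenAI client at the LEAPERone endpoint.
client = OpenAI(
    base_url="https://api.leaper.one/v1",
    api_key="sk-your-api-key",
)

# Synthesize speech with a preset voice.
response = client.audio.speech.create(
    model="moss-tts-nano",
    input="Hello, welcome to LEAPERone.",
    voice="alloy",
)

# Save the returned audio bytes to disk.
response.stream_to_file("speech.wav")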
Voice cloning requires multipart/form-data to upload the reference audio file. The OpenAI SDK does not support this, so use curl or a custom HTTP request for voice cloning.
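For readers who want the "custom HTTP request" route in Python, the following is a minimal sketch using only the standard library. The helper names `build_multipart` and `clone_voice` are hypothetical, as is the 120-second timeout. The sketch assumes the server accepts the same multipart fields that the curl example sends with `-F`, and it interprets the 1MB limit as 1,048,576 bytes, which this page does not confirm.

```python
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.leaper.one/v1/audio/speech"  # endpoint from this page
MAX_PROMPT_BYTES = 1024 * 1024  # assumption: "1MB" means 1,048,576 bytes


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def clone_voice(api_key, text, audio_path, out_path, voice="alloy", timeout=120):
    """POST text plus a reference clip and save the returned audio bytes."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    if len(audio) > MAX_PROMPT_BYTES:
        raise ValueError("reference audio exceeds the 1MB limit")
    body, content_type = build_multipart(
        {"model": "moss-tts-nano", "input": text, "voice": voice},
        "prompt_audio",
        os.path.basename(audio_path),
        audio,
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    # Cloning is slow (tens of seconds), so use a generous timeout.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        with open(out_path, "wb") as out:
            out.write(resp.read())
```

A call might look like `clone_voice("sk-your-api-key", "This text will be spoken in the cloned voice.", "reference.mp3", "cloned.wav")`. Building the body in a separate function keeps the encoding step easy to inspect without sending a request.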
Returns audio bytes directly. The Content-Type header indicates the format:
| Format | Content-Type |
| --- | --- |
| wav | audio/wav |
| pcm | application/octet-stream |
PCM format is raw 48kHz, 16-bit, stereo (2 channels).
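Because raw PCM carries no header, a player needs to be told the sample layout. The sketch below wraps a "pcm" response in a WAV container using Python's standard `wave` module and computes its duration from the stated 48 kHz, 16-bit, stereo layout. The function names are hypothetical, and the sketch assumes the samples are signed little-endian, which is what WAV expects but is not stated on this page.

```python
import wave

SAMPLE_RATE = 48_000  # Hz, as stated above
CHANNELS = 2          # stereo
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM buffer in seconds."""
    bytes_per_frame = CHANNELS * SAMPLE_WIDTH  # one sample per channel
    return len(pcm) / (bytes_per_frame * SAMPLE_RATE)


def pcm_to_wav(pcm: bytes, wav_path: str) -> None:
    """Write raw PCM bytes to a WAV file with the matching header."""
    # Assumption: samples are signed 16-bit little-endian.
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
```

At this layout one second of audio is 192,000 bytes, so a rough size check on the response can catch a truncated download.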
| Voice | Language | Style |
| --- | --- | --- |
| alloy | English | Welcome / neutral |
| echo | English | News anchor |
| fable | English | Gentle reminder |
| onyx | English | Academic / lecture |
| nova | Chinese | Welcome / neutral |
| shimmer | Chinese | Soft / gentle |

| Voice | Style |
| --- | --- |
| zh-welcome | Neutral welcome |
| zh-gentle | Soft late-night |
| zh-taiwan | Taiwanese accent |
| zh-beijing | Beijing dialect |
| zh-culture | Formal / cultural |
| zh-yangmi | Celebrity voice |

| Voice | Style |
| --- | --- |
| en-welcome | Neutral welcome |
| en-lesson | Academic lecture |
| en-news | News anchor |
| en-gentle | Gentle reminder |
| en-taylor | Celebrity voice |
| en-quiet | Calm / reflective |

| Voice | Language |
| --- | --- |
| ja-news | Japanese |
| ko-news | Korean |
| es-news | Spanish |
| fr-news | French |
| de-news | German |
| it-news | Italian |
| ru-news | Russian |
You can also pass demo-1 through demo-29 as the voice value to use any of the MOSS presets directly.
Billing is based on character count at 0.005 credits per 1,000 characters.
Voice cloning is significantly slower than preset voices (CPU inference). Expect 20-60 seconds for short text.
The prompt_audio file should be a clear speech sample, ideally 5-15 seconds long.
Maximum input length is 4,096 characters.
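The length limit and the per-character pricing above can be handled together on the client side. The sketch below splits long text into chunks of at most 4,096 characters, preferring sentence boundaries, and estimates the credits for a given text. The function names are hypothetical. The estimate assumes billing is strictly proportional to Python's `len()` of the input with no rounding or minimum charge, which this page does not specify.

```python
import re

MAX_CHARS = 4096              # per-request input limit from this page
CREDITS_PER_1K_CHARS = 0.005  # pricing from this page


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Breaks after sentence-ending punctuation where possible and falls back
    to a hard cut for any single sentence longer than `limit`.
    Joining the chunks reproduces the original text exactly.
    """
    # Zero-width split after Latin or CJK sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?。！？])", text)
    chunks, current = [], ""
    for sentence in sentences:
        # A single oversized sentence is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


def estimate_credits(text: str) -> float:
    """Estimated cost in credits, assuming purely proportional billing."""
    return len(text) / 1000 * CREDITS_PER_1K_CHARS
```

Each chunk can then be sent as its own request and the resulting audio files concatenated. Splitting does not change the estimated total, since the chunks together contain exactly the original characters.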