LEAPERone Docs

Text to Speech

POST /v1/audio/speech

Generate speech audio from text. Supports preset voices and custom voice cloning.

Endpoint

POST https://api.leaper.one/v1/audio/speech

Models

| Model | Pricing | Description |
| --- | --- | --- |
| moss-tts-nano | 0.005 credits / 1K chars | Multilingual TTS with voice cloning |

Parameters

JSON Body (preset voices)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | No | Model to use. Default: "moss-tts-nano" |
| input | string | Yes | Text to synthesize (max 4096 characters) |
| voice | string | No | Voice preset name. Default: "nova" (Chinese). See Available Voices |
| response_format | string | No | Output format: "wav" or "pcm". Default: "wav" |
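
The parameters above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not part of the LEAPERone API: the helper name `build_speech_payload` is hypothetical, and it assumes the 4096-character limit corresponds to Python's `len()` of the string, which the documentation does not specify.

```python
import json

MAX_INPUT_CHARS = 4096            # limit from the parameter table
ALLOWED_FORMATS = {"wav", "pcm"}  # formats from the parameter table


def build_speech_payload(text, voice="nova", response_format="wav",
                         model="moss-tts-nano"):
    """Return the JSON body for POST /v1/audio/speech as a dict.

    Hypothetical helper: defaults mirror the documented ones, and the
    length check assumes the server counts characters like len() does.
    """
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(ALLOWED_FORMATS)}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }


if __name__ == "__main__":
    payload = build_speech_payload("Hello, welcome to LEAPERone.", voice="alloy")
    print(json.dumps(payload, indent=2))
```

Validating locally avoids spending a round trip on a request the server would likely reject.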

Multipart Form (voice cloning)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | No | Model to use. Default: "moss-tts-nano" |
| input | string | Yes | Text to synthesize (max 4096 characters) |
| voice | string | No | Voice preset name (used as base for cloning) |
| prompt_audio | file | No | Reference audio file for voice cloning (max 1MB). Supports mp3, wav, flac, m4a, ogg |
| response_format | string | No | Output format: "wav" or "pcm". Default: "wav" |

Request

Basic usage with preset voice

curl -X POST https://api.leaper.one/v1/audio/speech \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moss-tts-nano",
    "input": "Hello, welcome to LEAPERone.",
    "voice": "alloy"
  }' \
  --output speech.wav

Voice cloning with custom audio

curl -X POST https://api.leaper.one/v1/audio/speech \
  -H "Authorization: Bearer sk-your-api-key" \
  -F model=moss-tts-nano \
  -F input="This text will be spoken in the cloned voice." \
  -F voice=alloy \
  -F prompt_audio=@reference.mp3 \
  --output cloned.wav

Using the OpenAI SDK

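from openai import OpenAI

# Point the OpenAI client at the LEAPERone endpoint.
client = OpenAI(
    base_url="https://api.leaper.one/v1",
    api_key="sk-your-api-key"
)

# Stream the audio to disk; calling stream_to_file on a non-streaming
# response is deprecated in recent versions of the openai package.
with client.audio.speech.with_streaming_response.create(
    model="moss-tts-nano",
    input="Hello, welcome to LEAPERone.",
    voice="alloy"
) as response:
    response.stream_to_file("speech.wav")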

Voice cloning requires multipart/form-data to upload the reference audio file. The OpenAI SDK does not support this, so use curl or a custom HTTP request for voice cloning.
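
One way to make such a custom HTTP request from Python is to build the multipart body by hand with the standard library. This is a sketch only: `encode_multipart` and `clone_voice` are hypothetical helper names, the 1MB limit is read conservatively as 1,000,000 bytes, and the 120-second timeout is an assumption chosen to cover the 20-60 second cloning latency mentioned in the notes.

```python
import io
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.leaper.one/v1/audio/speech"
MAX_PROMPT_BYTES = 1_000_000  # assumption: "1MB" read as 1,000,000 bytes


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode("utf-8"))
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    buf.write(f"Content-Type: {file_type}\r\n\r\n".encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def clone_voice(api_key, text, reference_path, out_path, voice="alloy"):
    """Send a voice-cloning request and save the returned audio bytes."""
    with open(reference_path, "rb") as f:
        audio = f.read()
    if len(audio) > MAX_PROMPT_BYTES:
        raise ValueError("prompt_audio exceeds the documented 1MB limit")
    body, content_type = encode_multipart(
        {"model": "moss-tts-nano", "input": text, "voice": voice},
        "prompt_audio",
        os.path.basename(reference_path),
        audio,
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
    # Generous timeout (an assumption): cloning runs on CPU and is slow.
    with urllib.request.urlopen(request, timeout=120) as resp, \
            open(out_path, "wb") as out:
        out.write(resp.read())


# Example call (needs a real API key and a local reference.mp3):
# clone_voice("sk-your-api-key",
#             "This text will be spoken in the cloned voice.",
#             "reference.mp3", "cloned.wav")
```

The field names mirror the curl example above; only the transport differs.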

Response

Returns audio bytes directly. The Content-Type header indicates the format:

| Format | Content-Type |
| --- | --- |
| wav | audio/wav |
| pcm | application/octet-stream |

PCM output is raw, headerless audio at 48 kHz, 16-bit, stereo (2 channels).
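
Because raw PCM carries no header, most players need it wrapped in a container before they can open it. The sketch below uses Python's standard `wave` module with the sample rate, bit depth, and channel count stated above. It assumes the samples are signed little-endian, as WAV expects; the documentation does not state the byte order. The helper names are hypothetical.

```python
import wave

SAMPLE_RATE = 48_000  # Hz, as documented
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 2          # stereo
FRAME_SIZE = SAMPLE_WIDTH * CHANNELS  # bytes per frame


def pcm_duration_seconds(pcm_bytes):
    """Return the playback length of a raw PCM buffer in seconds."""
    return len(pcm_bytes) / (SAMPLE_RATE * FRAME_SIZE)


def pcm_to_wav(pcm_bytes, wav_path):
    """Wrap raw PCM bytes in a WAV container.

    Assumes signed little-endian samples (not confirmed by the docs).
    """
    if len(pcm_bytes) % FRAME_SIZE:
        raise ValueError("PCM data is not a whole number of frames")
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        wav_file.writeframes(pcm_bytes)


if __name__ == "__main__":
    # One second of silence stands in for a real "pcm" response body.
    silence = b"\x00" * (SAMPLE_RATE * FRAME_SIZE)
    pcm_to_wav(silence, "silence.wav")
    print(pcm_duration_seconds(silence))
```

At these settings one second of audio occupies 192,000 bytes, which is useful when sizing buffers.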

Available Voices

OpenAI-compatible aliases

| Voice | Language | Style |
| --- | --- | --- |
| alloy | English | Welcome / neutral |
| echo | English | News anchor |
| fable | English | Gentle reminder |
| onyx | English | Academic / lecture |
| nova | Chinese | Welcome / neutral |
| shimmer | Chinese | Soft / gentle |

Chinese voices

| Voice | Style |
| --- | --- |
| zh-welcome | Neutral welcome |
| zh-gentle | Soft late-night |
| zh-taiwan | Taiwanese accent |
| zh-beijing | Beijing dialect |
| zh-culture | Formal / cultural |
| zh-yangmi | Celebrity voice |

English voices

| Voice | Style |
| --- | --- |
| en-welcome | Neutral welcome |
| en-lesson | Academic lecture |
| en-news | News anchor |
| en-gentle | Gentle reminder |
| en-taylor | Celebrity voice |
| en-quiet | Calm / reflective |

Other languages

| Voice | Language |
| --- | --- |
| ja-news | Japanese |
| ko-news | Korean |
| es-news | Spanish |
| fr-news | French |
| de-news | German |
| it-news | Italian |
| ru-news | Russian |

You can also pass demo-1 through demo-29 directly for the full list of MOSS presets.

Notes

  • Billing is based on character count at 0.005 credits per 1,000 characters.
  • Voice cloning is significantly slower than preset voices (CPU inference). Expect 20-60 seconds for short text.
  • The prompt_audio file should be a clear speech sample, ideally 5-15 seconds long.
  • Maximum input length is 4,096 characters.
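
The billing rate and the input limit in the notes above can be combined into a small pre-flight helper. This is a sketch under stated assumptions: the function names are hypothetical, the cost is assumed to be pro-rated per character (the documentation gives only the rate, not a rounding rule), and the splitter prefers sentence boundaries purely as a design choice so that each chunk sounds natural.

```python
MAX_INPUT_CHARS = 4096        # documented maximum input length
CREDITS_PER_1K_CHARS = 0.005  # documented billing rate


def estimate_credits(text):
    """Estimate the cost of synthesizing `text`, assuming pro-rated billing."""
    return len(text) / 1000 * CREDITS_PER_1K_CHARS


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters.

    Prefers to cut after sentence-ending punctuation, then at a space,
    and only hard-cuts when the window contains no boundary at all.
    """
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        # Index just past the last sentence end followed by a space.
        cut = max(window.rfind(p) for p in (". ", "! ", "? ")) + 1
        if cut <= 0:
            cut = window.rfind(" ")  # fall back to the last word boundary
        if cut <= 0:
            cut = limit              # no boundary found: hard cut
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks


if __name__ == "__main__":
    long_text = "Hello, welcome to LEAPERone. " * 400
    parts = split_for_tts(long_text)
    print(len(parts), "requests needed")
    print("estimated credits:", sum(estimate_credits(p) for p in parts))
```

Each chunk can then be sent as a separate request and the resulting audio concatenated by the caller.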