Text to Speech

Generate speech audio from text. Supports preset voices and custom voice cloning.

Endpoint

POST https://api.leaper.one/v1/audio/speech

Models

Model	Pricing	Description
`moss-tts-nano`	0.005 credits / 1K chars	Multilingual TTS with voice cloning

Parameters

JSON Body (preset voices)

Parameter	Type	Required	Description
model	string	No	Model to use. Default: `"moss-tts-nano"`
input	string	Yes	Text to synthesize (max 4096 characters)
voice	string	No	Voice preset name. Default: `"nova"` (Chinese). See Available Voices
response_format	string	No	Output format: `"wav"` or `"pcm"`. Default: `"wav"`
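The JSON parameters above can also be assembled without any SDK. The following is a minimal sketch using only the Python standard library. The helper name `build_speech_request` is hypothetical, and the client-side checks simply mirror the limits and defaults listed in the table; the server's own validation behavior is not documented here and may differ.

```python
import json
import urllib.request

API_URL = "https://api.leaper.one/v1/audio/speech"
ALLOWED_FORMATS = {"wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_request(api_key, text, voice="nova",
                         model="moss-tts-nano", response_format="wav"):
    """Build (but do not send) a JSON request for a preset voice.

    Hypothetical helper: defaults follow the parameter table above.
    """
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError("response_format must be 'wav' or 'pcm'")

    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and write the response body to a file.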

Multipart Form (voice cloning)

Parameter	Type	Required	Description
model	string	No	Model to use. Default: `"moss-tts-nano"`
input	string	Yes	Text to synthesize (max 4096 characters)
voice	string	No	Voice preset name (used as base for cloning)
prompt_audio	file	No	Reference audio file for voice cloning (max 1MB). Supports mp3, wav, flac, m4a, ogg
response_format	string	No	Output format: `"wav"` or `"pcm"`. Default: `"wav"`

Request

Basic usage with preset voice

curl -X POST https://api.leaper.one/v1/audio/speech \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moss-tts-nano",
    "input": "Hello, welcome to LEAPERone.",
    "voice": "alloy"
  }' \
  --output speech.wav

Voice cloning with custom audio

curl -X POST https://api.leaper.one/v1/audio/speech \
  -H "Authorization: Bearer sk-your-api-key" \
  -F model=moss-tts-nano \
  -F input="This text will be spoken in the cloned voice." \
  -F voice=alloy \
  -F prompt_audio=@reference.mp3 \
  --output cloned.wav

Using the OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://api.leaper.one/v1",
    api_key="sk-your-api-key"
)

response = client.audio.speech.create(
    model="moss-tts-nano",
    input="Hello, welcome to LEAPERone.",
    voice="alloy"
)
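# Save the returned audio bytes to a file
# (stream_to_file is deprecated on this response type in recent SDK versions)
response.write_to_file("speech.wav")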

Voice cloning requires a multipart/form-data request to upload the reference audio file. The OpenAI SDK does not support this, so use curl or a custom HTTP request for voice cloning.
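One way to make such a custom HTTP request from Python is to encode the multipart body by hand with the standard library. This is a sketch under stated assumptions, not an official client: the helper names `build_multipart` and `clone_voice` are hypothetical, the form fields are taken from the curl example above, and the generic `application/octet-stream` type on the file part is an assumption about what the server accepts.

```python
import io
import os
import urllib.request
import uuid

API_URL = "https://api.leaper.one/v1/audio/speech"


def build_multipart(fields, file_field, filename, file_bytes,
                    file_type="application/octet-stream"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header). Hypothetical helper.
    """
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode("utf-8"))
    # The file part carries a filename and its own Content-Type.
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    buf.write(f"Content-Type: {file_type}\r\n\r\n".encode())
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def clone_voice(api_key, text, audio_path, out_path, voice="alloy"):
    """POST text plus reference audio, then save the returned audio bytes."""
    with open(audio_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"model": "moss-tts-nano", "input": text, "voice": voice},
        "prompt_audio", os.path.basename(audio_path), audio,
    )
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": content_type},
        method="POST",
    )
    # Cloning is slow, so the timeout here is deliberately generous (assumed value).
    with urllib.request.urlopen(req, timeout=120) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

A call would look like `clone_voice("sk-your-api-key", "Text to speak.", "reference.mp3", "cloned.wav")`.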

Response

Returns audio bytes directly. The Content-Type header indicates the format:

Format	Content-Type
`wav`	`audio/wav`
`pcm`	`application/octet-stream`

PCM output is raw audio: 48 kHz sample rate, 16-bit samples, stereo (2 channels).
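Because raw PCM has no header, most players cannot open it directly. The sketch below wraps a PCM response in a WAV container with Python's standard `wave` module and computes its duration from the byte count. The function names are hypothetical, and the code assumes signed little-endian samples, which this page does not state explicitly.

```python
import io
import wave

SAMPLE_RATE = 48_000  # Hz, from the PCM description above
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 2          # stereo


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration implied by the byte count of a raw PCM payload."""
    bytes_per_second = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS
    return len(pcm) / bytes_per_second


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM bytes in a WAV header (assumes little-endian samples)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

At these settings one second of audio is 48,000 × 2 × 2 = 192,000 bytes, which is a quick sanity check on a downloaded payload.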

Available Voices

OpenAI-compatible aliases

Voice	Language	Style
`alloy`	English	Welcome / neutral
`echo`	English	News anchor
`fable`	English	Gentle reminder
`onyx`	English	Academic / lecture
`nova`	Chinese	Welcome / neutral
`shimmer`	Chinese	Soft / gentle

Chinese voices

Voice	Style
`zh-welcome`	Neutral welcome
`zh-gentle`	Soft late-night
`zh-taiwan`	Taiwanese accent
`zh-beijing`	Beijing dialect
`zh-culture`	Formal / cultural
`zh-yangmi`	Celebrity voice

English voices

Voice	Style
`en-welcome`	Neutral welcome
`en-lesson`	Academic lecture
`en-news`	News anchor
`en-gentle`	Gentle reminder
`en-taylor`	Celebrity voice
`en-quiet`	Calm / reflective

Other languages

Voice	Language
`ja-news`	Japanese
`ko-news`	Korean
`es-news`	Spanish
`fr-news`	French
`de-news`	German
`it-news`	Italian
`ru-news`	Russian

You can also pass `demo-1` through `demo-29` as the voice value to select any of the MOSS presets directly.

Notes

Billing is based on character count at 0.005 credits per 1,000 characters.
Voice cloning is significantly slower than preset voices (CPU inference). Expect 20-60 seconds for short text.
The prompt_audio file should be a clear speech sample, ideally 5-15 seconds long.
Maximum input length is 4,096 characters.
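The billing rate and the input limit above lend themselves to a small pre-flight helper. The sketch below estimates cost and splits longer text into chunks that each fit in one request. Both function names are hypothetical. It assumes billing is proportional to Python's `len()` of the text with no rounding or minimum charge, and that the 4,096 limit is counted the same way; neither detail is confirmed on this page.

```python
CREDITS_PER_1K_CHARS = 0.005
MAX_INPUT_CHARS = 4096


def estimate_credits(text: str) -> float:
    """Estimated cost, assuming simple proportional billing."""
    return len(text) / 1000 * CREDITS_PER_1K_CHARS


def split_input(text: str, limit: int = MAX_INPUT_CHARS) -> list:
    """Split text into chunks of at most `limit` characters.

    Breaks at the last space before the limit when one exists,
    otherwise cuts at the limit (e.g. for text without spaces).
    """
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space found: hard cut
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk can then be sent as a separate request and the resulting audio concatenated. Splitting on sentence boundaries instead of spaces would likely give more natural prosody at the joins.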
