Core features
Text to Speech
Convert text into lifelike speech with the s2-pro and s1 models.
Speech to Text
Transcribe audio to text with per-segment timestamps.
Voice Cloning
Clone a voice instantly from a clip, or train a persistent model.
Realtime Streaming
Stream audio as it is generated, for voice agents and live apps.
Manage Voices
List, inspect, update, and delete your voice models.
Also in the web app
These run in the browser, no code required. See the Platform guide.
Voice Changer
Transform existing audio into a different voice.
Story Studio
Produce multi-speaker, long-form audio such as audiobooks and narration.
Music & Sound Effects
Generate music and cinematic sound effects from a prompt.
Audio Separation
Split audio into stems, plus related audio-processing utilities.
Models
Two text-to-speech models power most capabilities:
s2-pro: the default and highest-quality model, with multi-speaker support and natural-language expression control.
s1: the previous generation, which uses emotion tags written in parentheses.
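With s1, emotion is expressed inline as a tag in parentheses. The sketch below builds such input text. It assumes a tag placed before a sentence applies to that sentence; the helper names are hypothetical, and the tag words used in the example (`excited`, `calm`) are illustrative rather than a confirmed list of supported tags.

```python
from __future__ import annotations


def with_emotion_tag(text: str, emotion: str) -> str:
    """Prefix text with an s1-style parenthesized emotion tag.

    ASSUMPTION: s1 reads a tag such as "(excited)" placed before the
    sentence it should affect. Tag words are not validated against
    the model's supported set here.
    """
    tag = emotion.strip().lower()
    if not tag or "(" in tag or ")" in tag:
        raise ValueError(f"invalid emotion tag: {emotion!r}")
    return f"({tag}) {text.strip()}"


def tag_sentences(pairs: list[tuple[str, str]]) -> str:
    """Join (emotion, sentence) pairs into a single s1 input string."""
    return " ".join(with_emotion_tag(sentence, emotion) for emotion, sentence in pairs)


if __name__ == "__main__":
    print(tag_sentences([("excited", "We did it!"), ("calm", "Now, let's begin.")]))
```

Rejecting tags that contain parentheses keeps a malformed tag from leaking into the spoken text.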
Pick your path
Use the web app
No code — generate audio, clone voices, and produce projects in your browser.
Build with the SDK
The Python library for your application.
Call the API
Raw REST and WebSocket endpoints for any language.
Use your AI coding agent
Install the Fish Audio skill so your agent writes correct code.
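For the raw REST path, a text-to-speech call is an authenticated JSON POST. The sketch below uses only the Python standard library. The endpoint URL, the bearer-token header, the `model` header, and the `text`, `format`, and `reference_id` body fields are all assumptions here; confirm each against the API reference before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# ASSUMPTION: endpoint, header names, and body fields are illustrative;
# check the API reference for the authoritative request shape.
TTS_URL = "https://api.fish.audio/v1/tts"


def build_tts_request(
    api_key: str,
    text: str,
    model: str = "s2-pro",
    reference_id: Optional[str] = None,
    audio_format: str = "mp3",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for text-to-speech."""
    body = {"text": text, "format": audio_format}
    if reference_id is not None:
        body["reference_id"] = reference_id  # assumed: selects a voice model
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "model": model,  # assumed: model chosen via a request header
        },
        method="POST",
    )


def synthesize(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the response body to a file in chunks."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        while chunk := response.read(8192):
            out.write(chunk)
```

Separating request construction from sending keeps the request shape easy to inspect and test without network access; the same headers and body carry over to an HTTP client in any language.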

