Fish Audio is a voice AI platform. Every core feature is available three ways: in the web app (no code), through the REST API, and via the official SDK.

Core features

Text to Speech

Convert text into lifelike speech with the s2-pro and s1 models.

Speech to Text

Transcribe audio to text with per-segment timestamps.

Voice Cloning

Clone a voice instantly from a clip, or train a persistent model.

Realtime Streaming

Stream audio as it is generated, for voice agents and live apps.

Manage Voices

List, inspect, update, and delete your voice models.

Also in the web app

These run in the browser with no code required; see the Platform guide.

Voice Changer

Transform existing audio into a different voice.

Story Studio

Produce multi-speaker, long-form audio — audiobooks and narration.

Music & Sound Effects

Generate music and cinematic sound effects from a prompt.

Audio Separation

Split audio into stems and run related processing utilities.

Models

Two text-to-speech models power most capabilities:
  • s2-pro: the default, highest-quality model, with multi-speaker support and natural-language expression control.
  • s1: the previous generation, which uses emotion tags written in (parentheses).
See Models Overview and Choosing a Model for the full lineup, languages, and limits.
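
As a rough illustration of the parenthesized emotion tags that s1 uses, the sketch below prepares input text in plain Python. The helper name `tag_for_s1`, the tag value `happy`, and the choice to place the tag at the start of the text are assumptions made for this example, not confirmed API behavior; check the model documentation for the supported tags and their placement.

```python
def tag_for_s1(text: str, emotion: str) -> str:
    """Prefix text with a parenthesized emotion tag, e.g. "(happy) Hello".

    Hypothetical helper: the tag position and tag names are assumptions.
    """
    emotion = emotion.strip().lower()
    # Reject empty tags and tags that would break the parenthesis syntax.
    if not emotion or any(c in emotion for c in "()"):
        raise ValueError("emotion must be a non-empty word without parentheses")
    return f"({emotion}) {text.strip()}"


print(tag_for_s1("What a great day!", "happy"))  # (happy) What a great day!
```

The resulting string would then be sent as the text input of an s1 request. Sending the request itself is not shown here because it depends on the SDK or REST endpoint you choose.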

Pick your path

Use the web app

No code — generate audio, clone voices, and produce projects in your browser.

Build with the SDK

Use the official Python library in your application.

Call the API

Raw REST and WebSocket endpoints for any language.

Use your AI coding agent

Install the Fish Audio skill so your agent writes correct code.