
Prerequisites

Sign up for a free Fish Audio account to get started with our API.
  1. Go to fish.audio/auth/signup
  2. Fill in your details to create an account, then complete the steps to verify it.
  3. Log in to your account and navigate to the API section
Once you have an account, you’ll need an API key to authenticate your requests.
  1. Log in to your Fish Audio Dashboard
  2. Navigate to the API Keys section
  3. Click “Create New Key”, give it a descriptive name, and set an expiration if desired
  4. Copy your key and store it securely
Keep your API key secret! Never commit it to version control or share it publicly.
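One way to keep the key out of source code is to read it from an environment variable at startup. The sketch below is illustrative only: the variable name FISH_API_KEY and the helper load_api_key are assumptions, not something this page confirms, so check the SDK documentation for the name the client actually reads.

```python
import os
from typing import Mapping, Optional

# Assumed variable name; this page does not confirm what the SDK looks up.
API_KEY_ENV_VAR = "FISH_API_KEY"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {API_KEY_ENV_VAR} before running; do not hard-code the key."
        )
    return key
```

Failing fast when the variable is missing makes a misconfigured environment obvious before any request is sent, and the key never appears in a file that could be committed.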

Recipe

There are two ways to clone a voice. Pick by how often you’ll reuse it:
  • One-shot (instant) — pass a ReferenceAudio (raw bytes + exact transcript) on each convert call. Nothing is stored server-side; the clone lives only for that request.
  • Persistent — call voices.create once to train a model, then reuse its id as reference_id on every request. No reference upload per call, and the same voice is shared across processes.
Start with one-shot. Below, a single reference clip is cloned inline with no model to manage:
from fishaudio import FishAudio
from fishaudio.types import ReferenceAudio
from fishaudio.utils import save

client = FishAudio()

with open("reference.wav", "rb") as f:
    audio = client.tts.convert(
        text="This line is spoken in the cloned voice, no model required.",
        references=[ReferenceAudio(
            audio=f.read(),
            text="Exact transcript of what is said in reference.wav.",
        )],
    )

save(audio, "oneshot.mp3")
One-shot re-sends the reference bytes on every request, so it’s ideal for one-off or rarely-repeated voices. Once a voice is used more than a handful of times, switch to a persistent model to skip the per-call upload.

Train a persistent voice once, reuse forever

Call voices.create to train a model, then pass voice.id as reference_id. The same id works from any process and across SDK and REST.
# Reuses `client` and `save` from the one-shot example above.
with open("reference.wav", "rb") as f:
    voice = client.voices.create(title="My Narrator", voices=[f.read()])

# reuse the same id on every later request — no reference upload
audio = client.tts.convert(
    text="Reusing my saved voice across many requests.",
    reference_id=voice.id,
)
save(audio, "persistent.mp3")
Already have a trained voice id? Skip training and pass it directly:
audio = client.tts.convert(text="Hello again.", reference_id="<voice-id>")
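To train only once across separate runs, the voice id has to be stored somewhere outside the process. The following is a minimal sketch of one approach, caching the id in a small JSON file. The helper get_or_create_voice_id, the file layout, and the train callback are hypothetical and not part of the SDK; the callback stands in for whatever code calls voices.create and returns voice.id.

```python
import json
from pathlib import Path
from typing import Callable


def get_or_create_voice_id(cache_file: Path, train: Callable[[], str]) -> str:
    """Return a cached voice id, calling `train` only when no cache exists."""
    if cache_file.exists():
        # A previous run already trained the voice; reuse its id.
        return json.loads(cache_file.read_text())["voice_id"]
    # First run: `train` is expected to create the voice and return its id.
    voice_id = train()
    cache_file.write_text(json.dumps({"voice_id": voice_id}))
    return voice_id
```

With the client from the recipe, train could be a small function that opens reference.wav, calls client.voices.create as shown above, and returns voice.id. The returned string is then passed as reference_id.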

Which to choose

                   | One-shot                       | Persistent
Setup              | None                           | One voices.create call
Per request        | Re-uploads reference bytes     | Sends only reference_id
Stored server-side | No                             | Yes (manage with voices.update / voices.delete)
Best for           | One-off or experimental clones | Voices reused many times or across services
For either path, give the reference 10–30 s of clean speech and make the transcript match the audio exactly (including punctuation) for the best prosody.
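The length guideline can be checked locally before uploading. Below is a small sketch using Python's standard wave module, which reads uncompressed PCM WAV files only. The helper names and the constants are illustrative, taken from the 10–30 s recommendation above, and are not part of the SDK.

```python
import wave

# Bounds taken from the 10–30 s recommendation; adjust as needed.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path: str) -> bool:
    """Return True if the clip falls within the recommended duration range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

This checks duration only. Whether the speech is clean and whether the transcript matches still has to be verified by listening.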