One-shot vs persistent cloning: pick the right approach

Prerequisites

Create a Fish Audio account

Go to fish.audio/auth/signup
Fill in your details and complete the steps to verify your account.
Log in to your account and navigate to the API section

Get your API key

Once you have an account, you’ll need an API key to authenticate your requests.

Log in to your Fish Audio Dashboard
Navigate to the API Keys section
Click “Create New Key”, give it a descriptive name, and set an expiration if desired
Copy your key and store it securely

Keep your API key secret! Never commit it to version control or share it publicly.
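
One way to keep the key out of source files is to read it from the environment and fail early when it is missing. The snippet below is a minimal sketch using only the standard library. The variable name `FISH_API_KEY` and the helper `load_api_key` are assumptions made for illustration; check the SDK documentation for the variable the client actually reads.

```python
import os


def load_api_key(env=os.environ, var_name="FISH_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    var_name is an assumed name for illustration; adjust it to your setup.
    """
    key = env.get(var_name, "").strip()
    if not key:
        # Fail fast with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(f"Set {var_name} in your environment; do not hard-code the key.")
    return key
```

Call `load_api_key()` once at startup, and keep any local `.env` file that holds the key listed in `.gitignore`.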

Recipe

There are two ways to clone a voice. Pick by how often you’ll reuse it:

One-shot (instant) — pass a ReferenceAudio (raw bytes + exact transcript) on each convert call. Nothing is stored server-side; the clone lives only for that request.
Persistent — call voices.create once to train a model, then reuse its id as reference_id on every request. No reference upload per call, and the same voice is shared across processes.

Start with one-shot. Below, a single reference clip is cloned inline with no model to manage:

from fishaudio import FishAudio
from fishaudio.types import ReferenceAudio
from fishaudio.utils import save

client = FishAudio()

with open("reference.wav", "rb") as f:
    audio = client.tts.convert(
        text="This line is spoken in the cloned voice, no model required.",
        references=[ReferenceAudio(
            audio=f.read(),
            text="Exact transcript of what is said in reference.wav.",
        )],
    )

save(audio, "oneshot.mp3")

One-shot re-sends the reference bytes on every request, so it’s ideal for one-off or rarely-repeated voices. Once a voice is used more than a handful of times, switch to a persistent model to skip the per-call upload.

Train a persistent voice once, reuse forever

Call voices.create to train a model, then pass voice.id as reference_id. The same id works from any process and across SDK and REST.

# continues from the one-shot example: reuses client and save
with open("reference.wav", "rb") as f:
    voice = client.voices.create(title="My Narrator", voices=[f.read()])

# reuse the same id on every later request — no reference upload
audio = client.tts.convert(
    text="Reusing my saved voice across many requests.",
    reference_id=voice.id,
)
save(audio, "persistent.mp3")

Already have a trained voice id? Skip training and pass it directly:

audio = client.tts.convert(text="Hello again.", reference_id="<voice-id>")
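
Because the id is all a later process needs, it can help to store it locally after the first training call. The following is a minimal sketch, not part of the SDK: `get_or_create_voice_id`, the JSON cache file, and the `create_voice` callable are hypothetical names for illustration. Passing the training step in as a callable keeps the helper independent of any particular client.

```python
import json
from pathlib import Path


def get_or_create_voice_id(cache_path, create_voice):
    """Return a cached voice id, calling create_voice() only on first use.

    create_voice is any zero-argument callable that trains a voice and
    returns its id as a string (hypothetical hook, supplied by the caller).
    """
    path = Path(cache_path)
    if path.exists():
        # A previous run already trained the voice; reuse its id.
        return json.loads(path.read_text())["voice_id"]
    voice_id = create_voice()
    path.write_text(json.dumps({"voice_id": voice_id}))
    return voice_id
```

With the client from the examples above, `create_voice` could wrap the `voices.create` call and return `voice.id`; the returned value is then passed as `reference_id`.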

Which to choose

| | One-shot | Persistent |
| --- | --- | --- |
| Setup | None | One `voices.create` call |
| Per request | Re-uploads reference bytes | Sends only `reference_id` |
| Stored server-side | No | Yes (manage with `voices.update` / `voices.delete`) |
| Best for | One-off or experimental clones | Voices reused many times or across services |

For either path, give the reference 10–30 s of clean speech and make the transcript match the audio exactly (including punctuation) for the best prosody.
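
The length guideline can be checked locally before uploading. This sketch uses Python's standard `wave` module, so it only handles uncompressed PCM WAV files; the helper names and the default bounds (taken from the 10–30 s guideline above) are illustrative, not something the API enforces.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def reference_length_ok(path, min_s=10.0, max_s=30.0):
    """True if the clip length falls inside the recommended range."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

A clip that fails the check can be trimmed or re-recorded before it is sent as a reference. The check covers length only, not noise or transcript accuracy.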
