Prerequisites
Create a Fish Audio account
Sign up for a free Fish Audio account to get started with our API.
- Go to fish.audio/auth/signup
- Fill in your details to create an account, then complete the verification steps
- Log in to your account and navigate to the API section
Get your API key
Once you have an account, you’ll need an API key to authenticate your requests.
- Log in to your Fish Audio Dashboard
- Navigate to the API Keys section
- Click “Create New Key”, give it a descriptive name, and set an expiration if desired
- Copy your key and store it securely
Recipe
A voice agent is three stages chained together: asr.transcribe() turns the caller’s audio into text, your own LLM turns that text into a reply, and tts.stream() turns the reply back into speech. The transcript and the reply are just strings, so the only Fish Audio-specific parts are the first and last calls. Streaming the reply lets you start writing (or forwarding) audio before the whole sentence is synthesized.
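The three stages can be sketched as a single function. This is a minimal illustration rather than the SDK's own example: run_turn, FakeASR, FakeTTS, and echo_llm are hypothetical names, the audio= and text= keyword names are assumptions, and the sketch assumes tts.stream() yields chunks of bytes. The stubs stand in for the real asr and tts objects so the block runs without an API key.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Callable, Iterator, Tuple


@dataclass
class ASRResponse:
    # Stand-in with the two fields this recipe relies on.
    text: str        # full transcript
    duration: float  # clip length in seconds


class FakeASR:
    # Hypothetical stub: replace with the SDK's asr object.
    def transcribe(self, audio: bytes, **kwargs) -> ASRResponse:
        return ASRResponse(text="what time do you open", duration=len(audio) / 32000)


class FakeTTS:
    # Hypothetical stub: replace with the SDK's tts object.
    def stream(self, text: str, **kwargs) -> Iterator[bytes]:
        data = text.encode("utf-8")
        for i in range(0, len(data), 8):
            yield data[i:i + 8]  # pretend each slice is an audio chunk


def run_turn(asr, tts, llm: Callable[[str], str], audio: bytes,
             out: BinaryIO, language: str = "", **voice) -> Tuple[ASRResponse, str]:
    # Stage 1: speech to text. Only pass language when the caller knows it.
    asr_kwargs = {"language": language} if language else {}
    heard = asr.transcribe(audio=audio, **asr_kwargs)
    # Stage 2: plain strings in, plain strings out. Any LLM fits here.
    reply = llm(heard.text)
    # Stage 3: text to speech, written out chunk by chunk as it arrives.
    for chunk in tts.stream(text=reply, **voice):
        out.write(chunk)
    return heard, reply


def echo_llm(prompt: str) -> str:
    # Placeholder for your own model call.
    return "You said: " + prompt


if __name__ == "__main__":
    sink = io.BytesIO()
    heard, reply = run_turn(FakeASR(), FakeTTS(), echo_llm, b"\x00" * 64000, sink)
    print(heard.text, heard.duration, reply)
```

Because the ASR and TTS objects are passed in, the same function should work with real clients or with stubs in a unit test, and out can be a file, a socket wrapper, or any object with a write() method.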
heard is an ASRResponse: heard.text is the full transcript and heard.duration is the clip length in seconds. Pass language="en" to transcribe() to skip auto-detection when you already know the input language.
Reply in the caller’s voice
reference_id points the reply at a saved voice. Drop it to use the default voice, or clone the caller’s voice from the same clip you just transcribed by passing references instead (see Instant voice cloning).
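The three voice options can be sketched as a small helper that builds the keyword arguments for tts.stream(). The helper name voice_kwargs is hypothetical, and the dictionary used as a reference below is only a placeholder for whatever reference type the SDK expects. Treating reference_id and references as mutually exclusive is a choice made by this sketch to keep intent unambiguous, not documented API behavior.

```python
from typing import Any, Dict, Optional, Sequence


def voice_kwargs(reference_id: Optional[str] = None,
                 references: Optional[Sequence[Any]] = None) -> Dict[str, Any]:
    # Hypothetical helper: picks exactly one way of choosing the voice.
    if reference_id is not None and references:
        raise ValueError("pass reference_id or references, not both")
    if reference_id is not None:
        return {"reference_id": reference_id}    # saved voice
    if references:
        return {"references": list(references)}  # clone from audio clips
    return {}                                    # default voice


if __name__ == "__main__":
    print(voice_kwargs())                             # default voice
    print(voice_kwargs(reference_id="my-voice-id"))   # placeholder ID
    # Placeholder reference object; the real SDK type may differ.
    clip = {"audio": b"...", "text": "what time do you open"}
    print(voice_kwargs(references=[clip]))
```

The returned dictionary can then be unpacked into the call, for example tts.stream(text=reply, **voice_kwargs(reference_id="my-voice-id")).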

