Use it in the web app
No code — upload audio, get a transcript.
API reference
Every parameter for POST /v1/asr.
Cookbooks
Captions, batch transcription, and more.
When to use it
Captions & subtitles
Timed segments map straight to SRT/VTT cues.
Meeting & call notes
Transcribe recordings for summaries and search.
Voice commands & notes
Turn short utterances into text your app can act on.
Accessibility
Make audio and video content readable.
Quick start
Read an audio file, send the bytes, get the transcript. The response contains the transcript text, the audio duration in seconds, and timed segments.
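As a rough illustration of the request described here, the sketch below builds (but does not send) a multipart/form-data POST to /v1/asr using only the Python standard library. The base URL, the Bearer authorization scheme, and the `audio` form-field name are assumptions made for the example and are not confirmed on this page; `language` and `ignore_timestamps` are the options this guide mentions. Check the API reference before relying on any of these names.

```python
import urllib.request
import uuid
from typing import Optional

# Assumptions for illustration only: the base URL, the Bearer auth scheme,
# and the "audio" form-field name are not confirmed by this page.
ASR_URL = "https://api.fish.audio/v1/asr"


def build_asr_request(
    audio: bytes,
    api_key: str,
    language: Optional[str] = None,
    ignore_timestamps: bool = False,
    filename: str = "audio.wav",
) -> urllib.request.Request:
    """Build a multipart/form-data POST for /v1/asr without sending it."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name: str, value: str) -> None:
        # One plain-text form field per multipart section.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )

    if language is not None:
        add_field("language", language)
    add_field("ignore_timestamps", "true" if ignore_timestamps else "false")

    # The audio goes in as raw bytes, with no re-encoding.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    parts.append(file_header + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        ASR_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned request to `urllib.request.urlopen` and decode the response body (assumed here to be JSON) to read the text, duration, and segments.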
Read the timestamps
Each segment carries start and end times in seconds, which makes them ideal for captions. With the API, ask for them explicitly with ignore_timestamps=false.
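To show how timed segments could map to caption cues, here is a minimal sketch that turns a list of segments into SRT text. It assumes each segment is a dict with `start` and `end` in seconds plus a `text` key; the `text` key and the dict shape are assumptions for the example, so adapt the field access to the actual response type.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

WebVTT cues follow the same pattern, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.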
In the Python SDK, segment timestamps are on by default; pass include_timestamps=False to skip them. That’s the inverse of the API/JavaScript flag ignore_timestamps.
Implementation details
Language
language is optional — Fish Audio auto-detects it when you omit it. Pass an ISO code (en, zh, ja, …) to pin it and improve accuracy on short or noisy clips.
Input audio
Common formats work directly: wav, mp3, opus, and more. Send the raw file bytes; no pre-processing required. The endpoint accepts multipart/form-data (the format used in the Quick start) or application/msgpack.
File limits
One request transcribes one audio file. The endpoint accepts files up to 20 MB and 60 minutes long, with a minimum of 1 second of audio. For longer recordings, split them into chunks and transcribe each, then stitch the segment timestamps back together (offset each chunk’s start/end by where it began in the full recording).
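The stitching step can be sketched as follows. The function takes one `(chunk_start_seconds, segments)` pair per chunk and shifts every segment by the position where its chunk began in the full recording. The dict shape of a segment (`start`, `end`, and any other keys such as `text`) is an assumption for the example, and `stitch_segments` is a hypothetical helper, not part of any SDK.

```python
def stitch_segments(chunks):
    """Merge per-chunk segments into one timeline for the full recording.

    chunks: iterable of (chunk_start_seconds, segments) pairs, where each
    segment is a dict with "start" and "end" relative to its own chunk.
    """
    stitched = []
    # Process chunks in recording order, even if they finished out of order.
    for chunk_start, segments in sorted(chunks, key=lambda chunk: chunk[0]):
        for seg in segments:
            stitched.append({
                **seg,  # keep other fields (for example the text) untouched
                "start": seg["start"] + chunk_start,
                "end": seg["end"] + chunk_start,
            })
    return stitched
```

For example, a segment at 0.5 to 2.0 seconds in a chunk that began 600 seconds into the recording ends up at 600.5 to 602.0 seconds in the stitched result.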
Async transcription
The Python SDK ships an async client with the same surface, which is useful when you’re transcribing many files concurrently or already running inside an event loop. Use AsyncFishAudio and await the call:
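One way to structure concurrent transcription is sketched below. To stay independent of the SDK's exact signature, the helper accepts any awaitable `transcribe(audio_bytes)` callable; `transcribe_many` and `max_concurrency` are hypothetical names introduced for this example. With the real client you would pass a small wrapper around the awaited `asr.transcribe` call, and the `audio=` keyword shown in the comment is an assumption.

```python
import asyncio


async def transcribe_many(transcribe, audio_files, max_concurrency=4):
    """Transcribe many files concurrently, at most max_concurrency at a time.

    transcribe: an async callable that takes audio bytes and returns a result.
    Results come back in the same order as audio_files.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(audio):
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await transcribe(audio)

    return await asyncio.gather(*(worker(audio) for audio in audio_files))


# Hypothetical wiring with the SDK (keyword name assumed, not verified):
#   client = AsyncFishAudio()
#   results = await transcribe_many(
#       lambda audio: client.asr.transcribe(audio=audio), files
#   )
```

Capping concurrency keeps a large batch from opening hundreds of requests at once; tune the limit to whatever rate your account allows.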
Direct API (MessagePack)
POST /v1/asr also accepts a MessagePack body instead of multipart form data — the same path the API reference links to for low-overhead, server-side calls. Pack the audio bytes and options into one payload and set Content-Type: application/msgpack:
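A minimal sketch of the packing step, assuming the MessagePack body uses the same field names as the form fields (`audio`, `language`, `ignore_timestamps`); the `audio` key in particular is an assumption, so confirm it against the API reference. Encoding relies on the third-party `msgpack` package, imported inside the function so the payload builder works without it.

```python
from typing import Optional


def build_asr_payload(
    audio: bytes,
    language: Optional[str] = None,
    ignore_timestamps: bool = False,
) -> dict:
    """Collect the audio bytes and options into one payload dict."""
    payload = {"audio": audio, "ignore_timestamps": ignore_timestamps}
    if language is not None:
        # Omit the key entirely so the service auto-detects the language.
        payload["language"] = language
    return payload


def pack_asr_payload(payload: dict) -> bytes:
    """Encode the payload as MessagePack (requires: pip install msgpack)."""
    import msgpack  # third-party dependency, imported lazily

    # use_bin_type=True stores the audio as a binary value, not a string.
    return msgpack.packb(payload, use_bin_type=True)
```

Send the packed bytes as the request body with Content-Type: application/msgpack and the same authorization header you use for the multipart form.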
The response carries the same fields: text, duration (seconds), and segments.
Going further
Generate speech
The reverse direction — text to lifelike audio.
Full API parameters
Every field and the raw response schema.
Python reference
asr.transcribe options and the ASRResponse type.
