Realtime: LLM tokens → speech

Prerequisites

Create a Fish Audio account

Go to fish.audio/auth/signup
Fill in your details to create an account, then complete the verification steps.
Log in to your account and navigate to the API section

Get your API key

Once you have an account, you’ll need an API key to authenticate your requests.

Log in to your Fish Audio Dashboard
Navigate to the API Keys section
Click “Create New Key”, give it a descriptive name, and set an expiration if desired
Copy your key and store it securely

Keep your API key secret! Never commit it to version control or share it publicly.

Recipe

stream_websocket() takes an iterable of text chunks and yields audio chunks in real time. Feed it your LLM’s token stream and play or forward the audio as it’s produced.

from fishaudio import FishAudio
from fishaudio.utils import play

client = FishAudio()

def llm_tokens():
    # Replace with your real streaming LLM call
    for token in ["The ", "first ", "move ", "sets ", "everything ", "in ", "motion."]:
        yield token

audio_stream = client.tts.stream_websocket(llm_tokens(), reference_id="<voice-id>")
play(audio_stream)  # or: for chunk in audio_stream: send_to_client(chunk)
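The placeholder llm_tokens() stands in for a real LLM stream, and the comment on the last line hints at forwarding audio instead of playing it. The sketch below covers both ends under two assumptions that are not confirmed here: that your LLM client yields text deltas which may be None or empty (some streaming APIs send a final chunk that carries no text), and that stream_websocket() yields audio as bytes objects. text_only() and forward_audio() are hypothetical helpers, not part of the SDK; they use only the standard library so they can be tried without an API key.

```python
from typing import BinaryIO, Iterable, Iterator, Optional


def text_only(deltas: Iterable[Optional[str]]) -> Iterator[str]:
    """Yield only non-empty text deltas from an LLM stream.

    Assumption: the LLM client yields a str or None per chunk
    (None for chunks that carry no text, such as a final stop chunk).
    """
    for delta in deltas:
        if delta:  # skips both None and ""
            yield delta


def forward_audio(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write each audio chunk to a binary sink as soon as it arrives.

    Assumption: audio chunks are bytes. Returns the total bytes written.
    """
    total = 0
    for chunk in chunks:
        sink.write(chunk)
        sink.flush()  # push each chunk out immediately instead of buffering
        total += len(chunk)
    return total
```

In use, you might wrap a hypothetical my_llm_deltas() generator as client.tts.stream_websocket(text_only(my_llm_deltas()), reference_id="<voice-id>") and pass the result to forward_audio() together with any writable binary object, such as an open file or a socket's makefile("wb").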

Force generation at a boundary

By default, the engine buffers text until it has enough for natural prosody. Yield a FlushEvent to force synthesis of whatever is buffered; this is useful for turn-taking in a conversation:

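from fishaudio import FishAudio
from fishaudio.types import TextEvent, FlushEvent
from fishaudio.utils import play

client = FishAudio()

def turns():
    yield TextEvent(text="Are you ready?")
    yield FlushEvent()              # speak the question now
    yield TextEvent(text="Let's begin.")

play(client.tts.stream_websocket(turns(), reference_id="<voice-id>"))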
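When a conversation has several turns, writing each yield by hand gets repetitive. The generator below is a minimal sketch that emits a flush after every turn, assuming each turn arrives as a list of text chunks. with_turn_flushes() is a hypothetical helper; it takes the event constructors as arguments so it runs without the SDK installed, and in real use you would pass lambda t: TextEvent(text=t) and FlushEvent.

```python
from typing import Any, Callable, Iterable, Iterator


def with_turn_flushes(
    turns: Iterable[Iterable[str]],
    make_text: Callable[[str], Any],
    make_flush: Callable[[], Any],
) -> Iterator[Any]:
    """Yield one text event per chunk and a flush event after every turn.

    Flushing at the end of each turn asks the engine to speak that turn
    now instead of waiting to buffer more text.
    """
    for turn in turns:
        for chunk in turn:
            yield make_text(chunk)
        yield make_flush()
```

For example, with_turn_flushes([["Are you ready?"], ["Let's ", "begin."]], lambda t: TextEvent(text=t), FlushEvent) produces the same shape of event sequence as turns(), plus a final flush, and can be passed straight to stream_websocket().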

The SDK sends the start/stop frames for you — you only supply text and optional flushes.

Errors that occur mid-stream are raised as WebSocketError. Reconnect with a fresh call to stream_websocket() rather than retrying on the same socket.
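One way to follow that advice is to wrap the call in a function that opens a brand-new stream on every attempt. This is an illustrative sketch, not SDK behavior: open_stream is a hypothetical zero-argument callable that calls stream_websocket() with a fresh text iterator, and retry_on would hold WebSocketError in real use (its import path is not covered here). As a design choice, it only reconnects when the failure happens before any audio has been yielded, because restarting after playback has begun would repeat speech the listener already heard.

```python
from typing import Callable, Iterable, Iterator, Tuple, Type


def stream_with_reconnect(
    open_stream: Callable[[], Iterable[bytes]],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
) -> Iterator[bytes]:
    """Yield audio chunks, opening a fresh stream if an attempt fails early.

    Each attempt calls open_stream() again, so a retry always uses a new
    connection instead of the one that failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        started = False
        try:
            for chunk in open_stream():
                started = True
                yield chunk
            return  # stream finished normally
        except retry_on:
            # Once audio has been emitted, replaying from the start would
            # duplicate speech, so surface the error to the caller instead.
            if started or attempt == max_attempts:
                raise
```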