WebSocket TTS Streaming

The WebSocket TTS endpoint provides bidirectional streaming for low-latency text-to-speech generation. Messages are serialized with MessagePack.

The request payload inside StartEvent takes the same parameters as the HTTP Text to Speech API; see that page for detailed field guidance, model-specific behavior, and examples. In WebSocket mode, request.text in StartEvent is typically left empty, and the text to synthesize is sent in subsequent TextEvent messages.
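The message ordering described above can be sketched as follows. This is a minimal illustration, not the endpoint's confirmed wire format: the "event" discriminator key, the "start" and "text" values, and the "format" parameter are assumptions made for the example, and build_events and to_frames are hypothetical helper names. The serializer is passed in as a callable; in practice it would likely be a MessagePack packer such as msgpack.packb from the third-party msgpack package.

```python
from typing import Any, Callable, Dict, Iterable, List


def build_events(chunks: Iterable[str], **tts_params: Any) -> List[Dict[str, Any]]:
    """Return one StartEvent followed by one TextEvent per non-empty chunk.

    The dictionary keys used here ("event", "request", "text") are assumed
    for illustration; check the endpoint reference for the real schema.
    """
    # StartEvent carries the TTS request parameters. request.text is left
    # empty because the text is streamed in the TextEvent messages instead.
    start = {"event": "start", "request": {**tts_params, "text": ""}}

    # Each TextEvent carries one piece of the text to synthesize.
    texts = [{"event": "text", "text": chunk} for chunk in chunks if chunk]

    return [start, *texts]


def to_frames(
    events: Iterable[Dict[str, Any]],
    pack: Callable[[Dict[str, Any]], bytes],
) -> List[bytes]:
    """Serialize each event into one binary frame using the given packer."""
    return [pack(event) for event in events]


if __name__ == "__main__":
    # "format" is a hypothetical request parameter used only as a placeholder.
    for event in build_events(["Hello, ", "world."], format="mp3"):
        print(event)
```

With the msgpack package installed, to_frames(events, msgpack.packb) would produce the binary frames to send over the socket in order. Keeping the packer as a parameter keeps the ordering logic testable without a network connection or a serialization dependency.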
