OpenClaw Voice Ops

live: classic / gemini-3.1-flash-lite-preview

Current stack, switchable runtime modes, and provider candidates.

matrix

Provider readiness

provider | surface | status | model | released | why | next

Gemini 3.1 Flash Live

gemini-live

Realtime brain
passed: switchable on VPS
gemini-3.1-flash-live-preview

Highest-priority realtime candidate; available through the new AI Studio key.

Run real audio call smoke and measure LiveKit 3.1 compatibility limits.

Gemini 3.1 Flash Lite

gemini-lite

Classic LLM
ready: ready in classic mode
gemini-3.1-flash-lite-preview

Fast practical baseline when we need external STT/TTS control.

Use for voice-quality A/B when TTS is the variable.

Gemini 3.1 Flash TTS

gemini-tts-31

TTS
passed: Cloud Text-to-Speech streaming room smoke passed on VPS
gemini-3.1-flash-tts-preview

Controllable Gemini TTS path: 30 voices, prompt/director-note style tuning, Russian supported, streaming PCM chunks via Cloud TTS.

Keep Leda/fast in favorites and as the saved field-test default; retest Leda vs Kore after the classic per-call override fix.

xAI Grok Voice Think Fast

xai-voice

Realtime stack
prepared: official LiveKit plugin wired on VPS
grok-voice-think-fast-1.0

Native realtime voice model with tools; the interesting xAI brain candidate, not just REST TTS.

Run a shorter human xAI call focused on pauses/tool continuation before any baseline promotion.

OpenAI GPT Realtime 2

openai-realtime-2

Realtime stack
prepared: wired locally as openai-live; needs API-key smoke on VPS
gpt-realtime-2

OpenAI finally has a materially newer realtime voice model: reasoning voice agent, 128k realtime context, stronger tools, preambles, and native speed/voice controls.

Run room smoke with marin/cedar and compare tool continuation, TTFT, VAD and Russian conversational feel against Gemini Live.

xAI Grok TTS

xai-tts

TTS
ready: classic provider available on VPS; not current default
xai-tts / REST /v1/tts

Six voices surfaced in API; RU MP3 and expressive tags work. Bridge now uses 24 kHz PCM plus LiveKit resampling instead of unsafe direct 8 kHz phone codecs.

Keep as voice-color candidate; compare against Gemini Cloud TTS and Eleven before using broadly.

xAI Grok STT

xai-stt

STT
retry: batch smoke passed with caveats
/v1/stt

MP3/WAV batch STT is fast and usable; observed transient 500s and bad raw μ-law Russian behavior.

Benchmark real phone corpus against Deepgram Nova 3 and Google Chirp 3; test streaming separately.

Deepgram Nova 3

deepgram

STT
ready: available on VPS
nova-3

Strong multilingual fallback, useful when Russian mixes with product terms.

Re-run benchmark on phone-call corpus.

Google Chirp 3 STT

google-stt

STT
ready: available on VPS
chirp_3

Dedicated RU path via Speech-to-Text V2, good second opinion.

Keep as classic baseline and compare noisy PSTN calls.

ElevenLabs Flash

eleven

TTS
ready: available on VPS
eleven_flash_v2_5

Production-safe voice quality baseline; VIA already knows this territory.

Investigate v3/Expressive Mode boundary separately.

Cartesia Sonic 3

cartesia

TTS
ready: available on VPS
sonic-3-2026-01-12

Fast classic-mode TTS with curated ES/EN voices.

Use as latency reference, not as the only naturalness candidate.

Hume Octave 2

hume

Voice candidate
research: not wired
Octave 2

Most interesting emotional/naturalness candidate from the research pass.

Build sample pack before LiveKit integration work.

MiniMax Speech 2.8

minimax

Voice candidate
research: not wired
speech-2.8-turbo / hd

Multilingual expressive TTS with tags; plausible VIA voice contender.

Generate RU/ES samples and compare phone-path artifacts.
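
A minimal sketch of how the readiness matrix above could be queried in code. The keys, surfaces, statuses, and model names are copied from the rows above; the `Provider` record and the `usable` helper are hypothetical names introduced only for illustration, and treating `passed` and `ready` as the "usable" statuses is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """One row of the readiness matrix (illustrative field names)."""
    key: str
    surface: str
    status: str
    model: str


# Subset of rows from the matrix: the STT and TTS providers wired on the VPS.
PROVIDERS = [
    Provider("gemini-tts-31", "TTS", "passed", "gemini-3.1-flash-tts-preview"),
    Provider("xai-tts", "TTS", "ready", "xai-tts / REST /v1/tts"),
    Provider("eleven", "TTS", "ready", "eleven_flash_v2_5"),
    Provider("cartesia", "TTS", "ready", "sonic-3-2026-01-12"),
    Provider("xai-stt", "STT", "retry", "/v1/stt"),
    Provider("deepgram", "STT", "ready", "nova-3"),
    Provider("google-stt", "STT", "ready", "chirp_3"),
]


def usable(surface, statuses=("passed", "ready")):
    """Return provider keys for a surface whose status is in `statuses`.

    Assumption: only `passed` and `ready` count as usable by default.
    """
    return [p.key for p in PROVIDERS
            if p.surface == surface and p.status in statuses]


print(usable("STT"))
print(usable("TTS"))
```

With these rows, `xai-stt` is excluded from the STT list because its status is `retry`.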

switch map

Runtime modes

mode | status | audio | brain | speech | tools | use

Gemini Live baseline

google-live

passed
native realtime

google-realtime / gemini-3.1-flash-live-preview via AI Studio

Leda, ru-RU, STT/TTS = none

native tool_response

Fast live conversations, tool checks, phone/web smoke.

Classic composable stack

classic

live
separate STT / LLM / TTS

google / gemini-3.1-flash-lite-preview

Google Chirp 3 STT + Gemini Cloud TTS Leda/ru-RU, 1.2x

AgentSession generateReply

Voice A/B tests, STT benchmark, fallback if realtime behaves oddly.

xAI Grok Voice Think Fast

xai-realtime

prepared
native realtime

xai-realtime / grok-voice-think-fast-1.0

ara/eve/rex/sal/leo/una, STT/TTS = none

official LiveKit xAI plugin

Check whether xAI can compete with Gemini Live for VIA.

OpenAI GPT Realtime 2

openai-realtime-2

prepared
native realtime

openai-realtime / gpt-realtime-2

marin/cedar/etc, STT/TTS = none, speed via native audio output

official LiveKit OpenAI plugin

Check whether OpenAI has caught up with Gemini Live on tool use, long context, and speech naturalness.
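
The runtime modes listed above can be pictured as a small lookup table. This is only a sketch: the mode keys, statuses, and model names come from the table, while the `RuntimeMode` record, the `select_mode` helper, and the rule "fall back to `classic` when a mode is unknown or only `prepared`" are assumptions made for illustration, not the actual switch logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeMode:
    """One row of the runtime-mode table (illustrative field names)."""
    status: str
    audio: str
    brain: str
    stt: Optional[str]  # None for native realtime modes (STT/TTS = none)
    tts: Optional[str]


MODES = {
    "google-live": RuntimeMode(
        "passed", "native realtime", "gemini-3.1-flash-live-preview", None, None),
    "classic": RuntimeMode(
        "live", "separate STT / LLM / TTS", "gemini-3.1-flash-lite-preview",
        "Google Chirp 3", "Gemini Cloud TTS"),
    "xai-realtime": RuntimeMode(
        "prepared", "native realtime", "grok-voice-think-fast-1.0", None, None),
    "openai-realtime-2": RuntimeMode(
        "prepared", "native realtime", "gpt-realtime-2", None, None),
}


def select_mode(requested, fallback="classic"):
    """Pick a mode key, falling back when the request is not yet smoke-tested.

    Assumption: `prepared` modes are not eligible for live traffic.
    """
    mode = MODES.get(requested)
    if mode is None or mode.status == "prepared":
        return fallback
    return requested


print(select_mode("google-live"))
print(select_mode("xai-realtime"))
```

Under this assumed rule, a request for `xai-realtime` resolves to `classic` until its status moves past `prepared`.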
