OpenClaw Voice Ops

live: classic / gemini-3.1-flash-lite-preview

Current stack, switchable runtime modes, and provider candidates.


2026-04-28 live VPS check

Live stack

status: live

mode: classic
brain: gemini-3.1-flash-lite-preview
released: 2026-03-03
voice: Leda / ru-RU
audio: separate-stt-llm-tts
stt: google-cloud-speech / chirp_3
tts: google-cloud-gemini / gemini-3.1-flash-tts-preview
tools: enabled
interrupt: vad
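The `audio: separate-stt-llm-tts` setting means each turn passes through three independent stages. Below is a minimal Python sketch of that composition. It assumes hypothetical `STT`/`LLM`/`TTS` interfaces and stub providers; none of these names come from OpenClaw or LiveKit. Because the stages are independent, swapping only the TTS stage gives a voice A/B test where TTS is the single variable.

```python
from dataclasses import dataclass
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class ClassicPipeline:
    """One turn in separate-stt-llm-tts mode; every stage is swappable."""
    stt: STT
    llm: LLM
    tts: TTS

    def turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)   # e.g. Chirp 3 in the live stack
        answer = self.llm.reply(text)       # e.g. gemini-3.1-flash-lite-preview
        return self.tts.synthesize(answer)  # e.g. Gemini Cloud TTS, voice Leda


# Hypothetical stubs standing in for real provider clients.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    def reply(self, text: str) -> str:
        return text.upper()


@dataclass
class TaggedTTS:
    voice: str

    def synthesize(self, text: str) -> bytes:
        return f"[{self.voice}] {text}".encode("utf-8")


if __name__ == "__main__":
    stt, llm = EchoSTT(), UpperLLM()
    for voice in ("Leda", "Kore"):  # A/B test: only the TTS stage changes
        print(ClassicPipeline(stt, llm, TaggedTTS(voice)).turn(b"privet"))
```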

services

bridge: active
web: active
gateway: active

This cockpit is read-only. Real mode switches still go through `openclaw-voice-stack` or allowlisted voice tools.
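The split between a read-only view and gated switches can be sketched as follows. Everything here is hypothetical: the allowlist entry, `Cockpit`, `request_switch`, and the state dict are illustrative only. The real path goes through `openclaw-voice-stack` or the actual allowlisted tools, whose interfaces this page does not show.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist; the tool name is illustrative, not a real OpenClaw tool.
ALLOWED_SWITCH_TOOLS = frozenset({"voice_stack_switch_mode"})


@dataclass
class Cockpit:
    """Read-only view: it reports state but exposes no way to change it."""
    state: dict[str, str] = field(default_factory=dict)

    def read(self, key: str) -> str:
        return self.state.get(key, "unknown")


def request_switch(tool_name: str, target_mode: str, state: dict[str, str]) -> dict[str, str]:
    """Only allowlisted tools may switch; returns a new state instead of mutating the old one."""
    if tool_name not in ALLOWED_SWITCH_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not allowlisted")
    return {**state, "mode": target_mode}
```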

codex / chronicle

Continuity digest

loaded: no
age: unknown
items: ...
window: none
sources: none
generated: none

source labels: not loaded

The voice agent reads only this compact digest. Raw Chronicle screenshots and OCR text stay out of calls.
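Here is a minimal sketch of that boundary. It assumes a hypothetical `ChronicleItem` record that stores raw fields (`screenshot_path`, `ocr_text`) next to a short summary; the real Chronicle schema is not shown here. The digest builder copies only labeled summaries and caps both their count and their length, so raw captures never reach a call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChronicleItem:
    # Hypothetical fields; not the real Chronicle schema.
    summary: str
    source_label: str
    screenshot_path: Optional[str] = None  # raw capture, never sent to voice
    ocr_text: Optional[str] = None         # raw OCR, never sent to voice


def build_voice_digest(items: list[ChronicleItem], max_items: int = 5, max_chars: int = 160) -> str:
    """Render only compact, labeled summaries; raw screenshot and OCR fields are ignored."""
    if not items:
        return "loaded: no"
    lines = []
    for item in items[:max_items]:
        summary = item.summary.strip()
        if len(summary) > max_chars:
            summary = summary[: max_chars - 1] + "…"  # truncate to exactly max_chars
        lines.append(f"[{item.source_label}] {summary}")
    return "\n".join(lines)
```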

next tests

Experiment queue

Gemini Live tools smoke
Keep as a regression smoke test before provider changes.
status: passed

Grounded OpenClaw lookup in voice
Ask VPS and current-state questions, then inspect the tool traces.
status: retry

Long tool-call pause masking
Track native async continuation; do not fall back to Gemini 2.5.
status: blocked

xAI realtime sandbox
Run a short xAI-focused call after the native-realtime guard fixes; compare pauses against the Gemini / Gemini Cloud baseline.
status: prepared

Hume Octave 2 sample pack
Generate RU/EN/ES assistant phrases and rank them by ear.
status: research

MiniMax Speech 2.8 sample pack
Check latency, streaming shape, licensing, and phone codecs.
status: research

Eleven v3 / Expressive Mode boundary
Verify whether Flash v3 is exposed as a simple TTS model for agents.
status: research
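Several queue items depend on comparing pauses between a candidate stack and the Gemini baseline. Below is a minimal sketch of that comparison. It assumes each turn is logged with two hypothetical timestamps, in seconds: when the caller stopped speaking and when the agent's first audio started. It compares nearest-rank p50 and p95 pauses. The tail percentile matters because long tool-call pauses tend to show up at p95 rather than in the median.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical per-turn timestamps in seconds, e.g. taken from call logs.
    user_speech_end: float
    agent_audio_start: float


def pause_seconds(turns: list[Turn]) -> list[float]:
    """Silence the caller hears between finishing a phrase and the agent's first audio."""
    return [t.agent_audio_start - t.user_speech_end for t in turns]


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile, q in 0..100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def compare(baseline: list[Turn], candidate: list[Turn]) -> dict[str, float]:
    """Positive deltas mean the candidate pauses longer than the baseline."""
    b, c = pause_seconds(baseline), pause_seconds(candidate)
    return {
        "p50_delta": percentile(c, 50) - percentile(b, 50),
        "p95_delta": percentile(c, 95) - percentile(b, 95),
    }
```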

matrix

Provider readiness

columns: provider (id), surface, status, model, released, why, next

Gemini 3.1 Flash Live
id: gemini-live
surface: Realtime brain
status: passed (switchable on VPS)
model: gemini-3.1-flash-live-preview
released: 2026-03-26
why: Highest-priority realtime candidate; available through the new AI Studio key.
next: Run a real audio-call smoke test and measure LiveKit's compatibility limits with Gemini 3.1.

Gemini 3.1 Flash Lite
id: gemini-lite
surface: Classic LLM
status: ready (in classic mode)
model: gemini-3.1-flash-lite-preview
released: 2026-03-03
why: Fast, practical baseline when we need external STT/TTS control.
next: Use for voice-quality A/B tests when TTS is the variable.

Gemini 3.1 Flash TTS
id: gemini-tts-31
surface: TTS
status: passed (Cloud Text-to-Speech streaming room smoke test passed on VPS)
model: gemini-3.1-flash-tts-preview
released: 2026-04-16 (docs) / 2025-11-07 (Cloud streaming)
why: Controllable Gemini TTS path: 30 voices, prompt/director-note style tuning, Russian support, and streaming PCM chunks via Cloud TTS.
next: Keep Leda/fast in favorites and as the saved field-test default; retest Leda vs. Kore after the classic per-call override fix.

xAI Grok Voice Think Fast
id: xai-voice
surface: Realtime stack
status: prepared (official LiveKit plugin wired on VPS)
model: grok-voice-think-fast-1.0
released: 2026-04-23
why: Native realtime voice model with tools; the interesting xAI brain candidate, not just REST TTS.
next: Run a shorter human xAI call focused on pauses and tool continuation before any baseline promotion.

OpenAI GPT Realtime 2
id: openai-realtime-2
surface: Realtime stack
status: prepared (wired locally as openai-live; needs an API-key smoke test on VPS)
model: gpt-realtime-2
released: 2026-01 (official changelog / current model page)
why: OpenAI finally has a materially newer realtime voice model: a reasoning voice agent with 128k realtime context, stronger tools, preambles, and native speed/voice controls.
next: Run a room smoke test with marin/cedar and compare tool continuation, TTFT, VAD, and Russian conversational feel against Gemini Live.

xAI Grok TTS
id: xai-tts
surface: TTS
status: ready (classic provider available on VPS; not the current default)
model: xai-tts / REST /v1/tts
released: 2026-03-16 (GA)
why: Six voices are surfaced in the API; Russian MP3 output and expressive tags work. The bridge now uses 24 kHz PCM plus LiveKit resampling instead of unsafe direct 8 kHz phone codecs.
next: Keep as a voice-color candidate; compare against Gemini Cloud TTS and Eleven before broad use.

xAI Grok STT
id: xai-stt
surface: STT
status: retry (batch smoke test passed with caveats)
model: /v1/stt
released: 2026-04-15 (GA)
why: MP3/WAV batch STT is fast and usable; we observed transient 500 errors and poor results on raw μ-law Russian audio.
next: Benchmark a real phone-call corpus against Deepgram Nova 3 and Google Chirp 3; test streaming separately.

Deepgram Nova 3
id: deepgram
surface: STT
status: ready (available on VPS)
model: nova-3
released: 2025-02-12
why: Strong multilingual fallback, useful when Russian is mixed with product terms.
next: Re-run the benchmark on the phone-call corpus.

Google Chirp 3 STT
id: google-stt
surface: STT
status: ready (available on VPS)
model: chirp_3
released: 2025-10-13 (GA)
why: Dedicated Russian path via Speech-to-Text V2; a good second opinion.
next: Keep as the classic baseline and compare on noisy PSTN calls.

ElevenLabs Flash
id: eleven
surface: TTS
status: ready (available on VPS)
model: eleven_flash_v2_5
released: 2024-12-18
why: Production-safe voice-quality baseline; VIA already knows this territory.
next: Investigate the v3 / Expressive Mode boundary separately.

Cartesia Sonic 3
id: cartesia
surface: TTS
status: ready (available on VPS)
model: sonic-3-2026-01-12
released: 2026-01-12
why: Fast classic-mode TTS with curated ES/EN voices.
next: Use as a latency reference, not as the only naturalness candidate.

Hume Octave 2
id: hume
surface: Voice candidate
status: research (not wired)
model: Octave 2
released: 2025-10-01
why: The most interesting emotional/naturalness candidate from the research pass.
next: Build a sample pack before any LiveKit integration work.

MiniMax Speech 2.8
id: minimax
surface: Voice candidate
status: research (not wired)
model: speech-2.8-turbo / hd
released: 2026-02-04
why: Multilingual expressive TTS with tags; a plausible VIA voice contender.
next: Generate RU/ES samples and compare phone-path artifacts.
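Several rows call for benchmarking STT providers (xAI Grok STT, Deepgram Nova 3, Google Chirp 3) on a real phone-call corpus. Word error rate (WER), computed as the word-level Levenshtein distance divided by the number of reference words, is a standard metric for this. Below is a minimal sketch. It assumes reference transcripts exist and uses only lowercase plus whitespace normalization; real Russian benchmarks would also need punctuation and number normalization. The corpus score sums edits across all calls rather than averaging per-call WER, so short calls do not get outsized weight.

```python
def word_edits(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions turning the reference words into the hypothesis words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,               # deletion of a reference word
                curr[j - 1] + 1,           # insertion of a hypothesis word
                prev[j - 1] + (r != h),    # substitution, free if the words match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER over (reference, hypothesis) pairs: total edits / total reference words."""
    total_words = sum(len(ref.split()) for ref, _ in pairs)
    if total_words == 0:
        raise ValueError("references are empty")
    return sum(word_edits(ref, hyp) for ref, hyp in pairs) / total_words
```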

switch map

Runtime modes

columns: mode (id), status, audio, brain, speech, tools, use

Gemini Live baseline
id: google-live
status: passed
audio: native realtime
brain: google-realtime / gemini-3.1-flash-live-preview via AI Studio
speech: Leda, ru-RU, STT/TTS = none
tools: native tool_response
use: Fast live conversations, tool checks, phone/web smoke tests.

Classic composable stack
id: classic
status: live
audio: separate STT / LLM / TTS
brain: google / gemini-3.1-flash-lite-preview
speech: Google Chirp 3 STT + Gemini Cloud TTS Leda/ru-RU, 1.2x
tools: AgentSession generateReply
use: Voice A/B tests, STT benchmarks, and a fallback when realtime misbehaves.

xAI Grok Voice Think Fast
id: xai-realtime
status: prepared
audio: native realtime
brain: xai-realtime / grok-voice-think-fast-1.0
speech: ara/eve/rex/sal/leo/una, STT/TTS = none
tools: official LiveKit xAI plugin
use: Check whether xAI can compete with Gemini Live for VIA.

OpenAI GPT Realtime 2
id: openai-realtime-2
status: prepared
audio: native realtime
brain: openai-realtime / gpt-realtime-2
speech: marin/cedar/etc., STT/TTS = none, speed via native audio output
tools: official LiveKit OpenAI plugin
use: Check whether OpenAI has caught up with Gemini Live on tool use, long context, and speech naturalness.
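The switch map reduces to one invariant: realtime modes carry no separate STT/TTS (`STT/TTS = none`), while the classic mode needs both. Below is a minimal sketch that encodes the four modes and checks that invariant. The `RuntimeMode` record and `check_mode` are hypothetical; only the mode ids and model names come from the table and the live stack above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RuntimeMode:
    mode_id: str
    audio: str            # "native realtime" or "separate STT / LLM / TTS"
    brain: str
    stt: Optional[str]    # None: the realtime model hears audio directly
    tts: Optional[str]    # None: the realtime model speaks natively


# Ids and models transcribed from the switch map; the structure itself is hypothetical.
MODES = {
    m.mode_id: m
    for m in (
        RuntimeMode("google-live", "native realtime", "gemini-3.1-flash-live-preview", None, None),
        RuntimeMode("classic", "separate STT / LLM / TTS", "gemini-3.1-flash-lite-preview",
                    "chirp_3", "gemini-3.1-flash-tts-preview"),
        RuntimeMode("xai-realtime", "native realtime", "grok-voice-think-fast-1.0", None, None),
        RuntimeMode("openai-realtime-2", "native realtime", "gpt-realtime-2", None, None),
    )
}


def check_mode(mode: RuntimeMode) -> None:
    """Raise if a realtime mode carries STT/TTS or a classic mode is missing either one."""
    realtime = mode.audio == "native realtime"
    if realtime and (mode.stt is not None or mode.tts is not None):
        raise ValueError(f"{mode.mode_id}: realtime mode must use STT/TTS = none")
    if not realtime and (mode.stt is None or mode.tts is None):
        raise ValueError(f"{mode.mode_id}: classic mode needs both STT and TTS")


for _mode in MODES.values():
    check_mode(_mode)  # every mode in the table satisfies the invariant
```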