Voice Code is a real-time dictation mode for coding. Speak naturally and text streams into your editor. Voice commands let you interact with Cursor and VS Code suggestions without touching the keyboard.

Available on: Mac, Windows, Linux (desktop only).

Getting Started

Toggle Voice Code

Platform	Shortcut or setting
Mac	`Control + Option + .`
Windows	`Ctrl + Alt + .`
Linux	Enable in Settings > Audio Device > Voice Code

A floating overlay appears showing the current state.

Mute / Unmute

Pause Voice Code without disabling it:

Platform	Shortcut
Mac	`Control + Option + E`
Linux	`Ctrl + Alt + E`

How It Works

You toggle Voice Code on
A floating overlay appears (always on top, draggable)
You speak, and the audio streams to the server over a WebSocket
A neural Voice Activity Detection (VAD) model detects when you're talking
The server transcribes your speech and classifies it as either dictation or a command
Dictated text is inserted automatically into your focused editor window; commands are executed instead

All of the intelligence (transcription and command classification) runs on the server. The desktop app handles audio capture and text insertion.
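Voice Code uses a neural VAD model for the "detects when you're talking" step, and that model is not reproduced here. As a much simpler stand-in, the sketch below uses energy-based detection: a frame of audio counts as speech when its RMS amplitude exceeds a threshold, and a short "hangover" keeps brief pauses between words from splitting an utterance. The function names, the 0.02 threshold, and the hangover length are illustrative assumptions, not Voice Code's actual parameters.

```python
import math
from typing import List, Sequence

# Energy-threshold VAD: an illustrative stand-in, NOT Voice Code's neural model.
# Threshold and hangover values below are assumptions chosen for the example.


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(frames: Sequence[Sequence[float]],
                  threshold: float = 0.02,
                  hangover: int = 3) -> List[bool]:
    """Mark each frame as speech (True) or silence (False).

    A frame is speech when its RMS exceeds `threshold`. After a loud frame,
    the next `hangover` quiet frames are still marked as speech so that
    short pauses do not cut an utterance in two.
    """
    flags: List[bool] = []
    remaining = 0  # quiet frames still covered by the hangover
    for frame in frames:
        if frame_rms(frame) > threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

For example, with `hangover=2`, the frame sequence silence, loud, silence, silence, silence is marked `[False, True, True, True, False]`. A neural VAD replaces the RMS test with a learned speech probability, which copes far better with background noise than a fixed energy threshold.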

Voice Commands

Say "command" followed by a number to pick a numbered code suggestion. The following voice commands are available:

"command one" through "command nine" - select the suggestion with that number
"select" - accept the current suggestion
"delete" - delete the selection
"undo" - undo the last action

Commands are recognized by a server-side LLM classifier and are not inserted as text.
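The real classifier is an LLM running on the server. Purely to illustrate the phrases listed above and the rule that commands are never typed, here is a rule-based sketch that either routes an utterance to a command handler or inserts it as text. The action name `pick_suggestion`, the functions `parse_command` and `route`, and the callback parameters are all hypothetical; a keyword matcher like this is far less forgiving than an LLM.

```python
from typing import Callable, Optional, Tuple

# Hypothetical command representation: (action name, optional suggestion number).
Command = Tuple[str, Optional[int]]

# Spoken number words accepted after "command" (one through nine).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
SIMPLE_COMMANDS = {"select", "delete", "undo"}


def parse_command(utterance: str) -> Optional[Command]:
    """Return a Command for a recognized phrase, or None if it is dictation."""
    words = utterance.lower().strip().rstrip(".!?").split()
    if len(words) == 2 and words[0] == "command" and words[1] in NUMBER_WORDS:
        return ("pick_suggestion", NUMBER_WORDS[words[1]])
    if len(words) == 1 and words[0] in SIMPLE_COMMANDS:
        return (words[0], None)
    return None


def route(utterance: str,
          insert_text: Callable[[str], None],
          run_command: Callable[[Command], None]) -> None:
    """Send commands to run_command and everything else to insert_text."""
    command = parse_command(utterance)
    if command is None:
        insert_text(utterance)  # dictation: typed into the editor
    else:
        run_command(command)    # command: executed, never typed
```

For instance, `parse_command("command three")` gives `("pick_suggestion", 3)`, while `parse_command("command ten")` gives `None`, so that phrase would be treated as dictation.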

States

The floating overlay shows one of four states:

Inactive - Voice Code is off
Listening - waiting for you to speak
Speech Detected - capturing your voice
Processing - server is transcribing
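The four overlay states can be modeled as a small state machine. The event names (`toggle`, `speech_start`, `speech_end`, `result`) and the transitions between them are assumptions inferred from the state descriptions above, not taken from Voice Code itself.

```python
from enum import Enum


class State(Enum):
    INACTIVE = "Inactive"
    LISTENING = "Listening"
    SPEECH_DETECTED = "Speech Detected"
    PROCESSING = "Processing"


# Assumed transitions, keyed by (current state, event). Hypothetical event names.
TRANSITIONS = {
    (State.INACTIVE, "toggle"): State.LISTENING,
    (State.LISTENING, "speech_start"): State.SPEECH_DETECTED,
    (State.SPEECH_DETECTED, "speech_end"): State.PROCESSING,
    (State.PROCESSING, "result"): State.LISTENING,
}


def next_state(state: State, event: str) -> State:
    """Return the overlay state after `event`.

    Toggling while active always turns Voice Code off; events with no
    matching transition leave the state unchanged.
    """
    if event == "toggle" and state is not State.INACTIVE:
        return State.INACTIVE
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a table makes it easy to see that, for example, a stray `speech_start` while Voice Code is off is simply ignored.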


Smart Insert with Voice Code

Transcribed text is inserted automatically into whichever window was focused when you started speaking. The insertion method depends on the platform: on Mac, Voice Code uses the accessibility API; on Windows, the clipboard (via PowerShell) together with SendInput; on Linux, ydotool on Wayland or xdotool on X11.
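The per-platform choice above can be sketched as a lookup on the operating system and, on Linux, the session type. This assumes that `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY` identifies a Wayland session, which holds on most modern Linux desktops. The function name and the returned backend labels are hypothetical, and the sketch only picks a mechanism; it does not perform the insertion.

```python
import os
import platform
from typing import Mapping, Optional


def choose_insert_backend(system: Optional[str] = None,
                          env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a text-insertion mechanism (hypothetical labels) for this platform."""
    system = system if system is not None else platform.system()
    env = env if env is not None else os.environ
    if system == "Darwin":  # macOS
        return "accessibility-api"
    if system == "Windows":
        return "clipboard+sendinput"
    if system == "Linux":
        # Assumption: these variables mark a Wayland session on most desktops.
        if env.get("XDG_SESSION_TYPE") == "wayland" or "WAYLAND_DISPLAY" in env:
            return "ydotool"
        return "xdotool"
    raise RuntimeError(f"unsupported platform: {system}")
```

Passing `system` and `env` explicitly makes the choice testable without running on each operating system; calling it with no arguments inspects the current machine.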

Performance

On Mac, the first utterance after you enable Voice Code has a delay of about 2 seconds while the FFmpeg audio encoder warms up. Subsequent utterances are near-instant.
Audio streams in real time as OGG/Opus over a WebSocket
For fast transcription, the server uses Groq, with Gemini as a fallback

Tips

Position your floating overlay where it doesn't block code
Mute Voice Code during meetings or conversations to avoid accidental input
Voice Code works best with clear, deliberate speech
Use the "command" prefix for actions so the classifier knows they are not dictation
Voice Code works alongside your keyboard, so you can type and speak interchangeably