Voice Code is a real-time dictation mode for coding. Speak naturally and text streams into your editor. Voice commands let you interact with Cursor and VS Code suggestions without touching the keyboard.
Available on: Mac, Windows, Linux (desktop only).
Getting Started
Toggle Voice Code
| Platform | Shortcut |
|---|---|
| Mac | Control + Option + . |
| Windows | Ctrl + Alt + . |
| Linux | Enable in Settings > Audio Device > Voice Code |
A floating overlay appears showing the current state.
Mute / Unmute
Pause Voice Code without disabling it:
| Platform | Shortcut |
|---|---|
| Mac | Control + Option + E |
| Linux | Ctrl + Alt + E |
How It Works
1. Toggle Voice Code on.
2. A floating overlay appears (always-on-top, draggable).
3. Speak. Audio streams to the server via WebSocket.
4. A neural Voice Activity Detection (VAD) model detects when you're talking.
5. The server transcribes your speech and classifies it as either dictation or a command.
6. Dictated text auto-inserts into your focused editor window.
All intelligence is server-side. The desktop app handles audio capture and text insertion.
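As a rough illustration of that split, the sketch below shows how a client could route one classified utterance coming back from the server. The `ServerResult` shape, its field names, and the `handle_result` function are hypothetical; the actual message format is not described here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ServerResult:
    # Hypothetical message shape; the real wire format is an assumption.
    kind: str  # "dictation" or "command"
    text: str  # transcribed text, or the recognized command phrase


def handle_result(
    result: ServerResult,
    insert_text: Callable[[str], None],
    run_command: Callable[[str], None],
) -> None:
    """Route one classified utterance: insert dictation, execute commands."""
    if result.kind == "dictation":
        insert_text(result.text)
    elif result.kind == "command":
        # Commands are acted on and never inserted as text.
        run_command(result.text)
    else:
        raise ValueError(f"unknown result kind: {result.kind!r}")
```

Passing the insert and command handlers as callables keeps the routing independent of any platform-specific insertion code.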
Voice Commands
Say "command" followed by a number to pick a code suggestion. Other commands act on the current suggestion or selection:
- "command one" through "command nine" - select numbered suggestions
- "select" - accept current suggestion
- "delete" - delete selection
- "undo" - undo last action
Commands are recognized by the server-side LLM classifier and don't get inserted as text.
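To make the command vocabulary concrete, here is a minimal rule-based sketch that maps the phrases listed above to actions. It is only an illustration: the real recognition is done by the server-side LLM classifier, and the `parse_command` function and its action names are hypothetical.

```python
from typing import Optional, Tuple

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

# Spoken word -> illustrative action name (names are assumptions).
SIMPLE_ACTIONS = {"select": "accept", "delete": "delete", "undo": "undo"}


def parse_command(utterance: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (action, number) for a known command phrase, else None."""
    words = utterance.lower().strip(" .,!?").split()
    if len(words) == 2 and words[0] == "command" and words[1] in NUMBER_WORDS:
        return ("pick", NUMBER_WORDS[words[1]])
    if len(words) == 1 and words[0] in SIMPLE_ACTIONS:
        return (SIMPLE_ACTIONS[words[0]], None)
    return None  # anything else is treated as dictation in this sketch
```

An exact-match table like this cannot tell "delete" the command from "delete" as the first word of a dictated sentence, which is one reason a classifier is a better fit for the real feature.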
States
The floating overlay shows one of four states:
- Inactive - Voice Code is off
- Listening - waiting for you to speak
- Speech Detected - capturing your voice
- Processing - server is transcribing
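The four states can be modeled as a small state machine. The state names below come from the list above; the event names and the transitions between states are assumptions made for illustration.

```python
from enum import Enum


class OverlayState(Enum):
    INACTIVE = "Inactive"
    LISTENING = "Listening"
    SPEECH_DETECTED = "Speech Detected"
    PROCESSING = "Processing"


# Assumed transitions; event names are hypothetical.
TRANSITIONS = {
    (OverlayState.INACTIVE, "toggle"): OverlayState.LISTENING,
    (OverlayState.LISTENING, "speech_start"): OverlayState.SPEECH_DETECTED,
    (OverlayState.SPEECH_DETECTED, "speech_end"): OverlayState.PROCESSING,
    (OverlayState.PROCESSING, "result"): OverlayState.LISTENING,
}


def next_state(state: OverlayState, event: str) -> OverlayState:
    """Return the next overlay state; unknown events leave the state unchanged."""
    if event == "toggle" and state is not OverlayState.INACTIVE:
        # Toggling while active turns Voice Code off from any state.
        return OverlayState.INACTIVE
    return TRANSITIONS.get((state, event), state)
```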
Smart Insert with Voice Code
Transcribed text auto-inserts into whatever window was focused when you started speaking. On Mac, this uses the accessibility API. On Windows, it uses the PowerShell clipboard plus SendInput. On Linux, it uses ydotool (Wayland) or xdotool (X11).
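The per-platform choice can be sketched as a small selection function. This is an assumption about how such a choice might be made, not the app's actual logic: the `pick_insert_backend` name and the returned labels are hypothetical, and checking `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY` is just one common way to detect a Wayland session.

```python
from typing import Mapping


def pick_insert_backend(platform: str, env: Mapping[str, str]) -> str:
    """Choose a text-insertion backend label from a sys.platform-style string."""
    if platform == "darwin":
        return "accessibility-api"
    if platform.startswith("win"):
        return "powershell-clipboard-sendinput"
    if platform.startswith("linux"):
        # Assumed heuristic for detecting a Wayland session.
        if env.get("XDG_SESSION_TYPE") == "wayland" or env.get("WAYLAND_DISPLAY"):
            return "ydotool"
        return "xdotool"
    raise ValueError(f"unsupported platform: {platform!r}")
```

In a real program the arguments would typically be `sys.platform` and `os.environ`; taking them as parameters keeps the function easy to test.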
Performance
- First speech after enabling Voice Code has a ~2 second delay on Mac (FFmpeg audio encoding warmup). Subsequent utterances are near-instant.
- Audio streams in real-time as OGG/Opus via WebSocket
- Server processes speech through Groq (with Gemini fallback) for fast transcription
Tips
- Position your floating overlay where it doesn't block code
- Mute Voice Code during meetings or conversations to avoid accidental input
- Voice Code works best with clear, deliberate speech
- Use the "command" prefix for actions so the AI knows it's not dictation
- Works alongside your keyboard - you can type and speak interchangeably