Voice Code Guide

Hands-free dictation for coding. Voice commands for Cursor and VS Code. Desktop only (Mac, Windows, Linux).

Last updated: 4/7/2026

Voice Code is a real-time dictation mode for coding. Speak naturally and text streams into your editor. Voice commands let you interact with Cursor and VS Code suggestions without touching the keyboard.

Available on: Mac, Windows, Linux (desktop only).

Getting Started

Toggle Voice Code

Platform   Shortcut
Mac        Control + Option + .
Windows    Ctrl + Alt + .
Linux      Enable in Settings > Audio Device > Voice Code

A floating overlay appears showing the current state.

Mute / Unmute

Pause Voice Code without disabling it:

Platform   Shortcut
Mac        Control + Option + E
Linux      Ctrl + Alt + E

How It Works

  1. You toggle Voice Code on
  2. A floating overlay appears (always-on-top, draggable)
  3. Speak - audio streams to the server via WebSocket
  4. A neural Voice Activity Detection (VAD) model detects when you're talking
  5. The server transcribes and classifies your speech as either dictation or a command
  6. Text auto-inserts into your focused editor window

All intelligence is server-side. The desktop app handles audio capture and text insertion.
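
To make the split between dictation and commands concrete, here is a minimal sketch of how a client might route what the server sends back. The message shape (ServerResult with a kind and a text field) and the Editor stand-in are assumptions made for illustration; the actual wire format and insertion code are not documented here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServerResult:
    # Hypothetical message shape; the real wire format is not documented.
    kind: str  # "dictation" or "command"
    text: str


@dataclass
class Editor:
    # Stand-in for the focused editor window.
    buffer: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


def handle_result(result: ServerResult, editor: Editor) -> None:
    """Insert dictation as text; route commands without inserting them."""
    if result.kind == "dictation":
        editor.buffer.append(result.text)
    elif result.kind == "command":
        editor.actions.append(result.text)
    else:
        raise ValueError(f"unknown result kind: {result.kind!r}")
```

The point of the sketch is only that the desktop side stays thin: it does not decide what an utterance means, it just acts on the server's classification.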

Voice Commands

Say "command" followed by a number to select a code suggestion. A few other commands act on the current suggestion or selection:

  • "command one" through "command nine" - select numbered suggestions
  • "select" - accept current suggestion
  • "delete" - delete selection
  • "undo" - undo last action

Commands are recognized by the server-side LLM classifier and don't get inserted as text.
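
The command vocabulary above is small enough to write down as a lookup. The sketch below is a rule-based stand-in, not the server's LLM classifier; the function name parse_command and the action names it returns are hypothetical and exist only to show which phrases count as commands.

```python
from typing import Optional, Tuple

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}

# Hypothetical action names for the single-word commands.
SIMPLE_COMMANDS = {
    "select": "accept_suggestion",
    "delete": "delete_selection",
    "undo": "undo",
}


def parse_command(utterance: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (action, number) for a known command phrase, else None."""
    # Normalize case and drop trailing punctuation a transcriber may add.
    words = [w.strip(".,!?") for w in utterance.lower().split()]
    if len(words) == 2 and words[0] == "command" and words[1] in NUMBER_WORDS:
        return ("select_suggestion", NUMBER_WORDS[words[1]])
    if len(words) == 1 and words[0] in SIMPLE_COMMANDS:
        return (SIMPLE_COMMANDS[words[0]], None)
    return None  # Anything else is treated as dictation.
```

A strict matcher like this would miss phrasing variations that an LLM classifier can handle, which is presumably why classification runs on the server.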

States

The floating overlay shows one of four states:

  1. Inactive - Voice Code is off
  2. Listening - waiting for you to speak
  3. Speech Detected - capturing your voice
  4. Processing - server is transcribing
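
One way to picture the four states is as a small state machine. The state names below come from the list above; the event names and the transitions between states are assumptions, sketched from the order of steps in How It Works rather than taken from the app.

```python
from enum import Enum


class OverlayState(Enum):
    INACTIVE = "Inactive"
    LISTENING = "Listening"
    SPEECH_DETECTED = "Speech Detected"
    PROCESSING = "Processing"


# Assumed transitions; the event names are hypothetical.
TRANSITIONS = {
    (OverlayState.INACTIVE, "toggle"): OverlayState.LISTENING,
    (OverlayState.LISTENING, "speech_start"): OverlayState.SPEECH_DETECTED,
    (OverlayState.SPEECH_DETECTED, "speech_end"): OverlayState.PROCESSING,
    (OverlayState.PROCESSING, "result"): OverlayState.LISTENING,
}


def next_state(state: OverlayState, event: str) -> OverlayState:
    """Return the next overlay state; unknown events leave it unchanged."""
    # Toggling while Voice Code is on turns it off from any active state.
    if event == "toggle" and state is not OverlayState.INACTIVE:
        return OverlayState.INACTIVE
    return TRANSITIONS.get((state, event), state)
```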

Smart Insert with Voice Code

Transcribed text auto-inserts into whatever window was focused when you started speaking. The insertion method depends on the platform: on Mac, the accessibility API; on Windows, the clipboard (via PowerShell) plus SendInput; on Linux, ydotool (Wayland) or xdotool (X11).
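
The per-platform choice can be sketched as a simple dispatch. The function name and the returned labels are hypothetical; the platform strings follow Python's sys.platform values, and the Wayland check uses the XDG_SESSION_TYPE environment variable, which is an assumption about how the session type might be detected, not a description of the app's code. Falling back to xdotool whenever the session is not Wayland is also an assumption.

```python
import os
import sys


def pick_insert_backend(platform: str, session_type: str = "") -> str:
    """Map a platform (and Linux session type) to an insertion method label."""
    if platform == "darwin":
        return "accessibility-api"
    if platform == "win32":
        return "powershell-clipboard+sendinput"
    if platform.startswith("linux"):
        # Assumed: anything that is not Wayland is treated as X11.
        return "ydotool" if session_type.lower() == "wayland" else "xdotool"
    raise ValueError(f"unsupported platform: {platform!r}")


if __name__ == "__main__":
    # Example: report the method that would be used on this machine.
    print(pick_insert_backend(sys.platform, os.environ.get("XDG_SESSION_TYPE", "")))
```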

Performance

  • First speech after enabling Voice Code has a ~2 second delay on Mac (FFmpeg audio encoding warmup). Subsequent utterances are near-instant.
  • Audio streams in real-time as OGG/Opus via WebSocket
  • Server processes speech through Groq (with Gemini fallback) for fast transcription

Tips

  • Position your floating overlay where it doesn't block code
  • Mute Voice Code during meetings or conversations to avoid accidental input
  • Voice Code works best with clear, deliberate speech
  • Use the "command" prefix for actions so the classifier knows it's not dictation
  • Works alongside your keyboard - you can type and speak interchangeably