Voice for the Agent Loop, Not Generic Web Chat

Pause-aware voice into Claude Code, Cursor, Codex, Antigravity

Speak prompts to Claude Code, Cursor, Codex, and Antigravity. Pause for a beat. AICHE sends.

Works on macOS, Windows, and Linux.

Short answer: press ⌃+⌥+R (Mac) or Ctrl+Alt+R (Windows/Linux) inside Claude Code, Cursor, Codex, or Antigravity, speak the prompt, then stop talking. After a brief pause, AICHE sends the cleaned text to the agent on its own. No Enter, no Send button, no reaching for the keyboard to confirm.

The problem this solves

Driving an agent loop means dozens of small prompts per hour. Each one is the same micro-ritual: speak or type, reach for Enter, watch the agent run, react, repeat. The Enter key sits between you and the loop, and after a few hours it's the part that breaks flow. Worse, generic browser dictation tools weren't built for this rhythm. They wait for an explicit stop, they live in one tab, and they don't know what "send to the agent" means.

How it works

  1. Open your agent: Claude Code in a terminal, Codex CLI, Cursor's agent panel, or Antigravity.
  2. Click into the agent's input field (terminal prompt, chat box, whichever).
  3. Press ⌃+⌥+R on Mac or Ctrl+Alt+R on Windows / Linux. Recording is a toggle, not push-to-talk.
  4. Speak the full prompt. Pacing is fine. Long context is fine. "Refactor the auth module to use JWT, add refresh-token rotation with 7-day expiry, add Redis for token blacklisting, and rate-limit the refresh endpoint at 10 req/min per user."
  5. Stop speaking. After a brief pause, AICHE transcribes, cleans the text, inserts it at the cursor, and submits the prompt for you.
  6. When the agent comes back asking to run something (apply a patch, execute a shell command, write a file), answer by voice - say "approve" or speak the numbered option ("command one", "command two"). The voice confirmation answers the agent's prompt without you reaching for the keyboard.
  7. Continue the loop hands-free. Pace the room. Think out loud. Ship prompts.

What "pause-aware auto-send" actually means

The send is the differentiator, not the dictation. Most voice tools stop at "insert text". You then have to switch back, find Enter, and submit. In a 50-prompt session that's 50 context switches.

AICHE's Voice Code mode watches for the natural stop in your speech. When you finish a sentence and pause for a beat, it treats that as the submit signal. The text gets cleaned, inserted, and the prompt ships to the agent. You can immediately start the next thought or wait for the agent to respond. The hotkey is still there if you want to start a new recording, but you don't need it to submit.

A few practical notes:

  • The pause threshold is calibrated for natural dictation, not nervous half-second gaps mid-sentence. Trailing off at the end of a thought is the trigger.
  • It only auto-sends inside the supported agents (Claude Code, Codex, Cursor, Antigravity). Dictating into a Notion doc or a Gmail compose window inserts text without sending, which is what you want there.
  • If you start speaking again before the pause completes, the recording continues and the previous fragment stays uncommitted until the next real stop.
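
The pause logic in the notes above can be pictured as a small state machine. What follows is a minimal sketch, not AICHE's actual implementation: the class name `PauseDetector`, its methods, and the 1.2-second threshold are all hypothetical, chosen only to illustrate "speaking again resets the clock, a real stop submits".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PauseDetector:
    """Illustrative only: decides when trailing silence counts as 'submit'."""

    # Hypothetical value; the real threshold is not documented here.
    pause_threshold_s: float = 1.2
    _last_speech_at: Optional[float] = None
    _fragments: List[str] = field(default_factory=list)

    def on_speech(self, text: str, now: float) -> None:
        """Store a recognized fragment; new speech restarts the pause clock."""
        self._fragments.append(text)
        self._last_speech_at = now

    def poll(self, now: float) -> Optional[str]:
        """Return the whole prompt once the pause is long enough, else None."""
        if self._last_speech_at is None:
            return None  # nothing has been said yet
        if now - self._last_speech_at < self.pause_threshold_s:
            return None  # still inside a mid-sentence gap
        prompt = " ".join(self._fragments)
        # Reset so the next recording starts from a clean state.
        self._fragments = []
        self._last_speech_at = None
        return prompt


detector = PauseDetector(pause_threshold_s=1.0)
detector.on_speech("refactor the auth module", now=0.0)
early = detector.poll(now=0.5)        # short gap: nothing is sent
detector.on_speech("to use JWT", now=0.8)
resumed = detector.poll(now=1.5)      # speech resumed, fragment stays uncommitted
submitted = detector.poll(now=1.9)    # real stop: the full prompt is released
```

In this sketch the earlier fragment is never sent on its own; it is only released together with the rest of the prompt once the silence exceeds the threshold.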

Voice confirmations for agent actions

Agents now spend a lot of their loop asking permission. "Run this command?" "Apply this patch?" "Write this file?" The intended ergonomic is a quick yes / no from the human. With your hands off the keyboard - pacing, on a call, holding a coffee - those approval gates become the new bottleneck.

Voice confirmations close that gap. When the agent surfaces an action, you can answer by voice instead of reaching for the keyboard. The answer ships, the agent continues, and you stay in the same posture you were in when you dictated the prompt. The session feels like a conversation with the agent doing the work, instead of a typing session interrupted by the agent.

This is the part of Voice Code that makes the "walking while shipping" workflow actually viable. Without it, every prompt is voice, every confirmation is keyboard. With it, the whole loop is voice.
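
As a rough illustration of how a spoken reply could be mapped to an agent choice, here is a minimal sketch. The function `parse_confirmation`, the return strings, and the number-word table are assumptions made for this example; only the phrases "approve", "command one", and "command two" come from the text above.

```python
import re
from typing import Optional

# Hypothetical lookup for spoken numbers; extend as needed.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_confirmation(utterance: str) -> Optional[str]:
    """Map a short spoken reply to 'approve' or 'option:N', else None."""
    # Lowercase and drop punctuation that transcription may add.
    words = re.findall(r"[a-z0-9]+", utterance.lower())
    if words == ["approve"]:
        return "approve"
    if len(words) == 2 and words[0] == "command":
        number = NUMBER_WORDS.get(words[1])
        if number is None and words[1].isdigit():
            number = int(words[1])
        if number is not None:
            return f"option:{number}"
    # Anything else is treated as ordinary dictation, not a confirmation.
    return None
```

Returning `None` for anything that is not an exact confirmation phrase keeps a longer dictated sentence containing the word "approve" from being mistaken for an approval.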

Not generic web dictation

The reason this is a separate Pro feature instead of "just dictation into a browser":

  • The pause-aware submit is wired specifically to the agent surfaces above. It does not fire blindly into every text field you focus, which would be hostile in a normal editor.
  • Recognition is tuned for code-adjacent speech (the Software Development profile, Pro). API names, kebab-case flags, snake_case identifiers, library names land correctly instead of getting auto-corrected into prose.
  • Your custom vocabulary (50 entries, synced across all your AICHE installs) enforces internal jargon - repo names, services, internal tools - so they're spelled the same way every prompt.
  • Auto-translation lets you think in your native language. Speak German, Mandarin, Spanish, Japanese, or whatever your brain runs in, and ship clean English prompts. AICHE supports 99 input languages.

A generic browser dictation extension treats every text field the same. Voice Code treats the agent loop as a first-class workflow.
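
To make the custom-vocabulary idea concrete, here is a minimal sketch of how a spoken form could be normalized to one canonical spelling. It is an illustration under stated assumptions, not how AICHE does it: `apply_vocabulary` and the example entries are hypothetical, and only the 50-entry limit comes from the text.

```python
import re
from typing import Dict

MAX_ENTRIES = 50  # entry limit stated in the text


def apply_vocabulary(text: str, vocabulary: Dict[str, str]) -> str:
    """Replace each spoken form with its canonical spelling, ignoring case."""
    if len(vocabulary) > MAX_ENTRIES:
        raise ValueError(f"vocabulary is limited to {MAX_ENTRIES} entries")
    if not vocabulary:
        return text
    # Try longer spoken forms first so a long phrase wins over a shorter one.
    spoken_forms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(form) for form in spoken_forms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {form.lower(): canonical for form, canonical in vocabulary.items()}
    return pattern.sub(lambda match: lookup[match.group(1).lower()], text)


# Hypothetical entries: spoken form -> spelling used in the repo.
vocab = {"billing gateway": "billing-gateway", "token store": "token_store"}
cleaned = apply_vocabulary("Deploy Billing Gateway and flush the token store", vocab)
```

The word-boundary anchors keep an entry from rewriting the middle of a longer, unrelated word.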

Platform setup

Voice Code is Pro tier. The agent surfaces span macOS, Windows, and Linux desktops, which is where Claude Code, Codex CLI, Cursor, and Antigravity actually run.

  • macOS: ⌃+⌥+R. You must grant Accessibility permission once in System Settings → Privacy & Security → Accessibility for the global hotkey to work outside AICHE's own window.
  • Windows: Ctrl+Alt+R. No extra permissions. AICHE lives in the system tray and listens for the hotkey even with the main window closed.
  • Linux: Ctrl+Alt+R. X11 works out of the box. Wayland (GNOME 41+) may need AICHE launched with the right permissions for global hotkey registration. The Linux build ships in .deb, .rpm, AppImage, and Flatpak.

In all three cases, enable the Software Development profile in AICHE settings if you haven't. That's the recognition tuning that keeps code identifiers intact.

Tips

Build muscle memory on a low-stakes prompt first. Dictate a one-line "list the files changed in the last commit" type prompt three or four times so the pause timing feels normal. Once you trust the auto-send, you'll stop hovering over Enter.

Speak the prompt you'd write, not a shorter version. Speaking runs around 150 WPM versus 40 WPM on a keyboard. Longer prompts with constraints and edge cases produce fewer agent round-trips. Voice removes the typing cost that makes most developers shorten prompts.
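
A quick back-of-the-envelope check of those rates, as a small sketch. The 150 and 40 WPM figures come from the paragraph above; the 120-word prompt length is a hypothetical example, and the calculation ignores transcription and editing time.

```python
def seconds_to_enter(words: int, words_per_minute: float) -> float:
    """Time in seconds to produce `words` at a steady rate."""
    return words * 60 / words_per_minute


prompt_words = 120  # hypothetical length of a detailed prompt
spoken_seconds = seconds_to_enter(prompt_words, 150)  # speaking rate from the text
typed_seconds = seconds_to_enter(prompt_words, 40)    # typing rate from the text
saved_seconds = typed_seconds - spoken_seconds

print(f"spoken: {spoken_seconds:.0f}s, typed: {typed_seconds:.0f}s, "
      f"saved: {saved_seconds:.0f}s")
```

Under these assumptions a detailed prompt takes well under a minute to speak and about three minutes to type, which is the cost that pushes people toward shorter prompts.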

Combine with custom vocabulary. Drop your repo names, service names, and internal jargon into the 50-entry custom vocabulary. They get spelled correctly every prompt, every device.

Use voice confirmations for the loop, keyboard for the edge cases. Approve or pick a numbered option by voice on routine prompts. When the agent asks something that needs a typed argument, fall back to the keyboard for that one step, then continue by voice.

Pace, don't sit. The reason voice for agents matters is not raw WPM. It's that you think differently when you're standing and walking. You catch architecture-level issues that don't surface when you're hunched over the keys.

Result: a 50-prompt agent session that used to be 50 typing sprints plus 50 Enter presses plus 50 confirmation clicks becomes a continuous conversation. The keyboard stops being the bottleneck of the loop.

Works With

  • Claude Code - voice prompts for the Claude Code CLI
  • Cursor - voice commands for Cursor's chat and agent panel
  • ChatGPT - voice input for ChatGPT prompts
  • VS Code - voice for documentation and comments
  • Terminal - voice in tmux, zellij, and terminal sessions

Try it now: open Claude Code, press ⌃+⌥+R (Mac) or Ctrl+Alt+R (Windows / Linux), speak one full architectural prompt you've been putting off because typing it felt like too much keyboard time, then stop and let it ship.

Tags

claude, cursor, ai-coding, workflow