Short answer: upgrade to Pro, generate an API key in your account settings, and POST an audio file to the AICHE endpoint. You get back the same cleaned-up text the desktop hotkey produces - filler removed, punctuation added, custom vocabulary applied.
The problem this solves
If you want transcription inside a cron job, a Zapier step, a CI hook, a Discord bot, or a server-side ingest pipeline, AICHE exposes the same pipeline that powers the desktop hotkey behind a regular HTTP endpoint on the Pro tier. No sales call, no minimum seat count, no procurement loop: the subscription that runs your desktop and mobile apps also covers the API key your scripts use.
How it works
- Upgrade to Pro at /pricing. The API is gated to the Pro tier and turns on the moment your subscription is active.
- Open your account settings, go to the API section, and generate a key. Keys are per-account, revocable, and scoped to your subscription.
- Drop the key into the Authorization header of your request, the same way you would for any other modern API.
- POST audio to the transcription endpoint. The API accepts the common formats (.wav, .mp3, .m4a, .ogg, .webm) and standard multipart uploads.
- The audio goes through the full AICHE pipeline: Whisper transcription, hallucination filter, filler and stutter removal, custom vocabulary enforcement, and Groq-hosted LLM polish.
- You get back JSON with the cleaned-up text in roughly 3 seconds for 15 minutes of audio, the same speed as the desktop app.
- Audio is purged from the processing path within 1 second after processing completes. The API doesn't log audio, doesn't train on it, and doesn't store it.
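A minimal Python sketch of the request described above, using only the standard library. Anything this article doesn't state is an assumption: the endpoint URL is a placeholder (take the real one from the API Documentation), the multipart field name "file" is a guess, and the Bearer scheme for the Authorization header is assumed. The extension check mirrors the formats listed above.

```python
import mimetypes
import os
import urllib.request
import uuid

# Placeholder only: replace with the endpoint from the API Documentation.
AICHE_ENDPOINT = "https://example.invalid/v1/transcribe"

# Formats this article lists as accepted.
ACCEPTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg", ".webm"}


def build_request(filename: str, audio: bytes, api_key: str,
                  endpoint: str = AICHE_ENDPOINT) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying a single audio file."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")

    boundary = uuid.uuid4().hex  # random, so it won't collide with audio bytes in practice
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    name = os.path.basename(filename)
    # Assumption: the API reads the upload from a form field named "file".
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()

    return urllib.request.Request(
        endpoint,
        data=head + audio + tail,
        method="POST",
        headers={
            # Assumption: Bearer token scheme.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str, api_key: str, endpoint: str = AICHE_ENDPOINT) -> bytes:
    """Send one file and return the raw JSON response body."""
    with open(path, "rb") as f:
        req = build_request(path, f.read(), api_key, endpoint)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

Building the request separately from sending it keeps the multipart encoding testable without touching the network; `transcribe("memo.wav", key)` then returns the body for JSON decoding.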
What you actually get back
The endpoint doesn't return raw Whisper output. It returns the same text the desktop hotkey would have inserted at your cursor: filler words ("um", "uh", "like", false starts) stripped, punctuation and paragraph breaks placed, your custom vocabulary (up to 50 entries) applied, and a Groq-hosted LLM polish pass on top. Whisper covers 99 languages on the input side, so a recording in German, Japanese, or Hindi comes back as text in that language or, with auto-translation enabled in your account, as clean English.
The known Whisper failure modes (phantom "thanks for watching" insertions on quiet audio, repeated stutters, mid-sentence punctuation drift) are filtered before the response lands in your code. You don't have to write your own scrubber, because the scrubber is part of the product.
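Because the cleanup happens server-side, the client only has to decode JSON and take the string. A sketch, assuming the transcript sits under a top-level "text" key; that field name is hypothetical, and the API Documentation has the actual response schema.

```python
import json


def extract_text(raw: bytes) -> str:
    """Return the cleaned transcript from a JSON response body.

    Assumption: the response is an object with a top-level "text" string.
    No client-side scrubbing follows: filler removal, punctuation, and
    vocabulary fixes have already been applied before the response arrives.
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        raise ValueError("response has no top-level 'text' string")
    return payload["text"]
```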
For developers
The use cases worth building against:
- Meeting and call ingest. Pipe Zoom / Meet / Slack Huddle recordings into the API and get back text you can drop into a doc, ticket, or CRM without further cleanup.
- Inbox triage. Forward voice-memo emails through a Lambda that calls the API and writes the result to your notes system.
- Field capture. Mobile data collection apps for clinicians, surveyors, inspectors. Record on-device, ship the file, get usable structured text back on the server.
- Bot input layers. Discord, Slack, Telegram bots that accept voice messages and respond in text.
- Personal automations. A cron job that walks a folder of .m4a files from your iPhone Voice Memos sync and writes Markdown next to each one.
- CI / docs pipelines. Voice-recorded changelog notes processed during release builds.
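The Voice Memos automation from the list above could look roughly like this. `transcribe` is a hypothetical parameter: any callable that turns an audio path into cleaned text, such as a wrapper that sends the file and decodes the JSON response. Injecting it keeps the folder logic independent of the HTTP details. The "# title" Markdown layout is an arbitrary choice.

```python
from pathlib import Path
from typing import Callable


def transcribe_folder(folder: Path, transcribe: Callable[[Path], str]) -> list[Path]:
    """Write a Markdown file next to every .m4a that doesn't have one yet.

    Returns the list of Markdown files written on this run.
    """
    written = []
    for audio in sorted(folder.rglob("*.m4a")):
        md = audio.with_suffix(".md")
        if md.exists():
            continue  # already processed on an earlier run
        text = transcribe(audio)
        md.write_text(f"# {audio.stem}\n\n{text}\n", encoding="utf-8")
        written.append(md)
    return written
```

Since recordings that already have a .md sibling are skipped, the job is safe to rerun from cron: each run only processes new files.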
The API runs against the same backend as the desktop and mobile apps, so any improvement to the polish pipeline lands in your integration without a client update on your side.
Self-serve, not enterprise-gated
You click upgrade, you generate a key, you ship code. No procurement form, no annual minimum, no "let's get on a call to talk about your volume". Pro costs $9.99/mo billed monthly or $99.99/yr billed annually ($8.33/mo equivalent), with a 7-day free trial and no credit card required to start.
For developers who treat voice as a first-class input every day, the math is straightforward: one Pro subscription covers the apps on up to 10 devices AND the API key for everything else. There's no separate "API plan" upsell.
Tips
Cache your key, not your audio. Treat the API key like any other credential (environment variable, secret manager). Don't keep recordings around after the request returns - the API has already discarded them on its side.
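A minimal sketch of loading the key from the environment and failing fast when it's missing. AICHE_API_KEY is a hypothetical variable name; use whatever name your secret manager or deployment platform injects.

```python
import os


def load_api_key(var: str = "AICHE_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it.

    The default variable name is hypothetical. Surrounding whitespace is
    stripped because secrets pasted into .env files often carry a newline.
    """
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; refusing to call the API without a key")
    return key
```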
Match the desktop vocabulary. Custom vocabulary syncs across your account, so the dictionary you've built in the desktop app (up to 50 entries) applies to API calls too. Add a new brand name once and it's fixed everywhere.
Combine with the Software Development profile. If your audio is code-heavy (architecture descriptions, library names, CLI flags), turn on the Software Development profile in your account. The API honors the same recognition tuning the desktop app uses.
Batch where you can. The API is fast (~3s for 15 minutes), but if you're processing hundreds of files, fire requests concurrently rather than serially. Pro tier includes priority processing, so your jobs jump the queue under load.
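A sketch of concurrent processing with Python's standard thread pool. Threads fit this workload because each request spends its time waiting on the network, not the CPU. The default max_workers=8 is an arbitrary starting point, not a documented AICHE rate limit, and `transcribe` is again a hypothetical callable that maps an audio path to text.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def transcribe_many(paths: list[Path], transcribe: Callable[[Path], str],
                    max_workers: int = 8) -> dict[Path, str]:
    """Run transcriptions concurrently through a bounded thread pool.

    Results are keyed by path and kept in input order.
    """
    results: dict[Path, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each path with its text.
        for path, text in zip(paths, pool.map(transcribe, paths)):
            results[path] = text
    return results
```

If one file fails, pool.map re-raises that exception when its result is reached; wrap `transcribe` in a try/except if you'd rather log failures and keep the rest of the batch going.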
Send the cleanest audio you can. A 16kHz mono recording at a reasonable bitrate is plenty. Padding the file with silence or stacking effects doesn't help the model; clean source audio does.
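Before uploading, you can check a .wav file's sample rate and channel count with Python's standard wave module. This is a local sanity check only: it reads PCM .wav files, not .mp3 or .m4a, and the thresholds to act on are your call.

```python
import wave


def describe_wav(path: str) -> dict:
    """Report sample rate, channel count, and duration of a PCM .wav file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        frames = w.getnframes()
    return {
        "sample_rate": rate,           # 16000 is already plenty per the tip above
        "channels": channels,          # 1 means mono
        "seconds": round(frames / rate, 2),
    }
```

If the check shows stereo or a higher sample rate than you need, `ffmpeg -i in.wav -ac 1 -ar 16000 out.wav` downmixes to mono and resamples to 16 kHz; that is a standard ffmpeg invocation, not an AICHE requirement.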
Result: a 15-minute recording in any of 99 languages becomes filler-free, punctuated, vocabulary-corrected text in about 3 seconds, available to any script, server, or tool that can make an HTTP request.
Related
- API Documentation - full endpoint reference, code examples, streaming protocol
- Add Voice to Your Tools - practical guide to adding a mic button to internal tools, admin panels, and AI agents
Try it now: upgrade to Pro, generate an API key in your account settings, and curl a .wav from your downloads folder at the AICHE transcription endpoint. You'll see the same cleaned-up output the desktop hotkey produces, returned as JSON.