Short answer: open Dify, click into a system prompt, agent-instruction field, or workflow-node config, press ⌃+⌥+R (Mac) or Ctrl+Alt+R (Windows/Linux), speak for as long as you need, then press the hotkey again. AICHE inserts the cleaned-up text in 2-3 seconds.
Dify configurations are long by design. A useful customer-support agent has tone rules, capability lists, escalation triggers, response format, constraints, and tool-use guardrails. A useful workflow node has step-by-step logic, conditional branching, error handling, and downstream effects. Typing all that takes 15-20 minutes per field, which is why most agents ship with three sentences of instruction and disappoint in production.
Voice closes that gap. The same comprehensive instruction takes 3-5 minutes of speaking instead of 15-20 minutes of typing.
How It Works
- Open Dify (cloud or self-hosted).
- Open the agent, app, or workflow you're configuring.
- Click into the field: system prompt, agent instructions, node config, or prompt template.
- Press ⌃+⌥+R (Mac) or Ctrl+Alt+R (Windows/Linux).
- Speak the full configuration. No length cap.
- Press the hotkey again. AICHE transcribes, applies AI cleanup, inserts.
- Test with sample inputs.
Where Voice Pays Off in Dify
Agent System Prompts
A real support agent needs role, tone, capabilities, escalation triggers, response format, and "what to never do" constraints. Speaking it through, you naturally hit each section: "You're a customer-support agent for a SaaS PM platform. Tone: professional, friendly, solution-oriented. Capabilities: search the KB, explain features, troubleshoot sync and billing. Escalate immediately on legal, compliance, breach, or 'lawyer'. If a user reports a bug, collect repro steps, browser, screenshots, then create a bug ticket. Never make promises about features not in the docs. Detect language and reply in kind."
Seven short sentences spoken, comprehensive instruction in the field.
Workflow Node Logic
Multi-step pipelines need explicit branching, error handling, and timeouts. Dictate each step: input validation, enrichment call with retry policy, scoring rule, routing decision, logging. Speaking the logic out loud often surfaces a missing edge case, such as the "if Clearbit returns null" branch you'd skip when typing.
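For reference, the dictated steps above (validation, enrichment with retry, scoring, routing, logging) could be sketched in Python roughly as follows. This is an illustration of the logic only, not Dify's node format: the function names, the `lookup` callable standing in for an enrichment provider, the route labels, and the 100-employee threshold are all hypothetical.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("lead_pipeline")

# Hypothetical stand-in for an enrichment provider: takes an email,
# returns company data, or None when it knows nothing.
Lookup = Callable[[str], Optional[dict]]


def enrich_with_retry(lookup: Lookup, email: str,
                      attempts: int = 3, delay_s: float = 0.0) -> Optional[dict]:
    """Call the enrichment lookup, retrying on connection errors.

    Returns None when every attempt fails or the provider has no data.
    """
    for attempt in range(1, attempts + 1):
        try:
            return lookup(email)
        except ConnectionError:
            log.warning("enrichment attempt %d of %d failed", attempt, attempts)
            if attempt < attempts:
                time.sleep(delay_s)
    return None


def route_lead(email: str, lookup: Lookup) -> str:
    # Step 1: input validation (deliberately crude for the sketch).
    if "@" not in email:
        return "reject"
    # Step 2: enrichment call with a retry policy.
    company = enrich_with_retry(lookup, email)
    # The branch that is easy to skip: enrichment returned nothing.
    if company is None:
        log.info("no enrichment data for %s, sending to manual review", email)
        return "manual_review"
    # Step 3: scoring rule (the threshold is an invented example value).
    score = 10 if company.get("employees", 0) >= 100 else 1
    # Step 4: routing decision, logged for later debugging.
    route = "sales" if score >= 10 else "nurture"
    log.info("routed %s to %s", email, route)
    return route
```

Writing the null case as its own branch is the point: it is the path that a quickly typed node description tends to leave out.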
Prompt Templates with Parameters
Reusable templates take parameters such as {{customer_name}}, {{plan_type}}, and {{trial_end_date}}. Dictate the template as natural language, saying each placeholder name as plain words (the single braces below only mark where those names fall; you don't speak them): "Hi {customer name}, welcome to {product}. We're excited to help {company name} with {primary use case}. Your {plan type} plan includes {feature list}. Trial ends {trial end date}." Afterward, wrap each placeholder in {{ }} where AICHE wrote the words; that is faster than typing the whole template.
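If a template has many placeholders, the "add the braces afterward" step can be scripted. Below is a minimal sketch, assuming you keep a list of the placeholder names you spoke; the helper `add_placeholders` is hypothetical and not part of AICHE or Dify. It replaces every occurrence of a listed name, so a common word like "product" should only be listed if it never appears as ordinary prose in the template.

```python
import re


def add_placeholders(text: str, names: list[str]) -> str:
    """Wrap each spoken placeholder name in {{snake_case}} braces."""
    if not names:
        return text
    # Longest names first, so "trial end date" wins over a shorter overlap.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in ordered) + r")\b",
        re.IGNORECASE,
    )

    def to_variable(match: re.Match) -> str:
        # "Plan Type" -> "{{plan_type}}"
        return "{{" + "_".join(match.group(1).lower().split()) + "}}"

    return pattern.sub(to_variable, text)


dictated = "Hi customer name, your plan type plan ends on trial end date."
print(add_placeholders(dictated, ["customer name", "plan type", "trial end date"]))
```

The replacement runs in a single pass, so text that has already been turned into a variable is not matched a second time.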
Tool-Use and Function-Calling Specs
When configuring tools an agent can call, dictate each tool's purpose, when to use it, what parameters it takes, and what to do with the result. Speaking forces the level of detail that makes tool-use reliable.
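One way to combine dictation with a structured tool definition is to keep the schema as a template and fill in only the dictated descriptions. The sketch below uses the JSON Schema style common in function-calling setups; it is an assumption, not Dify's actual tool configuration format. The tool name `create_bug_ticket`, its parameters, and the helper `fill_descriptions` are invented for illustration.

```python
import copy
import json

# Hypothetical schema template: structure typed once, descriptions left blank.
TOOL_TEMPLATE = {
    "name": "create_bug_ticket",
    "description": "",
    "parameters": {
        "type": "object",
        "properties": {
            "repro_steps": {"type": "string", "description": ""},
            "browser": {"type": "string", "description": ""},
        },
        "required": ["repro_steps"],
    },
}


def fill_descriptions(template: dict, tool_description: str,
                      param_descriptions: dict[str, str]) -> dict:
    """Return a copy of the template with dictated descriptions filled in."""
    spec = copy.deepcopy(template)  # leave the reusable template untouched
    spec["description"] = tool_description
    properties = spec["parameters"]["properties"]
    for name, text in param_descriptions.items():
        if name not in properties:
            raise KeyError(f"unknown parameter: {name}")
        properties[name]["description"] = text
    # Refuse to emit a spec that still has undescribed parameters.
    missing = [name for name, prop in properties.items() if not prop["description"]]
    if missing:
        raise ValueError(f"parameters without a description: {missing}")
    return spec


spec = fill_descriptions(
    TOOL_TEMPLATE,
    "Create a bug ticket once the user has confirmed the problem is reproducible.",
    {
        "repro_steps": "Numbered steps the user followed before the bug appeared.",
        "browser": "Browser name and version, if the user provided it.",
    },
)
print(json.dumps(spec, indent=2))
```

The check for missing descriptions mirrors the point of the paragraph above: a parameter the agent cannot interpret is a parameter it will misuse.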
Code Review or Eval Agents
Detailed evaluation criteria (security, error handling, performance, style) come out faster spoken than typed because you naturally cover what you actually look for in reviews. The result is a more thorough rubric than you'd patiently type.
What You Get
- Unlimited voice notes with AI cleanup - filler words removed, punctuation and paragraph breaks added.
- System-wide dictation - same hotkey works in Dify, your IDE, ChatGPT, Slack, anywhere.
- Custom vocabulary - drop in your product names, internal tools, model names, brand jargon.
- Software Development profile (Pro) - recognition tuned for code, APIs, library names if your prompts reference them.
- Multilingual voice input - speak in any supported language; auto-translate to English if your Dify config is English.
- Zero-retention audio - audio purged immediately after processing, within 1 second.
Plans start at $3.99/mo (annual) with a 7-day free trial, no credit card. See pricing.
Common Questions
Q: Does this work with self-hosted Dify?
A: Yes. AICHE is independent of where Dify runs. Whatever URL your Dify is at, the hotkey inserts text into the same web fields.
Q: Can I dictate JSON, YAML, or function-calling schemas directly?
A: You can, but for structured config a hybrid works best: dictate the natural-language description, then add JSON syntax. Or paste a schema template and dictate only the descriptions.
Q: Will AICHE handle template syntax ({{var_name}})?
A: Speak the natural prose with placeholder names ("customer name", "plan type"); add the {{ }} afterward. Faster than dictating curly braces.
Q: Does it work in Dify's prompt-engineering panel for fine-tuning?
A: Yes. Anywhere there's a text field, the hotkey inserts.
Q: My Dify configs reference internal model names and tools. Will spelling stay correct?
A: Add them to AICHE's Custom Vocabulary. Once added, they're spelled correctly in every dictation.
Result: comprehensive Dify configs in 3-5 minutes instead of 15-20. Speaking the logic catches missing branches and edge cases before deployment. Agent quality goes up because the instructions are actually thorough.
Try it now: open Dify, create a new agent, click into the system prompt, press your hotkey, and speak the agent's role, tone, capabilities, escalation triggers, and constraints in one pass. Watch the configuration get longer and more useful in a few minutes.