Dictation
Dictation is the core feature and is free forever — no account, no license, no network calls.
How it works
Voice Mode is push-to-talk. The user holds the dictation hotkey, speaks, and releases. A HUD appears while recording so the user can see they’re being heard. On release, Voice Mode transcribes, applies any corrections, and pastes the result into the currently focused app.
The stages
- Audio capture from the selected input device.
- Voice activity detection trims leading and trailing silence so the transcriber isn’t fed dead air.
- Speech-to-text runs an on-device CoreML model. The model is selectable — a faster English-only model and a higher-quality multilingual model are both available.
- Dictionary: deterministic word and phrase substitutions the user has configured (see replacements.md).
- AI Correction (optional): an on-device small LLM rewrites the transcript to fix punctuation, capitalization, and misheard words. Free for everyone; the user can turn it off in the Dashboard’s AI Writing pane if they prefer raw transcription.
- Paste: the final text is placed on the clipboard and pasted at the cursor via Cmd+V.
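The stages above run strictly in order, which can be sketched as a small sequential pipeline. This is an illustrative sketch only, not Voice Mode's implementation: `trim_silence`, `run_pipeline`, the amplitude threshold, and the stubbed stages are all hypothetical, a simple amplitude gate stands in for real voice activity detection, and returning an empty string when no speech is found is an assumption.

```python
from typing import Callable, List, Optional

Samples = List[float]


def trim_silence(samples: Samples, threshold: float = 0.02) -> Samples:
    """Drop leading and trailing samples quieter than `threshold`.

    A crude amplitude gate standing in for real voice activity detection.
    """
    voiced = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not voiced:
        return []  # nothing but silence
    return samples[voiced[0] : voiced[-1] + 1]


def run_pipeline(
    samples: Samples,
    transcribe: Callable[[Samples], str],
    substitute: Callable[[str], str],
    correct: Optional[Callable[[str], str]] = None,
) -> str:
    """Run the stages in order and return the text that would be pasted."""
    speech = trim_silence(samples)
    if not speech:
        return ""  # assumption: no speech means nothing to transcribe or paste
    text = transcribe(speech)  # speech-to-text (stubbed by the caller)
    text = substitute(text)    # dictionary substitutions
    if correct is not None:    # AI Correction is optional
        text = correct(text)
    return text


# Stubbed stages, for illustration only.
result = run_pipeline(
    samples=[0.0, 0.0, 0.3, -0.4, 0.2, 0.0],
    transcribe=lambda speech: "hello from voice mode",
    substitute=lambda t: t.replace("voice mode", "Voice Mode"),
    correct=lambda t: t[0].upper() + t[1:] + ".",
)
print(result)  # Hello from Voice Mode.
```

Passing `correct=None` models the case where the user has turned AI Correction off: the dictionary output is returned unchanged.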
Correction vs. dictionary
These solve different problems and compose:
- Dictionary entries are exact, predictable find-and-replace. Best for names, technical jargon, and anything the STT consistently mishears the same way.
- AI Correction is a small on-device LLM pass. Best for tone, punctuation, and context-dependent fixes.
Both are free, and both run on-device.
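To make "exact, predictable find-and-replace" concrete, here is a minimal sketch of how deterministic substitutions could behave. The matching rules are assumptions, not documented Voice Mode behavior: this sketch matches whole words only, ignores case, and prefers longer phrases over shorter ones. `build_replacer` and the sample entries are hypothetical.

```python
import re
from typing import Callable, Dict


def build_replacer(entries: Dict[str, str]) -> Callable[[str], str]:
    """Compile dictionary entries into a single whole-word replacer."""
    if not entries:
        return lambda text: text  # no entries: leave the transcript untouched

    # Longest phrases first so a multi-word entry wins over a shorter one.
    phrases = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    lookup = {phrase.lower(): fixed for phrase, fixed in entries.items()}

    def replace(text: str) -> str:
        # Same input always yields the same output; no model is involved.
        return pattern.sub(lambda m: lookup[m.group(1).lower()], text)

    return replace


replace = build_replacer(
    {"core ml": "CoreML", "jason": "JSON", "cube control": "kubectl"}
)
print(replace("export the core ml model as jason"))
# export the CoreML model as JSON
```

The word-boundary match keeps an entry for "jason" from rewriting part of a longer word, which is the kind of predictability that separates dictionary entries from the context-dependent LLM pass.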
Choosing a speech-to-text model
Voice Mode lets the user pick which on-device model handles transcription:
- A faster English-only model — lower latency, ideal for everyday English dictation.
- A higher-quality multilingual model — better at non-English speech, technical jargon, and accents, at the cost of some speed.
The choice is made in Settings → Models. Either model is free to use.
Highlight-and-correct
In addition to the dictation pipeline, users can fix existing text by selecting it in any app and triggering the correction hotkey. Voice Mode reads the selected text, runs it through the same on-device LLM correction pass used in dictation, and pastes the cleaned-up version back in place. Useful for tightening up a paragraph the user just typed or rewriting a sentence that came out wrong.
Focus and paste behavior
The paste targets whichever app is focused when the hotkey is released. If the user switches apps mid-sentence, the paste follows the new focus. For best results, focus the target app before pressing the hotkey and keep focus there until release.
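The release-time rule can be illustrated with a toy model. Nothing here touches a real window system: `Desktop` and `dictate` are hypothetical names, and the only behavior modeled is that the paste target is resolved when the hotkey is released, not when it is pressed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Desktop:
    """Toy stand-in for the window system: tracks focus and pasted text."""

    focused: str
    received: Dict[str, List[str]] = field(default_factory=dict)

    def paste(self, text: str) -> None:
        # The paste always lands in whichever app is focused right now.
        self.received.setdefault(self.focused, []).append(text)


def dictate(desktop: Desktop, spoken: str, switch_to: Optional[str] = None) -> None:
    # Hotkey pressed: recording starts. Focus is not captured at this point.
    if switch_to is not None:
        desktop.focused = switch_to  # the user changes apps mid-sentence
    # Hotkey released: the text goes to the app that is focused now.
    desktop.paste(spoken)


desktop = Desktop(focused="Notes")
dictate(desktop, "first sentence")
dictate(desktop, "second sentence", switch_to="Mail")
print(desktop.received)
# {'Notes': ['first sentence'], 'Mail': ['second sentence']}
```

The second dictation started in Notes but landed in Mail, which is why focusing the target app first and staying there until release gives the most predictable result.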
Pro tips
- Click into the target app first, then hold the hotkey. Dictating while a different window is focused is the most common reason “the text went somewhere weird.”
- Brief pauses help punctuation. With AI Correction on, a tiny pause at sentence boundaries is enough for the corrector to find the right place to put periods. The user doesn’t have to say “period” out loud.
- Highlight-and-correct works on text the user already typed. Select the rough sentence in any editable field, hit the correction hotkey, and it gets rewritten in place. Useful right before hitting Send.
- For jargon-heavy speech, switch models. The multilingual model handles technical terms and accents better than the English-only one; the trade-off is slightly higher latency. The choice is in Settings → Models.
- The Dictionary catches what models can’t. If a specific name keeps coming through wrong even after switching models, a dictionary entry is faster than fighting the STT.