
User Manual
Complete guide to every Dikt feature.
Getting Started
System Requirements
- Windows 10 or later (64-bit)
- .NET 8 Desktop Runtime (x64) — the installer prompts you to download this if needed
- Any audio input device (built-in or external microphone)
- ~200 MB disk space for the app, plus 75 MB–3 GB for Whisper models
Installation
Download the installer from the download page. Run the installer — it checks for .NET 8, installs Dikt, and launches the Setup Wizard on first run.
Setup Wizard
The 9-page Setup Wizard guides you through: Welcome, License activation (trial or key), Microphone selection and testing, Transcription mode (Local/Cloud/Managed), API key entry (if using cloud), Notification Overlay configuration (type, style, position, display, size), Compact Overlay configuration (type, style, position, display, size, waveform style), Storage preferences (auto-delete, retention), and a completion summary.
Quick Start
1. Press Ctrl+Alt+Space. You'll hear a start sound and see "Recording..."
2. Speak naturally and say what you want to type
3. Release the hotkey, and your text appears at the cursor
Recording & Dictation
Hotkey Modes
Push-to-Talk (default): Hold the hotkey while speaking, release when done.
Toggle: Press once to start, press again to stop. Change in Settings > General > Toggle mode.
During Recording
You'll see a "Recording..." status, an elapsed timer, and a real-time audio level bar (green = normal, orange = medium, red = too loud). You can cancel anytime with the Cancel button.
Push-to-Talk Grace Period
When using push-to-talk mode, if you press and release the hotkey within the configured grace period, Dikt switches to toggle-like behavior — it keeps recording until you press the hotkey again. This is useful for longer dictations where holding the key is uncomfortable. Configure the grace period duration in Settings > General. An optional audio cue plays when the grace period activates.
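The grace-period behavior can be pictured as a small state machine. The sketch below is illustrative only: the class name, the state names, and the 0.3 second default are assumptions, not Dikt's actual implementation or default value.

```python
from dataclasses import dataclass


@dataclass
class PushToTalk:
    """Hypothetical model of the hotkey logic; not Dikt's actual code."""
    grace_period_s: float = 0.3  # assumed value; the real setting is configurable
    state: str = "idle"          # "idle", "holding", or "latched"
    pressed_at: float = 0.0

    @property
    def recording(self) -> bool:
        return self.state != "idle"

    def on_press(self, now: float) -> None:
        if self.state == "idle":
            # First press starts recording in normal push-to-talk fashion.
            self.state = "holding"
            self.pressed_at = now
        elif self.state == "latched":
            # A second press while latched stops the recording.
            self.state = "idle"

    def on_release(self, now: float) -> None:
        if self.state != "holding":
            return  # release after a latched stop, or a stray release
        if now - self.pressed_at <= self.grace_period_s:
            # Quick tap: keep recording until the next press.
            self.state = "latched"
        else:
            # Normal push-to-talk: releasing ends the recording.
            self.state = "idle"
```

A quick tap (press and release within the grace period) leaves `recording` true until the next press, while a longer hold stops on release.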
Sound Effects
Dikt plays sounds on recording start/stop. Disable in Settings > General > Play sound effects, or set custom WAV files.
Transcription Modes
| Mode | How It Works | Pros | Cons |
|---|---|---|---|
| Local | Whisper.cpp on your PC | Free, offline, private | Slower, hardware dependent |
| Cloud | OpenAI Whisper API | Fast (1-3s), accurate | ~$0.006/min, needs API key |
| Cloud AI | GPT-4o Audio | Transcription + cleanup in one call | Higher cost, needs API key |
| Managed | Dikt's backend (Pro only) | No API key needed | Requires Pro subscription |
Enable Provider Failover in Settings > Advanced to automatically try another provider if your primary one fails.
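As a rough illustration of what failover means, the sketch below tries a list of providers in order and returns the first successful result. The function name and the provider callables are hypothetical stand-ins; how Dikt orders providers and decides that a call has failed is not documented here.

```python
from typing import Callable, Sequence, Tuple

Transcriber = Callable[[bytes], str]


def transcribe_with_failover(
    audio: bytes, providers: Sequence[Tuple[str, Transcriber]]
) -> Tuple[str, str]:
    """Return (provider_name, text) from the first provider that succeeds."""
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Passing `[("cloud", cloud_fn), ("local", local_fn)]` would fall back to the local engine whenever the cloud call raises.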
Whisper Models
Download and manage models in Settings > Transcription > Model Manager. Models ending in .en are English-only (faster for English). Multilingual models support 99 languages.
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny / tiny.en | ~75 MB | Fastest | Basic |
| base / base.en | ~150 MB | Fast | Good |
| small / small.en | ~500 MB | Medium | Better |
| medium / medium.en | ~1.5 GB | Slow | Great |
| large-v3-turbo | ~1.5 GB | Medium | Best |
| large-v3 | ~3 GB | Slowest | Best |
Text Output
- Auto-inject (default): Text automatically typed at your cursor position
- Copy (Ctrl+C): Copy to clipboard
- Insert (Ctrl+Enter): Inject at cursor manually
- Find & Replace (Ctrl+F / Ctrl+H): Search and replace in transcription
- Append mode: New transcriptions added to end of existing text
Voice Commands
Speak punctuation and formatting naturally. Dikt converts spoken words into characters.
| Say This | You Get |
|---|---|
| "period" or "full stop" | . |
| "comma" | , |
| "question mark" | ? |
| "exclamation mark" | ! |
| "new paragraph" | Two line breaks |
| "new line" | One line break |
| "dash" | — |
| "ellipsis" | ... |
| "open quote" / "close quote" | " / " |
| "open paren" / "close paren" | ( / ) |
| "semicolon" / "colon" | ; / : |
| "tab" | Tab character |
Voice commands also auto-capitalize after sentence-ending punctuation.
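To make the conversion concrete, here is a minimal sketch of spoken-punctuation replacement with capitalization after sentence-ending punctuation. It covers only part of the table above, and the names `COMMANDS` and `apply_voice_commands` are assumptions. This naive version also replaces command words used in ordinary speech (for example "a long period of time"), which a real implementation would need to handle.

```python
import re

# Subset of the command table above (spoken phrase -> output characters).
COMMANDS = {
    "full stop": ".", "period": ".", "comma": ",",
    "question mark": "?", "exclamation mark": "!",
    "semicolon": ";", "colon": ":",
    "new paragraph": "\n\n", "new line": "\n",
}

# Longest phrases first so multi-word commands match as a unit.
_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(sorted(map(re.escape, COMMANDS), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def apply_voice_commands(text: str) -> str:
    # Replace each spoken command, plus the space before it, with its character.
    out = _PATTERN.sub(lambda m: COMMANDS[m.group(1).lower()], text)
    # Remove spaces left over at the start of a new line.
    out = re.sub(r"\n[ \t]+", "\n", out)
    # Capitalize the first letter after sentence-ending punctuation.
    return re.sub(
        r"([.?!]\s+)([a-z])", lambda m: m.group(1) + m.group(2).upper(), out
    )
```

For example, `apply_voice_commands("hello comma world period how are you question mark")` yields `hello, world. How are you?`.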
Editing Voice Commands
- "scratch that" — Deletes last transcribed characters
- "delete last word" — Deletes previous word
- "select all" — Selects all text
- "go to start" / "go to end" — Moves cursor
- "copy that" — Copies selected text
AI Features
AI Text Cleanup
Automatically fixes grammar, removes filler words, and improves punctuation after transcription. Choose a provider in Settings > AI & Text: Disabled, OpenAI (GPT-4o-mini), or Anthropic (Claude Haiku). Customize the system prompt to control cleanup behavior.
AI Command Mode
Transform selected text with voice instructions. Select text in any app, press Ctrl+Shift+Space, speak an instruction (e.g., "make this more formal", "translate to Spanish", "fix grammar"), and the AI replaces the selection with the result.
Custom Vocabulary
Add proper nouns, technical terms, and brand names in Settings > AI & Text to improve transcription accuracy. Words are passed to Whisper as context and preserved during AI cleanup.
Snippets
Text shortcuts that expand trigger phrases into longer text. Example: "sig" expands to "Best regards, John Smith". Supports {date} and {time} variables. Configure in Settings > AI & Text.
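Snippet expansion amounts to a lookup followed by variable substitution. The sketch below assumes that triggers are matched case-insensitively and that {date} and {time} render as YYYY-MM-DD and HH:mm. Both are assumptions, and `expand_snippet` is a hypothetical name.

```python
from datetime import datetime
from typing import Dict, Optional


def expand_snippet(
    phrase: str, snippets: Dict[str, str], now: Optional[datetime] = None
) -> str:
    """Return the expanded snippet, or the phrase unchanged if it is not a trigger."""
    now = now or datetime.now()
    body = snippets.get(phrase.strip().lower())
    if body is None:
        return phrase
    # Assumed formats: {date} -> YYYY-MM-DD, {time} -> HH:mm
    return (
        body.replace("{date}", now.strftime("%Y-%m-%d"))
            .replace("{time}", now.strftime("%H:%M"))
    )
```

With `{"sig": "Best regards, John Smith"}` as the snippet table, `expand_snippet("sig", ...)` returns the full signature, and any phrase that is not a trigger passes through unchanged.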
Context-Aware Formatting
Dikt detects the active application and adjusts AI cleanup tone automatically. Built-in defaults: Outlook/Thunderbird (Formal), Slack/Discord/Teams (Casual), Word/OneNote (Prose), Notepad/VS Code (Minimal). Add custom profiles in Settings > AI & Text.
Profanity Filter & Swear Jar
Automatically removes curse words from transcriptions using dual coverage: AI prompt augmentation and a local word-list filter. A built-in list of common words is included, and you can add custom words in Settings > Swear Jar. The Swear Jar tracks total words removed, a configurable cost-per-word ($0.25 default), daily trends, and top offenders. Optional notifications show how many words were cleaned after each dictation.
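The local word-list half of the filter, together with the Swear Jar tally, might look roughly like the sketch below. Function names are hypothetical and the matching rules (whole words, case-insensitive) are an assumption. Only the $0.25 default cost comes from the description above.

```python
import re
from collections import Counter
from typing import Iterable, Tuple


def filter_profanity(text: str, blocked_words: Iterable[str]) -> Tuple[str, Counter]:
    """Remove blocked words and count how often each one was removed."""
    counts: Counter = Counter()
    words = list(blocked_words)
    if not words:
        return text, counts
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )

    def drop(match: "re.Match[str]") -> str:
        counts[match.group(1).lower()] += 1
        return ""

    cleaned = pattern.sub(drop, text)
    cleaned = re.sub(r" {2,}", " ", cleaned).strip()  # collapse leftover gaps
    cleaned = re.sub(r" +([.,!?])", r"\1", cleaned)   # no space before punctuation
    return cleaned, counts


def jar_total(counts: Counter, cost_per_word: float = 0.25) -> float:
    """Total owed to the jar: removed words times the configured cost."""
    return sum(counts.values()) * cost_per_word
```

The per-word counter is what would feed a "top offenders" view, and `jar_total` gives the running cost.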
Custom AI Personas
AI personas change the style and tone of your AI cleanup. Choose from 6 built-in personas: Technical Writer (precise, clear), Doctor (clinical notes), Journalist (engaging prose), Lawyer (formal legal), Code Reviewer (preserves identifiers exactly), or Casual (light, conversational). You can also create custom personas with your own system prompt. Select a persona in Settings > Speed & AI > Cleanup Persona. Personas can be assigned per Voice Profile.
Multi-Turn AI Command Mode
AI Command Mode (Ctrl+Shift+Space) now remembers your last 3 instructions so you can refine text iteratively. For example: "make this more formal", then "shorten it", then "add bullet points". The AI retains context between turns within the same session. Say "start over" or "clear history" to reset the conversation. Context is automatically cleared when you start a new recording session.
Real-Time Translation
Dictate in any language and have the output automatically translated to a different language. Set your target language in Settings > Speed & AI > Output Language (e.g. "French", "Spanish", "Japanese"). The translation is appended to the AI cleanup step, so it works with both managed proxy (Pro) and BYOK API keys. Leave the setting blank to disable translation.
Word Correction Training
Define wrong→correct word pairs that are applied automatically on every future transcription. For example, if Whisper consistently transcribes a name incorrectly ("Jon" → "John"), add a correction rule. Rules use case-insensitive whole-word matching and are applied as a post-processing pass before AI cleanup. Manage corrections in Settings > Vocabulary > Word Corrections.
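Case-insensitive whole-word matching can be expressed with a word-boundary regular expression. This is a minimal sketch of that rule, not Dikt's actual code, and `apply_corrections` is a hypothetical name.

```python
import re
from typing import Dict


def apply_corrections(text: str, rules: Dict[str, str]) -> str:
    """Apply wrong -> correct pairs using case-insensitive whole-word matches."""
    for wrong, right in rules.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement keeps backslashes in `right` literal.
        text = re.sub(pattern, lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text
```

With the rule `{"Jon": "John"}`, the input `jon met Jonathan` becomes `John met Jonathan`: the whole-word boundary leaves "Jonathan" untouched.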
Markdown Voice Mode
Say Markdown structure out loud and it's expanded to proper syntax. Supported commands: "heading one/two/three" (→ #/##/###), "bullet point" (→ -), "numbered list" (→ 1.), "open code block" (→ ```), "bold" (→ **text**), "italic" (→ _text_), and "link" (→ [text](url)). Markdown voice mode is automatically enabled when Obsidian, Typora, or VS Code is the active window, or you can enable it manually in Settings > Speed & AI.
Advanced Features
Fast Mode
Enabled by default. Uses greedy decoding and flash attention for faster local transcription with a slight accuracy trade-off. Toggle in Settings > Transcription.
Whisper Mode
For quiet dictation in shared spaces. Amplifies quiet audio and lowers voice detection thresholds so Whisper picks up soft speech. Toggle in Settings > Transcription.
Streaming Transcription
Submits audio chunks every 3 to 15 seconds (default 7 s) while recording to produce partial results, which reduces perceived latency for long dictations. Works with the Local and Cloud modes but not with Cloud AI (GPT-4o Audio). Enable in Settings > Transcription.
Voice Profiles
Multiple users can share the same machine with completely separate settings, vocabulary, text history, snippets, and statistics. Each profile is fully isolated — switching profiles restarts Dikt and loads that profile's data.
- Open: Settings > Voice Profiles tab, or right-click the tray icon > Profiles
- Create: Click "New Profile", enter a name — a fresh data directory is created for that profile
- Per-profile data: Each profile has its own vocabulary list, text history, snippets, AI prompt, swear-jar stats, and all settings
- Switch: Settings > Voice Profiles > select a profile > Switch to Selected (app restarts), or right-click tray icon > Profiles > select a profile
- The "Default" profile uses the existing data directory for backward compatibility with prior installations
Batch Transcription
Transcribe multiple audio files at once by dragging and dropping them into the main window. Supports MP3, WAV, M4A, FLAC, OGG, WMA, AAC, and WebM formats. Each file is transcribed in sequence with AI post-processing applied, and results are saved to your transcription history.
Multi-Language
99 languages supported with multilingual Whisper models. Select a specific language or use auto-detect in Settings > Transcription > Language.
GPU Acceleration
Speed up local Whisper transcription by offloading computation to your GPU. In Settings > Models > Acceleration, choose Auto (detects your GPU automatically), CPU (software only), CUDA (NVIDIA GPUs), or DirectML (AMD and Intel GPUs). GPU acceleration can reduce transcription time significantly, especially with larger models like medium and large-v3.
Noise Profile Learning
Filter out consistent background noise (fans, AC, office hum) for cleaner transcription. Click "Learn Background Noise" in Settings > Audio to record 3 seconds of ambient sound. Dikt saves a noise profile and applies spectral subtraction to future recordings, removing matched frequencies before audio reaches the transcription engine. Re-learn any time your environment changes.
Auto-Correction Suggestions
When enabled, Dikt uses word-level timestamps from Whisper's JSON output to calculate confidence scores for each word. Low-confidence words are highlighted with an underline in a preview overlay before injection, letting you review and fix uncertain words. Enable in Settings > Speed & AI > Auto-Correction.
Code Dictation Mode
Automatically formats dictation as code when you're in an IDE. Dikt detects active IDEs — Rider, Visual Studio, Cursor, IntelliJ, VS Code — and activates a "code" context profile with a specialized AI system prompt. Say "create a function called get user that takes id and returns a user object" and get formatted code output. The code profile is one of the built-in context profiles and can be customized in Settings > AI & Text > Context Profiles.
TTS Preview
Hear your transcription read aloud before it's injected — useful for proofreading long dictations. After transcription completes, Dikt reads the text back using your chosen TTS provider. Press your hotkey or the record key to confirm and inject, or press Escape to cancel. Two providers are available: Windows built-in voices (free, offline) and ElevenLabs neural voices (high quality, requires an ElevenLabs API key and voice ID). Configure in Settings > Speed & AI > TTS Preview.
Clipboard Re-Inject Queue
When text injection fails — no focused window, access denied, or no cursor detected — the transcription is queued instead of lost. The compact overlay shows a "transcription queued" indicator. Click the overlay to inject the queued text into whatever window you focus next. The queue is stored in memory and cleared on app restart or manual dismiss.
Wake Word Detection
Say "Hey Dikt" to start recording hands-free — no hotkey needed. A lightweight background detector uses Windows built-in speech recognition to monitor your microphone and triggers recording when it hears the wake phrase. No model download required — works out of the box on Windows 10 and later.
Integrations
Obsidian Integration
Send every dictation directly into your Obsidian vault. Text is appended to the configured note on each dictation — no Obsidian plugin required, Dikt writes directly to the vault's file system.
- Go to Settings > Advanced > Output Target and select Obsidian
- Vault Path: Enter the full path to your Obsidian vault folder (e.g., C:\Users\you\Documents\MyVault)
- Note Path: Enter a relative path within the vault (e.g., Daily Notes/{{date}}.md). Supports {{date}} and {{time}} placeholders to create date-stamped notes automatically
- Transcriptions are appended to the note; the file is created if it doesn't exist
- Use the Output Template field with {text}, {date}, {time} placeholders to control the exact format of each appended entry
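Because the integration writes plain files, the append step can be approximated in a few lines. The sketch below resolves {{date}} and {{time}} in the note path and appends an entry, creating the file if needed. The function names are hypothetical, and the placeholder formats (YYYY-MM-DD, and HH-mm to keep file names valid on Windows) are assumptions.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_note_path(vault: str, note_path: str, now: datetime) -> Path:
    """Expand {{date}} / {{time}} in the relative note path (assumed formats)."""
    relative = (
        note_path.replace("{{date}}", now.strftime("%Y-%m-%d"))
                 .replace("{{time}}", now.strftime("%H-%M"))
    )
    return Path(vault) / relative


def append_to_note(
    vault: str, note_path: str, entry: str, now: Optional[datetime] = None
) -> Path:
    """Append one entry to the note, creating folders and the file if missing."""
    target = resolve_note_path(vault, note_path, now or datetime.now())
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as note:
        note.write(entry.rstrip("\n") + "\n")
    return target
```

Calling `append_to_note(vault, "Daily Notes/{{date}}.md", "some text")` twice on the same day adds two lines to the same date-stamped note.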
Notion Integration
Append dictations directly to a Notion page using the Notion API. Each dictation is added as a new paragraph block at the bottom of the page.
- Step 1. Create an integration: Go to notion.so/my-integrations, click "New integration", give it a name, and copy the Internal Integration Token (API key)
- Step 2. Share your page: Open the target Notion page, click Share (top right), invite your integration by name, and confirm
- Step 3. Get the page ID: Copy the 32-character hex string from the page URL (the segment after the page title and before any ?)
- Step 4. Configure Dikt: Go to Settings > Advanced > Output Target > Notion, enter your API Key and Page ID
- Transcriptions are appended as paragraph blocks; Dikt must be running with an internet connection for this to work
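For readers curious what "appending a paragraph block" involves, the sketch below builds the kind of request the public Notion API expects for its append-block-children endpoint. It only constructs the request and does not send it. The function name is hypothetical, the API version string is one published Notion version chosen for illustration, and whether Dikt issues exactly this call is an assumption.

```python
import json
import urllib.request

NOTION_VERSION = "2022-06-28"  # a published Notion API version; assumed here


def build_append_request(api_key: str, page_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a request that appends one paragraph to a page."""
    payload = {
        "children": [
            {
                "object": "block",
                "type": "paragraph",
                "paragraph": {
                    "rich_text": [{"type": "text", "text": {"content": text}}]
                },
            }
        ]
    }
    return urllib.request.Request(
        url=f"https://api.notion.com/v1/blocks/{page_id}/children",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen(...)` would fail with an authorization error unless the page has been shared with the integration, which is why Step 2 matters.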
VS Code Extension
Dictate directly into the VS Code editor using Dikt's local HTTP API. The extension communicates with the Dikt desktop app running in the background — Dikt must be running for the extension to work.
- Step 1. Enable Local API: In Dikt, go to Settings > Advanced > Local API and enable the local API server (runs on 127.0.0.1:9847)
- Step 2. Install extension: Search for "Dikt" in the VS Code Extensions marketplace and install it
- Dictate: Press Ctrl+Alt+Space in the editor to start dictating. Speak, then release to insert
- Insert modes: at cursor (default), replace selection, or append on new line. Configure in VS Code extension settings
- The VS Code status bar shows the Dikt connection state (connected / recording / offline)
Output Templates
When using Obsidian or Notion as the output target, a template controls the exact text that is written for each dictation. Templates are configured in Settings > Advanced > Output Target > Template.
- Supported placeholders: {text} (the transcribed text), {date} (today's date, YYYY-MM-DD), {time} (current time, HH:mm)
- Example template: ## {date}\n{text}\n--- produces a heading, the transcription, and a divider for each entry
- Enable Include timestamp to automatically prepend [YYYY-MM-DD HH:mm] to each entry regardless of the template
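Template rendering can be sketched as plain string substitution. The example assumes that a literal \n typed into the template field stands for a line break and that the timestamp prefix is followed by a single space. Both are assumptions, and `render_entry` is a hypothetical name.

```python
from datetime import datetime


def render_entry(
    template: str, text: str, now: datetime, include_timestamp: bool = False
) -> str:
    """Fill {text}, {date}, {time} and optionally prepend [YYYY-MM-DD HH:mm]."""
    entry = (
        template.replace("\\n", "\n")  # assumed: typed \n means a line break
                .replace("{date}", now.strftime("%Y-%m-%d"))
                .replace("{time}", now.strftime("%H:%M"))
                # {text} goes last so placeholders spoken inside the
                # dictation itself are not expanded.
                .replace("{text}", text)
    )
    if include_timestamp:
        entry = f"[{now:%Y-%m-%d %H:%M}] " + entry
    return entry
```

Rendering the example template with the text "Buy milk" on 2024-05-01 gives a `## 2024-05-01` heading, the line `Buy milk`, and a `---` divider.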
History & Analytics
Transcription History
Browse recent transcriptions in the sidebar. Search by keyword, pin important items, and play back audio recordings. The history limit is configurable from 10 to 500 items (default 50).
Export
Export history as TXT (plain text), CSV (spreadsheet), JSON (programmatic), or SRT (subtitles with timing).
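For reference, SRT is a simple text format: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the caption text, with entries separated by blank lines. The sketch below produces that layout from (start, end, text) tuples measured in seconds. The input shape and function names are assumptions rather than Dikt's export code.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) segments as an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```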
Dashboard
Access from Settings > Dashboard. View time saved, words transcribed, total dictations, success rate, daily word chart (30 days), and provider breakdown pie chart.
Dictation Streaks
Dikt tracks consecutive days on which you complete at least one dictation — like a language-learning app streak — to keep you motivated and build a daily habit.
- View streaks: Settings > Dashboard tab — shows current streak, longest streak, weekly word count, and milestones earned
- Milestone badges are awarded at 7, 14, 30, 90, and 365 consecutive days
- Missed day: Missing a single day resets the current streak to 0 — your longest streak is preserved separately
- Streak data persists in streak.json in your profile's data directory
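The streak rules above can be sketched from a list of days that contain at least one dictation. The example assumes a streak stays alive on a day that has no dictation yet and resets only after a full day is missed. That detail, the function names, and the input format are assumptions, and the layout of streak.json is not shown here.

```python
from datetime import date, timedelta
from typing import Iterable, List

MILESTONES = (7, 14, 30, 90, 365)  # badge thresholds from the list above


def current_streak(dictation_days: Iterable[date], today: date) -> int:
    """Count consecutive dictation days ending today (or yesterday)."""
    days = set(dictation_days)
    # Assumed: today does not break the streak until it has fully passed.
    cursor = today if today in days else today - timedelta(days=1)
    streak = 0
    while cursor in days:
        streak += 1
        cursor -= timedelta(days=1)
    return streak


def longest_streak(dictation_days: Iterable[date]) -> int:
    """Length of the longest run of consecutive dictation days."""
    days = set(dictation_days)
    best = 0
    for day in days:
        if day - timedelta(days=1) in days:
            continue  # not the first day of a run
        length = 1
        while day + timedelta(days=length) in days:
            length += 1
        best = max(best, length)
    return best


def milestones_earned(longest: int) -> List[int]:
    return [m for m in MILESTONES if longest >= m]
```

Tracking the longest streak separately from the current one is what lets a missed day reset one without touching the other.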
Account & Licensing
| Feature | Trial (14 days) | BYOK (Lifetime) | Pro | Team |
|---|---|---|---|---|
| Local transcription | Yes | Yes | Yes | Yes |
| Cloud transcription | Yes | Own key | Managed | Managed |
| AI cleanup & commands | Yes | Own key | Managed | Managed |
| AI Assistant | No | No | Yes | Yes |
| Cloud Sync | No | No | Yes | Yes |
| Team Workspace | No | No | No | Yes |
| Shared Vocabulary | No | No | No | Yes |
| Duration | 14 days | Lifetime | Subscription | Subscription |
Activating a License
Go to Settings > License. Enter your license key and click Activate, or switch to the Login tab and sign in with your dikt.app account credentials.
Web Dashboard
Access at www.dikt.app/dashboard. View license status, manage cloud-synced settings (vocabulary, snippets, profiles, AI prompt), handle billing (Stripe portal for Pro), and view usage statistics.
Cloud Sync
Syncs custom vocabulary, snippets, context profiles, and AI prompt between desktop and web dashboard. Enable in Settings > Advanced > Cloud Sync. Uses version-based conflict resolution — newer changes win.
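One way to read "version-based conflict resolution" is that each synced item carries a version number and the higher version wins. The sketch below illustrates that reading only. The `SyncedItem` type, the integer versions, and the tie-breaking rule (keep the local copy) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SyncedItem:
    key: str       # e.g. a snippet trigger or vocabulary word (hypothetical)
    value: str
    version: int   # assumed to increase with every change


def resolve(
    local: Dict[str, SyncedItem], remote: Dict[str, SyncedItem]
) -> Dict[str, SyncedItem]:
    """Merge two copies of the synced data, keeping the newer version of each item."""
    merged = dict(local)
    for key, item in remote.items():
        # Take the remote item if it is new or strictly newer; ties keep local.
        if key not in merged or item.version > merged[key].version:
            merged[key] = item
    return merged
```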
Team Workspaces
The Team plan ($25/seat/month) gives organizations a shared workspace with team-wide vocabulary, snippets, voice profiles, and team management. All Pro features are included for every seat.
- Create a team: Purchase the Team plan, go to your account dashboard, click Create Team, and invite members by email
- Shared settings: Team admins manage shared vocabulary and snippets; these form the baseline that all members inherit via cloud sync automatically
- Personal overrides: Personal settings always take precedence over team defaults — members can customize without affecting the rest of the team
- Shared voice profiles: Team voice profiles (with shared vocabulary and snippets) can be shared across all members and appear in each member's profile list
- Invite and remove members, view per-member usage, and manage billing from the team admin section of the web dashboard
Settings Reference
All settings are in the Settings panel, organized by tab.
General
Recording hotkey, toggle mode, push-to-talk grace period, grace period sound, microphone selection, launch at startup, show in taskbar, sound effects, custom start/stop sounds, minimize to tray, auto-inject text, notification style (Toast/Popup/Overlay/None), notification duration, theme (System/Light/Dark), history limit, typing animation.
Overlay
Notification overlay: type (Minimal/Line/Box), style (Pill/Box/Text), position (9 presets + custom), opacity, display (monitor selector), size (1–100). Compact overlay: same settings configured independently, plus waveform style (None/Mirror/Bars/Pulse/Steps/Dots/Ribbon/Peaks), waveform size, show transcription, show controls, show history (Box type only), click opens app, lock position, auto-hide with configurable delay.
API Keys
OpenAI API key and Anthropic API key. Both encrypted at rest with Windows DPAPI and excluded from settings export.
Transcription
Provider (Local/Cloud/Cloud AI/Managed), language, streaming transcription, chunk interval, fast mode, whisper mode, model manager.
AI & Text
LLM provider, custom system prompt, custom vocabulary, snippets, context-aware formatting toggle, context profiles.
Text Processing
Output Target: Clipboard (default), Obsidian, or Notion. Clipboard copies to the Windows clipboard after each dictation. Obsidian appends to a vault note (configure vault path and note path, supports {{date}}/{{time}} in the note path). Notion appends a paragraph block to a shared page (API key + page ID required). Output Template: controls the format written for each dictation using {text}, {date}, {time} placeholders; optional timestamp prefix.
Voice Profiles
Create, rename, delete, and switch profiles. Each profile has isolated settings, vocabulary, history, snippets, and stats. The active profile name is shown at the top of the tab. Switching profiles restarts the app.
Advanced
Provider failover, retry transient errors, local-only mode, auto-delete recordings (by age or immediately after transcription), export/import settings, clear all data, telemetry (opt-out), update channel (Stable/Beta/Canary), cloud sync. Local API: enable/disable the local HTTP API server on port 9847 — required for the VS Code extension; Dikt must be running for the extension to connect.
Privacy & Data
What Stays Local
Audio recordings (unless cloud mode), transcription history, settings, API keys (DPAPI-encrypted).
Telemetry
Optional anonymous usage data: machine hash, app version, OS, transcription count, provider. No audio, text, or personal data. Opt out in Settings > Advanced.
Data Locations
- %APPDATA%\Dikt\settings.json: Settings
- %APPDATA%\Dikt\recordings\: Audio recordings
- %APPDATA%\Dikt\history.json: Transcription history
- %APPDATA%\Dikt\logs\: Application logs
Use Local-Only Mode (Settings > Advanced) to prevent all network communication. Use Clear All Data to delete all history and recordings and to reset settings.
System Tray & Updates
System Tray
Left-click the tray icon to show the main window. Right-click for: Show Window, Compact Mode (small floating status window), Exit.
Updates
Dikt auto-checks for updates and shows a notification bar when available. Choose your channel in Settings > Advanced: Stable (default), Beta, or Canary. After updating, a "What's New" dialog shows release notes.
Getting Help
Use the in-app support form (Settings > Support) to submit tickets with optional log and settings attachments. Pro subscribers also have access to the AI Assistant for instant help. Visit www.dikt.app/support for additional resources.