User Manual

Complete guide to every Dikt feature.

Getting Started

System Requirements

Windows 10 or later (64-bit)
.NET 8 Desktop Runtime (x64) — the installer prompts you to download this if needed
Any audio input device (built-in or external microphone)
~200 MB disk space for the app, plus 75 MB–3 GB for Whisper models

Installation

Download the installer from the download page. Run the installer — it checks for .NET 8, installs Dikt, and launches the Setup Wizard on first run.

Setup Wizard

The 9-page Setup Wizard guides you through: Welcome, License activation (trial or key), Microphone selection and testing, Transcription mode (Local/Cloud/Managed), API key entry (if using cloud), Notification Overlay configuration (type, style, position, display, size), Compact Overlay configuration (type, style, position, display, size, waveform style), Storage preferences (auto-delete, retention), and a completion summary.

Quick Start

1. Press Ctrl+Alt+Space. You'll hear a start sound and see "Recording..."
2. Speak naturally and say what you want to type.
3. Release the hotkey. Your text appears at the cursor.

Recording & Dictation

Hotkey Modes

Push-to-Talk (default): Hold the hotkey while speaking, release when done.
Toggle: Press once to start, press again to stop. Change in Settings > General > Toggle mode.

During Recording

You'll see a "Recording..." status, an elapsed timer, and a real-time audio level bar (green = normal, orange = medium, red = too loud). You can cancel anytime with the Cancel button.

Push-to-Talk Grace Period

When using push-to-talk mode, if you press and release the hotkey within the configured grace period, Dikt switches to toggle-like behavior — it keeps recording until you press the hotkey again. This is useful for longer dictations where holding the key is uncomfortable. Configure the grace period duration in Settings > General. An optional audio cue plays when the grace period activates.
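The grace-period behavior described above can be pictured as a small state machine. The following is a minimal sketch, not Dikt's actual implementation: the class name, the field names, and the 0.3-second default are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PushToTalkState:
    # Hypothetical default; the real duration is whatever is set in Settings > General.
    grace_period_s: float = 0.3
    recording: bool = False
    latched: bool = False  # True once a quick tap has switched to toggle-like behavior
    pressed_at: float = 0.0

    def on_press(self, now: float) -> None:
        if self.latched:
            # A press while latched ends the recording.
            self.recording = False
            self.latched = False
            return
        self.recording = True
        self.pressed_at = now

    def on_release(self, now: float) -> None:
        if not self.recording or self.latched:
            return
        if now - self.pressed_at <= self.grace_period_s:
            self.latched = True  # quick tap: keep recording until the next press
        else:
            self.recording = False  # normal push-to-talk release
```

Holding the key for two seconds and releasing stops the recording as usual, while a 0.1-second tap leaves `recording` set until the hotkey is pressed again.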

Sound Effects

Dikt plays sounds on recording start/stop. Disable in Settings > General > Play sound effects, or set custom WAV files.

Transcription Modes

Mode	How It Works	Pros	Cons
Local	Whisper.cpp on your PC	Free, offline, private	Slower, hardware dependent
Cloud	OpenAI Whisper API	Fast (1-3s), accurate	~$0.006/min, needs API key
Cloud AI	GPT-4o Audio	Transcription + cleanup in one call	Higher cost, needs API key
Managed	Dikt's backend (Pro only)	No API key needed	Requires Pro subscription

Enable Provider Failover in Settings > Advanced to automatically try another provider if your primary one fails.
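Conceptually, failover means trying providers in order until one succeeds. The sketch below illustrates that idea only; the function name, the `Transcriber` type, and the provider order are assumptions, and Dikt's own selection logic is not documented here.

```python
from typing import Callable, Sequence

# Hypothetical signature: a provider takes raw audio and returns text.
Transcriber = Callable[[bytes], str]


def transcribe_with_failover(
    audio: bytes, providers: Sequence[tuple[str, Transcriber]]
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, text) from the first success."""
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

With a primary provider that raises a connection error and a working secondary one, the call returns the secondary provider's name and text.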

Whisper Models

Download and manage models in Settings > Transcription > Model Manager. Models ending in .en are English-only (faster for English). Multilingual models support 99 languages.

Model	Size	Speed	Accuracy
tiny / tiny.en	~75 MB	Fastest	Basic
base / base.en	~150 MB	Fast	Good
small / small.en	~500 MB	Medium	Better
medium / medium.en	~1.5 GB	Slow	Great
large-v3-turbo	~1.5 GB	Medium	Best
large-v3	~3 GB	Slowest	Best

Text Output

Auto-inject (default): Text automatically typed at your cursor position
Copy (Ctrl+C): Copy to clipboard
Insert (Ctrl+Enter): Inject at cursor manually
Find & Replace (Ctrl+F / Ctrl+H): Search and replace in transcription
Append mode: New transcriptions added to end of existing text

Voice Commands

Speak punctuation and formatting naturally. Dikt converts spoken words into characters.

Say This	You Get
"period" or "full stop"	.
"comma"	,
"question mark"	?
"exclamation mark"	!
"new paragraph"	Two line breaks
"new line"	One line break
"dash"	—
"ellipsis"	...
"open quote" / "close quote"	" / "
"open paren" / "close paren"	( / )
"semicolon" / "colon"	; / :
"tab"	Tab character

Voice commands also auto-capitalize after sentence-ending punctuation.
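To make the conversion concrete, here is a simplified sketch of how spoken punctuation could be replaced and followed by auto-capitalization. It covers only a subset of the table above, and the function name and regex approach are assumptions rather than Dikt's actual processing. Note that this naive version would also convert a literal use of a word such as "period".

```python
import re

# Subset of the table above: spoken phrase -> character to insert.
PUNCTUATION = {
    "full stop": ".",
    "period": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
    "semicolon": ";",
    "colon": ":",
}


def apply_voice_commands(text: str) -> str:
    # Longest phrases first so multi-word commands are matched as a whole.
    for phrase in sorted(PUNCTUATION, key=len, reverse=True):
        # Swallow whitespace before the phrase so the mark attaches to the previous word.
        pattern = rf"\s*\b{re.escape(phrase)}\b"
        text = re.sub(pattern, PUNCTUATION[phrase], text, flags=re.IGNORECASE)
    # Capitalize the first letter and any letter that follows ., ? or !
    return re.sub(
        r"(^|[.?!]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(apply_voice_commands("hello comma how are you question mark i am fine period"))
# prints: Hello, how are you? I am fine.
```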

Editing Voice Commands

"scratch that" — Deletes last transcribed characters
"delete last word" — Deletes previous word
"select all" — Selects all text
"go to start" / "go to end" — Moves cursor
"copy that" — Copies selected text

AI Features

AI Text Cleanup

Automatically fixes grammar, removes filler words, and improves punctuation after transcription. Choose a provider in Settings > AI & Text: Disabled, OpenAI (GPT-4o-mini), or Anthropic (Claude Haiku). Customize the system prompt to control cleanup behavior.

AI Command Mode

Transform selected text with voice instructions. Select text in any app, press Ctrl+Shift+Space, speak an instruction (e.g., "make this more formal", "translate to Spanish", "fix grammar"), and the AI replaces the selection with the result.

Custom Vocabulary

Add proper nouns, technical terms, and brand names in Settings > AI & Text to improve transcription accuracy. Words are passed to Whisper as context and preserved during AI cleanup.

Snippets

Text shortcuts that expand trigger phrases into longer text. Example: "sig" expands to "Best regards, John Smith". Supports {date} and {time} variables. Configure in Settings > AI & Text.
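As an illustration of snippet expansion, the sketch below looks up a trigger and fills in the two variables. The function name and the date and time formats (YYYY-MM-DD and HH:MM) are assumptions; the manual does not state which formats snippets use.

```python
from datetime import datetime


def expand_snippet(trigger: str, snippets: dict[str, str], now: datetime) -> str:
    """Return the expansion for a trigger, filling {date} and {time}.

    Unknown triggers are returned unchanged.
    """
    template = snippets.get(trigger.strip().lower())
    if template is None:
        return trigger
    return (
        template
        .replace("{date}", now.strftime("%Y-%m-%d"))  # assumed date format
        .replace("{time}", now.strftime("%H:%M"))     # assumed time format
    )


snippets = {
    "sig": "Best regards, John Smith",
    "stamp": "Noted on {date} at {time}",
}
print(expand_snippet("sig", snippets, datetime.now()))
# prints: Best regards, John Smith
```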

Context-Aware Formatting

Dikt detects the active application and adjusts AI cleanup tone automatically. Built-in defaults: Outlook/Thunderbird (Formal), Slack/Discord/Teams (Casual), Word/OneNote (Prose), Notepad/VS Code (Minimal). Add custom profiles in Settings > AI & Text.

Profanity Filter & Swear Jar

Automatically removes curse words from transcriptions using dual coverage: AI prompt augmentation and a local word-list filter. A built-in list of common words is included, and you can add custom words in Settings > Swear Jar. The Swear Jar tracks total words removed, a configurable cost-per-word ($0.25 default), daily trends, and top offenders. Optional notifications show how many words were cleaned after each dictation.

Custom AI Personas

AI personas change the style and tone of your AI cleanup. Choose from 6 built-in personas: Technical Writer (precise, clear), Doctor (clinical notes), Journalist (engaging prose), Lawyer (formal legal), Code Reviewer (preserves identifiers exactly), or Casual (light, conversational). You can also create custom personas with your own system prompt. Select a persona in Settings > Speed & AI > Cleanup Persona. Personas can be assigned per Voice Profile.

Multi-Turn AI Command Mode

AI Command Mode (Ctrl+Shift+Space) now remembers your last 3 instructions so you can refine text iteratively. For example: "make this more formal", then "shorten it", then "add bullet points". The AI retains context between turns within the same session. Say "start over" or "clear history" to reset the conversation. Context is automatically cleared when you start a new recording session.
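A bounded history like the one described above can be modeled with a fixed-length queue. This is a sketch under assumptions: the class and method names are hypothetical, and how Dikt actually passes the history to the AI provider is not documented here.

```python
from collections import deque

RESET_PHRASES = {"start over", "clear history"}


class CommandHistory:
    def __init__(self, max_turns: int = 3):
        # deque with maxlen drops the oldest instruction automatically.
        self._turns = deque(maxlen=max_turns)

    def add(self, instruction: str) -> list[str]:
        """Record an instruction and return the context kept for the next turn."""
        if instruction.strip().lower() in RESET_PHRASES:
            self._turns.clear()
        else:
            self._turns.append(instruction)
        return list(self._turns)


history = CommandHistory()
history.add("make this more formal")
history.add("shorten it")
print(history.add("add bullet points"))
# prints: ['make this more formal', 'shorten it', 'add bullet points']
```

A fourth instruction would push out "make this more formal", and saying "start over" empties the list.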

Real-Time Translation

Dictate in any language and have the output automatically translated to a different language. Set your target language in Settings > Speed & AI > Output Language (e.g. "French", "Spanish", "Japanese"). The translation is appended to the AI cleanup step, so it works with both managed proxy (Pro) and BYOK API keys. Leave the setting blank to disable translation.

Word Correction Training

Define wrong→correct word pairs that are applied automatically on every future transcription. For example, if Whisper consistently transcribes a name incorrectly ("Jon" → "John"), add a correction rule. Rules use case-insensitive whole-word matching and are applied as a post-processing pass before AI cleanup. Manage corrections in Settings > Vocabulary > Word Corrections.
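Case-insensitive whole-word matching can be expressed with regular-expression word boundaries. The sketch below shows the general technique; the function name and the dictionary format for rules are assumptions, not Dikt's internal representation.

```python
import re


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each wrong word with its correction, matching whole words only."""
    for wrong, right in corrections.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function replacement avoids backslash handling in the replacement string.
        text = re.sub(pattern, lambda m, r=right: r, text, flags=re.IGNORECASE)
    return text


print(apply_corrections("jon met Jonathan and JON", {"Jon": "John"}))
# prints: John met Jonathan and John
```

The word boundary keeps "Jonathan" untouched even though it starts with "Jon".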

Markdown Voice Mode

Say Markdown structure out loud and it's expanded to proper syntax. Supported commands: "heading one/two/three" (→ #/##/###), "bullet point" (→ -), "numbered list" (→ 1.), "open code block" (→ ```), "bold" (→ **text**), "italic" (→ _text_), and "link" (→ [text](url)). Markdown voice mode is automatically enabled when Obsidian, Typora, or VS Code is the active window, or you can enable it manually in Settings > Speed & AI.

Advanced Features

Fast Mode

Enabled by default. Uses greedy decoding and flash attention for faster local transcription with a slight accuracy trade-off. Toggle in Settings > Transcription.

Whisper Mode

For quiet dictation in shared spaces. Amplifies quiet audio and lowers voice detection thresholds so Whisper picks up soft speech. Toggle in Settings > Transcription.

Streaming Transcription

Submits audio chunks every 3 to 15 seconds (default 7 seconds) while recording, so partial results appear before you finish. This reduces perceived latency for long dictations. Works with the Local and Cloud modes, but not with Cloud AI (GPT-4o Audio). Enable in Settings > Transcription.

Voice Profiles

Multiple users can share the same machine with completely separate settings, vocabulary, text history, snippets, and statistics. Each profile is fully isolated — switching profiles restarts Dikt and loads that profile's data.

Open: Settings > Voice Profiles tab, or right-click the tray icon > Profiles
Create: Click "New Profile", enter a name — a fresh data directory is created for that profile
Per-profile data: Each profile has its own vocabulary list, text history, snippets, AI prompt, swear-jar stats, and all settings
Switch: Settings > Voice Profiles > select a profile > Switch to Selected (app restarts), or right-click tray icon > Profiles > select a profile
The "Default" profile uses the existing data directory for backward compatibility with prior installations

Batch Transcription

Transcribe multiple audio files at once by dragging and dropping them into the main window. Supports MP3, WAV, M4A, FLAC, OGG, WMA, AAC, and WebM formats. Each file is transcribed in sequence with AI post-processing applied, and results are saved to your transcription history.

Multi-Language

99 languages supported with multilingual Whisper models. Select a specific language or use auto-detect in Settings > Transcription > Language.

GPU Acceleration

Speed up local Whisper transcription by offloading computation to your GPU. In Settings > Models > Acceleration, choose Auto (detects your GPU automatically), CPU (software only), CUDA (NVIDIA GPUs), or DirectML (AMD and Intel GPUs). GPU acceleration can reduce transcription time significantly, especially with larger models like medium and large-v3.

Noise Profile Learning

Filter out consistent background noise (fans, AC, office hum) for cleaner transcription. Click "Learn Background Noise" in Settings > Audio to record 3 seconds of ambient sound. Dikt saves a noise profile and applies spectral subtraction to future recordings, removing matched frequencies before audio reaches the transcription engine. Re-learn any time your environment changes.

Auto-Correction Suggestions

When enabled, Dikt uses word-level timestamps from Whisper's JSON output to calculate confidence scores for each word. Low-confidence words are highlighted with an underline in a preview overlay before injection, letting you review and fix uncertain words. Enable in Settings > Speed & AI > Auto-Correction.

Code Dictation Mode

Automatically formats dictation as code when you're in an IDE. Dikt detects active IDEs — Rider, Visual Studio, Cursor, IntelliJ, VS Code — and activates a "code" context profile with a specialized AI system prompt. Say "create a function called get user that takes id and returns a user object" and get formatted code output. The code profile is one of the built-in context profiles and can be customized in Settings > AI & Text > Context Profiles.

TTS Preview

Hear your transcription read aloud before it's injected — useful for proofreading long dictations. After transcription completes, Dikt reads the text back using your chosen TTS provider. Press your hotkey or the record key to confirm and inject, or press Escape to cancel. Two providers are available: Windows built-in voices (free, offline) and ElevenLabs neural voices (high quality, requires an ElevenLabs API key and voice ID). Configure in Settings > Speed & AI > TTS Preview.

Clipboard Re-Inject Queue

When text injection fails — no focused window, access denied, or no cursor detected — the transcription is queued instead of lost. The compact overlay shows a "transcription queued" indicator. Click the overlay to inject the queued text into whatever window you focus next. The queue is stored in memory and cleared on app restart or manual dismiss.

Wake Word Detection

Say "Hey Dikt" to start recording hands-free — no hotkey needed. A lightweight background detector uses Windows built-in speech recognition to monitor your microphone and triggers recording when it hears the wake phrase. No model download required — works out of the box on Windows 10 and later.

Integrations

Obsidian Integration

Send every dictation directly into your Obsidian vault. Text is appended to the configured note on each dictation. No Obsidian plugin is required, because Dikt writes directly to the vault's file system.

Go to Settings > Advanced > Output Target and select Obsidian
Vault Path: Enter the full path to your Obsidian vault folder (e.g., C:\Users\you\Documents\MyVault)
Note Path: Enter a relative path within the vault (e.g., Daily Notes/{{date}}.md) — supports {{date}} and {{time}} placeholders to create date-stamped notes automatically
Transcriptions are appended to the note; the file is created if it doesn't exist
Use the Output Template field with {text}, {date}, {time} placeholders to control the exact format of each appended entry
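The steps above amount to resolving the note path and appending to a file. The following sketch shows that flow with the standard library. The function name is hypothetical, and the formats substituted for {{date}} and {{time}} are assumptions (a hyphen is used in the time because a colon is not allowed in Windows file names).

```python
from datetime import datetime
from pathlib import Path


def append_to_vault(vault: Path, note_path: str, entry: str, now: datetime) -> Path:
    """Resolve placeholders in the note path and append one entry to that note."""
    relative = (
        note_path
        .replace("{{date}}", now.strftime("%Y-%m-%d"))  # assumed date format
        .replace("{{time}}", now.strftime("%H-%M"))     # assumed; ':' is invalid in file names
    )
    target = vault / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends and creates the file if it does not exist yet.
    with target.open("a", encoding="utf-8") as note:
        note.write(entry + "\n")
    return target
```

Calling it twice on the same day with `Daily Notes/{{date}}.md` produces one date-stamped note containing both entries.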

Notion Integration

Append dictations directly to a Notion page using the Notion API. Each dictation is added as a new paragraph block at the bottom of the page.

Step 1 — Create an integration: Go to notion.so/my-integrations, click "New integration", give it a name, and copy the Internal Integration Token (API key)
Step 2 — Share your page: Open the target Notion page, click Share (top right), invite your integration by name, and confirm
Step 3 — Get the page ID: Copy the 32-character hex string from the page URL (the segment after the page title and before any ?)
Step 4 — Configure Dikt: Go to Settings > Advanced > Output Target > Notion, enter your API Key and Page ID
Transcriptions are appended as paragraph blocks; Dikt must be running with an internet connection for this to work
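For readers curious about what appending a paragraph block looks like at the API level, the sketch below builds such a request using Notion's public "append block children" endpoint. It only constructs the request and does not send it. The function name and the pinned API version are assumptions, and the manual does not state that Dikt issues exactly this call.

```python
import json
import urllib.request

NOTION_VERSION = "2022-06-28"  # assumed API version for this sketch


def build_append_request(api_key: str, page_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a request that appends one paragraph block to a page."""
    body = {
        "children": [
            {
                "object": "block",
                "type": "paragraph",
                "paragraph": {
                    "rich_text": [{"type": "text", "text": {"content": text}}]
                },
            }
        ]
    }
    return urllib.request.Request(
        url=f"https://api.notion.com/v1/blocks/{page_id}/children",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; the integration must have been invited to the page (Step 2) or the API rejects the request.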

VS Code Extension

Dictate directly into the VS Code editor using Dikt's local HTTP API. The extension communicates with the Dikt desktop app running in the background — Dikt must be running for the extension to work.

Step 1 — Enable Local API: In Dikt, go to Settings > Advanced > Local API and enable the local API server (runs on 127.0.0.1:9847)
Step 2 — Install extension: Search for "Dikt" in the VS Code Extensions marketplace and install it
Dictate: Press Ctrl+Alt+Space in the editor to start dictating — speak, then release to insert
Insert modes: at cursor (default), replace selection, or append on new line — configure in VS Code extension settings
The VS Code status bar shows the Dikt connection state (connected / recording / offline)
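If the extension reports "offline", a quick way to check whether anything is listening on the local API port is a plain TCP connection test. This sketch makes no assumptions about the API's endpoints, which are not documented here; the function name is hypothetical, and a successful connection only shows that the port is open.

```python
import socket


def is_local_api_reachable(
    host: str = "127.0.0.1", port: int = 9847, timeout: float = 0.5
) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("reachable" if is_local_api_reachable() else "not reachable")
```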

Output Templates

When using Obsidian or Notion as the output target, a template controls the exact text that is written for each dictation. Templates are configured in Settings > Advanced > Output Target > Template.

Supported placeholders: {text} (the transcribed text), {date} (today's date, YYYY-MM-DD), {time} (current time, HH:mm)
Example template: ## {date}\n{text}\n--- produces a heading, the transcription, and a divider for each entry
Enable Include timestamp to automatically prepend [YYYY-MM-DD HH:mm] to each entry regardless of the template
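The placeholder substitution described above can be sketched as follows. The function name is hypothetical, and two details are assumptions: that a literal \n typed into the template field stands for a line break, and that the timestamp prefix is separated from the entry by a single space.

```python
from datetime import datetime


def render_entry(
    template: str, text: str, now: datetime, include_timestamp: bool = False
) -> str:
    """Fill {text}, {date}, and {time} in a template for one dictation."""
    entry = (
        template
        .replace("\\n", "\n")  # assumed: literal \n in the field means a line break
        .replace("{date}", now.strftime("%Y-%m-%d"))
        .replace("{time}", now.strftime("%H:%M"))
        .replace("{text}", text)  # last, so placeholders inside dictated text stay literal
    )
    if include_timestamp:
        entry = f"[{now:%Y-%m-%d %H:%M}] " + entry  # assumed separator: one space
    return entry


print(render_entry("## {date}\\n{text}\\n---", "Buy milk", datetime(2024, 3, 5, 14, 7)))
# prints:
# ## 2024-03-05
# Buy milk
# ---
```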

History & Analytics

Transcription History

Browse recent transcriptions in the sidebar. You can search by keyword, pin important items, and play back audio recordings. The history limit is configurable from 10 to 500 items (default 50).

Export

Export history as TXT (plain text), CSV (spreadsheet), JSON (programmatic), or SRT (subtitles with timing).
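For reference, an SRT file is a sequence of numbered cues, each with a start and end time in HH:MM:SS,mmm form. The sketch below shows how timed segments map onto that layout; the function names and the (start, end, text) tuple shape are assumptions, not Dikt's export code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) segments as SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


print(srt_timestamp(3661.5))
# prints: 01:01:01,500
```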

Dashboard

Access from Settings > Dashboard. View time saved, words transcribed, total dictations, success rate, daily word chart (30 days), and provider breakdown pie chart.

Dictation Streaks

Dikt tracks consecutive days on which you complete at least one dictation — like a language-learning app streak — to keep you motivated and build a daily habit.

View streaks: Settings > Dashboard tab — shows current streak, longest streak, weekly word count, and milestones earned
Milestone badges are awarded at 7, 14, 30, 90, and 365 consecutive days
Missed day: Missing a single day resets the current streak to 0 — your longest streak is preserved separately
Streak data persists in streak.json in your profile's data directory
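The streak rules above can be sketched as a count of consecutive calendar days. The function names are hypothetical, the layout of streak.json is not assumed, and one behavior is an assumption: if today has no dictation yet, the streak is counted up to yesterday rather than treated as already broken.

```python
from datetime import date, timedelta

MILESTONES = (7, 14, 30, 90, 365)


def current_streak(dictation_days: set[date], today: date) -> int:
    """Count consecutive days with a dictation, ending today or yesterday."""
    # Assumption: a day that has not ended yet does not break the streak.
    day = today if today in dictation_days else today - timedelta(days=1)
    streak = 0
    while day in dictation_days:
        streak += 1
        day -= timedelta(days=1)
    return streak


def milestones_earned(longest_streak: int) -> list[int]:
    """Return the milestone badges reached by the longest streak."""
    return [m for m in MILESTONES if longest_streak >= m]


days = {date(2024, 3, 3), date(2024, 3, 4), date(2024, 3, 5)}
print(current_streak(days, date(2024, 3, 5)))  # prints: 3
print(current_streak(days, date(2024, 3, 7)))  # prints: 0
```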

Account & Licensing

Feature	Trial (14 days)	BYOK (Lifetime)	Pro	Team
Local transcription	Yes	Yes	Yes	Yes
Cloud transcription	Yes	Own key	Managed	Managed
AI cleanup & commands	Yes	Own key	Managed	Managed
AI Assistant	No	No	Yes	Yes
Cloud Sync	No	No	Yes	Yes
Team Workspace	No	No	No	Yes
Shared Vocabulary	No	No	No	Yes
Duration	14 days	Lifetime	Subscription	Subscription

Activating a License

Go to Settings > License. Enter your license key and click Activate, or switch to the Login tab and sign in with your dikt.app account credentials.

Web Dashboard

Access at www.dikt.app/dashboard. View license status, manage cloud-synced settings (vocabulary, snippets, profiles, AI prompt), handle billing (Stripe portal for Pro), and view usage statistics.

Cloud Sync

Syncs custom vocabulary, snippets, context profiles, and AI prompt between desktop and web dashboard. Enable in Settings > Advanced > Cloud Sync. Uses version-based conflict resolution — newer changes win.
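Version-based resolution where newer changes win can be illustrated in a few lines. This is a sketch only: the `SyncedItem` type and its fields are hypothetical, and keeping the local copy on a version tie is an assumption the manual does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncedItem:
    key: str
    value: str
    version: int  # higher means newer


def resolve(local: SyncedItem, remote: SyncedItem) -> SyncedItem:
    """Pick the copy with the higher version; keep local on a tie (assumption)."""
    return remote if remote.version > local.version else local


local = SyncedItem("ai_prompt", "Be concise.", version=4)
remote = SyncedItem("ai_prompt", "Be concise and formal.", version=5)
print(resolve(local, remote).value)
# prints: Be concise and formal.
```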

Team Workspaces

The Team plan ($25/seat/month) gives organizations a shared workspace with team-wide vocabulary, snippets, voice profiles, and team management. All Pro features are included for every seat.

Create a team: Purchase the Team plan, go to your account dashboard, click Create Team, and invite members by email
Shared settings: Team admins manage shared vocabulary and snippets; these form the baseline that all members inherit via cloud sync automatically
Personal overrides: Personal settings always take precedence over team defaults — members can customize without affecting the rest of the team
Shared voice profiles: Team voice profiles (with shared vocabulary and snippets) can be shared across all members and appear in each member's profile list
Invite and remove members, view per-member usage, and manage billing from the team admin section of the web dashboard

Settings Reference

All settings are in the Settings panel, organized by tab.

General

Recording hotkey, toggle mode, push-to-talk grace period, grace period sound, microphone selection, launch at startup, show in taskbar, sound effects, custom start/stop sounds, minimize to tray, auto-inject text, notification style (Toast/Popup/Overlay/None), notification duration, theme (System/Light/Dark), history limit, typing animation.

Overlay

Notification overlay: type (Minimal/Line/Box), style (Pill/Box/Text), position (9 presets + custom), opacity, display (monitor selector), size (1–100). Compact overlay: same settings configured independently, plus waveform style (None/Mirror/Bars/Pulse/Steps/Dots/Ribbon/Peaks), waveform size, show transcription, show controls, show history (Box type only), click opens app, lock position, auto-hide with configurable delay.

API Keys

OpenAI API key and Anthropic API key. Both encrypted at rest with Windows DPAPI and excluded from settings export.

Transcription

Provider (Local/Cloud/Cloud AI/Managed), language, streaming transcription, chunk interval, fast mode, whisper mode, model manager.

AI & Text

LLM provider, custom system prompt, custom vocabulary, snippets, context-aware formatting toggle, context profiles.

Text Processing

Output Target: Clipboard (default), Obsidian, or Notion. Clipboard copies to the Windows clipboard after each dictation. Obsidian appends to a vault note (configure vault path and note path, supports {{date}}/{{time}} in the note path). Notion appends a paragraph block to a shared page (API key + page ID required). Output Template: controls the format written for each dictation using {text}, {date}, {time} placeholders; optional timestamp prefix.

Voice Profiles

Create, rename, delete, and switch profiles. Each profile has isolated settings, vocabulary, history, snippets, and stats. The active profile name is shown at the top of the tab. Switching profiles restarts the app.

Advanced

Provider failover, retry transient errors, local-only mode, auto-delete recordings (by age or immediately after transcription), export/import settings, clear all data, telemetry (opt-out), update channel (Stable/Beta/Canary), cloud sync. Local API: enable/disable the local HTTP API server on port 9847 — required for the VS Code extension; Dikt must be running for the extension to connect.

Privacy & Data

What Stays Local

Audio recordings (unless a cloud transcription mode is in use), transcription history, settings, and API keys (DPAPI-encrypted).

Telemetry

Optional anonymous usage data: machine hash, app version, OS, transcription count, provider. No audio, text, or personal data. Opt out in Settings > Advanced.

Data Locations

%APPDATA%\Dikt\settings.json — Settings
%APPDATA%\Dikt\recordings\ — Audio recordings
%APPDATA%\Dikt\history.json — Transcription history
%APPDATA%\Dikt\logs\ — Application logs
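If you want to locate these files from a script (for a backup, for example), the paths can be built from the APPDATA environment variable. The helper below is a hypothetical convenience, not part of Dikt. It uses `PureWindowsPath` so the path arithmetic also works when run on another operating system.

```python
import os
from pathlib import PureWindowsPath


def dikt_paths(appdata: str) -> dict[str, PureWindowsPath]:
    """Build the documented data locations under the given %APPDATA% folder."""
    root = PureWindowsPath(appdata) / "Dikt"
    return {
        "settings": root / "settings.json",
        "recordings": root / "recordings",
        "history": root / "history.json",
        "logs": root / "logs",
    }


if __name__ == "__main__":
    # APPDATA is set on Windows; the fallback is only a placeholder for other systems.
    appdata = os.environ.get("APPDATA", r"C:\Users\you\AppData\Roaming")
    for name, path in dikt_paths(appdata).items():
        print(f"{name}: {path}")
```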

Use Local-Only Mode (Settings > Advanced) to prevent all network communication. Use Clear All Data to delete all history and recordings and to reset settings.

System Tray & Updates

System Tray

Left-click the tray icon to show the main window. Right-click for: Show Window, Compact Mode (small floating status window), Exit.

Updates

Dikt auto-checks for updates and shows a notification bar when available. Choose your channel in Settings > Advanced: Stable (default), Beta, or Canary. After updating, a "What's New" dialog shows release notes.

Getting Help

Use the in-app support form (Settings > Support) to submit tickets with optional log and settings attachments. Pro subscribers also have access to the AI Assistant for instant help. Visit www.dikt.app/support for additional resources.
