
Model Comparison
Compare Whisper models from tiny to large-v3: download sizes, accuracy, and speed.
Dikt uses whisper.cpp for local transcription. Models range from tiny (fastest, basic accuracy) to large-v3 (slowest, most accurate). The speed figures below are approximate transcription times for a 30-second audio clip on a modern CPU; actual times vary with hardware.
| Model | Parameters | Download size | Speed (30 s clip) | Accuracy | Multi-Language | Notes |
|---|---|---|---|---|---|---|
| tiny | 39M | ~75 MB | ~1s | Basic | Limited | Fastest, lowest resource usage |
| base | 74M | ~142 MB | ~2s | Good | Good | Good balance for quick tasks |
| small | 244M | ~466 MB | ~5s | Very Good | Very Good | Recommended for most users |
| medium | 769M | ~1.5 GB | ~12s | Excellent | Excellent | High accuracy, needs more RAM |
| large-v3-turbo | 809M | ~1.5 GB | ~8s | Best | Best | Best accuracy-to-speed ratio, multi-language |
| large-v3 | 1550M | ~2.9 GB | ~25s | Best | Best | Maximum accuracy, all languages |
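The trade-offs in the table can be turned into a simple selection rule. The following is a minimal Python sketch, not part of Dikt: it copies the download sizes and speed estimates from the table and picks the most accurate model that fits a download budget and a time budget. The `Model` class, `pick_model`, `realtime_factor`, and the numeric `accuracy_rank` (Basic = 1 up to Best = 5) are hypothetical names and an illustrative ordering, not measured scores or a Dikt API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    download_mb: int          # approximate download size from the table, in MB
    seconds_per_clip: float   # approximate time to transcribe a 30-second clip
    accuracy_rank: int        # hypothetical ordinal for the Accuracy column


# Values copied from the table; accuracy_rank is an assumed ordering
# (Basic=1, Good=2, Very Good=3, Excellent=4, Best=5), not a benchmark.
MODELS = [
    Model("tiny", 75, 1, 1),
    Model("base", 142, 2, 2),
    Model("small", 466, 5, 3),
    Model("medium", 1500, 12, 4),
    Model("large-v3-turbo", 1500, 8, 5),
    Model("large-v3", 2900, 25, 5),
]


def pick_model(max_download_mb: float, max_seconds_per_clip: float) -> Optional[Model]:
    """Return the most accurate model within both budgets, or None."""
    candidates = [
        m for m in MODELS
        if m.download_mb <= max_download_mb
        and m.seconds_per_clip <= max_seconds_per_clip
    ]
    if not candidates:
        return None
    # Prefer the higher accuracy rank; break ties with the faster model.
    return max(candidates, key=lambda m: (m.accuracy_rank, -m.seconds_per_clip))


def realtime_factor(model: Model) -> float:
    """Processing time divided by audio length (below 1.0 is faster than real time)."""
    return model.seconds_per_clip / 30.0
```

With a 500 MB download budget and a 10-second time budget, `pick_model(500, 10)` returns the `small` entry, which matches the table's recommendation for most users. Raising the download budget to 2000 MB selects `large-v3-turbo` instead.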
How to Download Models
Open Dikt and go to Settings > Model Manager. Select the model you want and click Download. Downloaded models are cached locally, so each one only needs to be downloaded once.