Guide, Mar 3, 2026

Which Transcription Model Should You Use in Resonant?

Resonant ships with ten different speech-to-text models. They all run locally on your Mac — no cloud, no account, nothing leaves the machine. But they are not all the same, and picking the right one makes a real difference.

Some are tuned for accuracy in English. Some are built for speed. Some are purpose-built for Mandarin, Japanese, Korean, or Russian. One covers over 1,600 languages. Here's what each one is and when to use it.

Start here: Parakeet TDT 0.6B v3

Best for: English and European languages. Recommended for most people.
Size: ~640 MB — Languages: English + 24 European languages

Parakeet is the default model in Resonant for a reason. NVIDIA built it on the NeMo FastConformer architecture and trained it on over 660,000 hours of audio, and it shows: word error rates on English, German, Spanish, Italian, and French are among the lowest of any locally runnable model.

It auto-detects the spoken language across its 25 supported languages (English plus 24 other European languages), so you don't need to tell it what you're speaking. On the FLEURS benchmark, German comes in at a 5.04% word error rate (WER), Spanish at 3.45%, and Italian at 3.00%. For English dictation specifically, it's hard to beat at any price.
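If the metric is unfamiliar: word error rate counts the word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. The sketch below implements that standard definition in Python. It is not the scoring code used by FLEURS or by Resonant, the function names are illustrative, and real benchmarks apply more careful text normalization than the simple lowercasing assumed here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # One dropped word plus one wrong word out of five reference words.
    print(word_error_rate("send the report to maria", "send report to mario"))
```

A WER of 3.00% therefore means roughly three word-level mistakes per hundred reference words on that benchmark's test set.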

Parakeet also supports hotwords: you can bias it toward proper nouns, technical terms, or product names that matter to your work. If you dictate anything involving names, jargon, or specialized vocabulary, that feature alone makes it the right starting point.

If you speak English or another European language that Parakeet supports and you're not sure which model to use, use Parakeet. Switch only if you have a specific reason.

For lightweight English use: Moonshine v2 Medium

Best for: English-only workflows where you want a smaller footprint.
Size: ~200 MB — Languages: English only

Moonshine was built by Useful Sensors specifically for edge devices. The v2 Medium model has 245M parameters, roughly 40% of Parakeet's 0.6B, and its download is about a third the size. Accuracy holds up well for everyday dictation: a 6.65% word error rate on standard benchmarks.

The key difference from Whisper-based models is architecture. Moonshine doesn't pad audio to fixed 30-second chunks like Whisper does, which means shorter utterances process without unnecessary overhead. It was designed to be efficient, and on Apple Silicon that efficiency is noticeable.
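To get a feel for the overhead, the sketch below estimates how much of a fixed 30-second window is padding for a given utterance length. It is a back-of-the-envelope illustration only: the function names are hypothetical, and treating longer audio as a whole number of 30-second windows is a simplifying assumption, not a description of how any particular Whisper runtime chunks long recordings.

```python
import math

WINDOW_SECONDS = 30.0  # Whisper's fixed input window


def padded_duration(audio_seconds, window=WINDOW_SECONDS):
    """Duration after padding up to a whole number of fixed windows."""
    windows = max(1, math.ceil(audio_seconds / window))
    return windows * window


def padding_fraction(audio_seconds, window=WINDOW_SECONDS):
    """Share of the padded input that contains no speech at all."""
    padded = padded_duration(audio_seconds, window)
    return (padded - audio_seconds) / padded


if __name__ == "__main__":
    for seconds in (3.0, 10.0, 25.0):
        share = padding_fraction(seconds)
        print(f"{seconds:>4.0f} s utterance: {share:.0%} of the window is padding")
```

For a short dictated phrase, most of the fixed window is padding, which is the overhead a variable-length model avoids.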

Choose Moonshine v2 Medium if you dictate in English, you want to keep your model download small, and you don't need hotwords or multilingual support.

For 99 languages: Whisper Large V3 Turbo

Best for: Any language not served by a dedicated model in this guide.
Size: ~1 GB — Languages: 99 languages

OpenAI's Whisper is the model that proved high-accuracy offline transcription was possible at scale. Whisper Large V3 Turbo is a pruned, fine-tuned version of Whisper Large V3: 809M parameters with only 4 decoder layers instead of the full 32, which makes it significantly faster while keeping the broad language support intact.

If you dictate in Arabic, Hindi, Vietnamese, Thai, Hebrew, Turkish, Indonesian, or any other language in Whisper's 99-language set that the dedicated models don't cover, Whisper Turbo is your model. It defaults to English, but you can point it at any of the 99 supported languages in Resonant's settings.

Whisper is among the most widely used open transcription models. If you're evaluating Resonant against another tool that runs Whisper in the cloud, you're comparing against the same model family, except yours runs locally.

For faster English Whisper: Whisper Distil Large v3.5

Best for: English-only Whisper users who want more speed.
Size: ~1 GB — Languages: English only

Distil-Whisper is a knowledge-distilled version of Whisper Large V3, trained specifically on English. Its accuracy on short-form English audio is within about 1% of Whisper Large V3 Turbo, and it runs about 1.5x faster.

If your work is English-only and you find yourself transcribing frequently throughout the day, that speed difference adds up. The download size is comparable to Whisper Turbo, so the only tradeoff is losing multilingual support you may not need.

For East Asian languages: SenseVoice Small

Best for: Chinese, Japanese, Korean, and Cantonese.
Size: ~226 MB — Languages: Mandarin, English, Japanese, Korean, Cantonese

SenseVoice was built by Alibaba Research for exactly this use case. It uses a non-autoregressive CTC architecture with a real-time factor (RTF) of roughly 0.10: processing takes about one-tenth of the audio's duration, so a ten-second clip takes about one second to transcribe. That's very fast.
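Real-time factor is processing time divided by audio duration, so lower is faster and anything below 1.0 is faster than real time. The short Python sketch below turns the RTF figures quoted in this guide into processing times and speed-ups. The function names are illustrative, and actual speed will depend on your Mac and the audio.

```python
def processing_seconds(audio_seconds, rtf):
    """Estimated time to transcribe a clip: audio duration times RTF."""
    return audio_seconds * rtf


def speedup(rtf):
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


# RTF values as quoted in this guide.
QUOTED_RTF = {
    "SenseVoice Small": 0.10,
    "Zipformer Japanese": 0.08,
    "Zipformer Korean": 0.034,
}

if __name__ == "__main__":
    for model, rtf in QUOTED_RTF.items():
        seconds = processing_seconds(60.0, rtf)
        print(f"{model}: about {seconds:.1f} s per minute of speech, "
              f"{speedup(rtf):.1f}x real time")
```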

It auto-detects across its five supported languages, so mixed-language dictation between, say, Mandarin and English works without switching modes. At 226 MB, the download is light for what it covers.

If you regularly switch between English and any East Asian language, SenseVoice is the right choice. For Mandarin-primary users who need the highest possible accuracy on Chinese, consider FireRedASR Large instead (below).

For Mandarin accuracy: FireRedASR Large

Best for: Mandarin-primary speakers who prioritize accuracy above all else.
Size: ~1.7 GB — Languages: Mandarin Chinese + English

FireRedASR Large is the best locally-runnable Mandarin model available. Built by the FireRed team on an attention encoder-decoder architecture, it achieves 3.18% character error rate on Mandarin benchmarks — state of the art for an offline model. It also handles Chinese dialects and code-switching between Mandarin and English.

The download is larger at 1.7 GB, and it runs somewhat slower than SenseVoice. But if your primary use is professional Mandarin dictation — documents, correspondence, clinical notes — the accuracy difference is worth it.

For Japanese: Zipformer Japanese

Best for: Japanese-only speakers who want the highest accuracy.
Size: ~148 MB — Languages: Japanese only

This model was trained on 35,000 hours of ReazonSpeech v2.0 data, one of the largest Japanese speech corpora publicly available. The Icefall Zipformer architecture runs at RTF 0.08, which is roughly 12x faster than real time on any modern Mac.

SenseVoice covers Japanese, but if Japanese is your primary or only language, this dedicated model will generally outperform it. 148 MB is a compact download for what it delivers.

For Korean: Zipformer Korean

Best for: Korean speakers.
Size: ~68 MB — Languages: Korean only

At 68 MB, Zipformer Korean is the smallest model in Resonant. It runs about 29x faster than real time (RTF 0.034), which makes it the fastest model in the lineup. A minute of speech processes in about two seconds on Apple Silicon.

If you dictate in Korean, this is the right choice. The download is negligible and the speed is unmatched.

For Russian: GigaAM v2 Russian

Best for: Russian-primary speakers.
Size: ~231 MB — Languages: Russian only

GigaAM v2 comes from SaluteSpeech, Sber's speech AI research team. It is licensed for commercial use and uses a NeMo transducer architecture trained specifically on Russian speech. It's the best locally runnable Russian ASR model available.

If Russian is your primary dictation language, this model will significantly outperform Whisper Turbo on Russian content. 231 MB is compact for the coverage it provides.

For everything else: omniASR 300M

Best for: Any language not covered above.
Size: ~348 MB — Languages: 1,600+ languages

Meta's omniASR is a CTC model trained across over 1,600 languages. If you speak a language that isn't served by any of the dedicated models above and isn't in Whisper's 99-language set, omniASR is your option.

It covers many low-resource languages that no other local model does. Accuracy on well-represented languages is good; on lower-resource ones it varies, as it does with any model trained on limited data. But for languages with no other local option, it's a meaningful baseline.

Quick reference

| Model | Size | Best for |
| --- | --- | --- |
| Parakeet TDT 0.6B v3 | 640 MB | English + 24 European languages. Start here. |
| Moonshine v2 Medium | 200 MB | English only. Lightweight alternative to Parakeet. |
| Whisper Large V3 Turbo | 1 GB | 99 languages. Use for languages without a dedicated model. |
| Whisper Distil Large v3.5 | 1 GB | English only. Faster than Turbo, accuracy within about 1%. |
| SenseVoice Small | 226 MB | Chinese, Japanese, Korean, Cantonese, English. |
| FireRedASR Large | 1.7 GB | Mandarin-primary. Best Chinese accuracy. |
| Zipformer Japanese | 148 MB | Japanese-only. Trained on 35k hours. |
| Zipformer Korean | 68 MB | Korean-only. Smallest and fastest model. |
| GigaAM v2 Russian | 231 MB | Russian-only. Best Russian accuracy. |
| omniASR 300M | 348 MB | 1,600+ languages. Universal fallback. |
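The same recommendations can be written as a small decision function. This is a sketch of the guide's logic in Python, not anything Resonant exposes: `recommend_model` and its parameters are hypothetical, and the language sets are deliberately partial, containing only the languages this guide names explicitly. Extend them from each model's own language list before relying on the fallback.

```python
# Partial sets: only languages named in this guide (ISO 639 codes).
PARAKEET_LANGS = {"en", "de", "es", "it", "fr"}
WHISPER_LANGS = {"ar", "hi", "vi", "th", "he", "tr", "id"}

# Languages with a dedicated recommendation in the guide.
DEDICATED = {
    "ja": "Zipformer Japanese",
    "ko": "Zipformer Korean",
    "ru": "GigaAM v2 Russian",
    "yue": "SenseVoice Small",  # Cantonese
}


def recommend_model(language, mandarin_accuracy_first=False, small_download=False):
    """Return the model this guide suggests for a language code."""
    if language == "zh":
        # FireRedASR for maximum Mandarin accuracy, SenseVoice otherwise.
        return "FireRedASR Large" if mandarin_accuracy_first else "SenseVoice Small"
    if language in DEDICATED:
        return DEDICATED[language]
    if language == "en" and small_download:
        return "Moonshine v2 Medium"
    if language in PARAKEET_LANGS:
        return "Parakeet TDT 0.6B v3"
    if language in WHISPER_LANGS:
        return "Whisper Large V3 Turbo"
    # Anything not listed above falls through to the universal fallback.
    return "omniASR 300M"


if __name__ == "__main__":
    for code in ("en", "zh", "ko", "ar", "qu"):
        print(code, "->", recommend_model(code))
```

The order of the checks mirrors the guide: dedicated models first, then Parakeet, then Whisper Turbo, with omniASR as the last resort.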

How to switch models

Open Resonant's settings, go to the Transcription tab, and select a model from the list. Models that aren't downloaded yet will show a download button. The download happens in the background; Resonant will continue working with your current model while the new one downloads.

You can switch models at any time without restarting the app. If you're unsure where to start, Parakeet is already selected by default. Try it for a week. If you find yourself dictating in a language it doesn't cover, or if you want to compare accuracy on specific content, download a second model and switch between them.

All of this runs on your Mac. No account required. No audio sent anywhere. The model files live in your home directory and are yours to keep.

Download Resonant and pick your model.
