Parakeet¶
Parakeet is NVIDIA's high-accuracy speech-to-text model. Parakeet v2 focuses on English, while Parakeet v3 supports 25 European languages.
Available Models¶
| Model | Languages | Description | Repo |
|---|---|---|---|
| Parakeet v2 | English | English-only, high accuracy | mlx-community/parakeet-tdt-0.6b-v2 |
| Parakeet v3 | 25 European languages | Multilingual European | mlx-community/parakeet-tdt-0.6b-v3 |
Supported Languages (v3)¶
Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Russian, Ukrainian
Python Usage¶
Basic Transcription¶
from mlx_audio.stt.utils import load
# Load the multilingual v3 model
model = load("mlx-community/parakeet-tdt-0.6b-v3")
# Transcribe audio
result = model.generate("audio.wav")
print(f"Text: {result.text}")
Sentence and Word Timestamps¶
result = model.generate("audio.wav")
# Each sentence carries its start and end time in seconds
for sentence in result.sentences:
    print(f"[{sentence.start:.2f}s - {sentence.end:.2f}s] {sentence.text}")
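One common use of sentence timestamps is writing subtitles. The sketch below is an illustration only: it turns objects with `start`, `end`, and `text` attributes into SRT cues. The `Sentence` dataclass and the `srt_time` and `to_srt` helpers are hypothetical names introduced for this example and are not part of mlx_audio; the sketch assumes only the three attributes used in the loop above.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical stand-in for the items in result.sentences."""
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(sentences) -> str:
    """Render objects exposing .start, .end and .text as numbered SRT cues."""
    cues = []
    for index, s in enumerate(sentences, start=1):
        cues.append(
            f"{index}\n{srt_time(s.start)} --> {srt_time(s.end)}\n{s.text.strip()}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [Sentence(0.0, 1.5, "Hello."), Sentence(1.5, 3.0, "World.")]
    print(to_srt(demo))
```

With a real transcription, `to_srt(result.sentences)` should work in the same way, provided the sentence objects expose the attributes shown above.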
Streaming Transcription¶
CLI Usage¶
v2 vs v3
Use v2 for English-only workloads where you want the best English accuracy. Use v3 when you need multilingual European language support.
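The rule above can be written as a small selection helper. This is a minimal sketch, not part of mlx_audio: `choose_parakeet_repo` and the constants are hypothetical names, and the ISO 639-1 codes are an assumed mapping of the 25 languages listed under Supported Languages (v3).

```python
V2_REPO = "mlx-community/parakeet-tdt-0.6b-v2"
V3_REPO = "mlx-community/parakeet-tdt-0.6b-v3"

# Assumed ISO 639-1 codes for the 25 languages listed for v3
V3_LANGUAGES = frozenset({
    "bg", "hr", "cs", "da", "nl", "en", "et", "fi", "fr", "de",
    "el", "hu", "it", "lv", "lt", "mt", "pl", "pt", "ro", "sk",
    "sl", "es", "sv", "ru", "uk",
})


def choose_parakeet_repo(languages):
    """Return the repo to load for the given language codes.

    English-only workloads get v2; anything else covered by v3 gets v3.
    Raises ValueError for an empty input or a language v3 does not list.
    """
    codes = {code.strip().lower() for code in languages}
    if not codes:
        raise ValueError("at least one language code is required")
    unsupported = codes - V3_LANGUAGES
    if unsupported:
        raise ValueError(f"not supported by Parakeet v3: {sorted(unsupported)}")
    return V2_REPO if codes == {"en"} else V3_REPO


if __name__ == "__main__":
    print(choose_parakeet_repo(["en"]))        # English only
    print(choose_parakeet_repo(["en", "de"]))  # mixed English and German
```

The returned string can be passed to `load(...)` as in Basic Transcription.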