
Voxtral TTS

Mistral's 4B-parameter multilingual text-to-speech model, with 20 expressive voice presets across 9 languages. Based on mistralai/Voxtral-4B-TTS-2603.

Model Variants

Model                                         Format     HuggingFace
mlx-community/Voxtral-4B-TTS-2603-mlx-bf16    bfloat16   Model Card

Usage

python -m mlx_audio.tts.generate \
    --model mlx-community/Voxtral-4B-TTS-2603-mlx-bf16 \
    --text "Hello, how are you today?" \
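    --voice casual_male

from mlx_audio.tts.utils import load

# Download (on first use) and load the MLX weights from the Hugging Face Hub.
model = load("mlx-community/Voxtral-4B-TTS-2603-mlx-bf16")

# generate() yields result objects; each reports the duration of its audio.
for result in model.generate(text="Hello, how are you today?", voice="casual_male"):
    print(result.audio_duration)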
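To keep the output from Python, the samples can be written to a WAV file. The sketch below uses only the standard library and assumes mono float samples in the range -1.0 to 1.0; the helper name write_wav is hypothetical and not part of mlx_audio.

```python
import array
import sys
import wave


def write_wav(path, samples, sample_rate):
    """Hypothetical helper: write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    # Clip each sample to [-1.0, 1.0], then scale it to a signed 16-bit integer.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    # WAV stores samples little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

With a loaded model you might call write_wav("hello.wav", result.audio.tolist(), result.sample_rate), assuming each result exposes its samples as result.audio and its rate as result.sample_rate; check the result fields in your installed mlx_audio version. If generate() yields several results, concatenate their samples before writing.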

Streaming

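Voxtral TTS supports chunked streaming output: audio is returned in chunks as it is generated, so playback can start before the whole utterance has been synthesized. The streaming_interval option (1.5 in the examples below) sets the chunk interval in seconds.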

python -m mlx_audio.tts.generate \
    --model mlx-community/Voxtral-4B-TTS-2603-mlx-bf16 \
    --text "Streaming speech from Voxtral TTS." \
    --voice casual_male \
    --stream \
    --streaming_interval 1.5 \
    --play

from mlx_audio.tts.utils import load

model = load("mlx-community/Voxtral-4B-TTS-2603-mlx-bf16")

for result in model.generate(
    text="Streaming speech from Voxtral TTS.",
    voice="casual_male",
    stream=True,
    streaming_interval=1.5,  # chunk interval in seconds
):
    # Each yielded result is one chunk; the last one has is_final_chunk set.
    print(result.is_streaming_chunk, result.is_final_chunk)
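A player typically consumes the chunks as they arrive and stops after the last one. The sketch below shows that loop under stated assumptions: Chunk is a hypothetical stand-in for a streamed result, each chunk's samples are assumed to be available as a plain sequence in an audio attribute, and collect_stream is an illustrative helper, not an mlx_audio function. Only is_streaming_chunk and is_final_chunk come from the example above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed result (audio field is an assumption)."""
    audio: Sequence[float]
    is_streaming_chunk: bool
    is_final_chunk: bool


def collect_stream(results: Iterable) -> List[float]:
    """Concatenate streamed chunks in arrival order, stopping after the final chunk."""
    samples: List[float] = []
    for result in results:
        # A real player would hand this chunk to the audio device here.
        samples.extend(result.audio)
        if result.is_final_chunk:
            break
    return samples
```

With real results you would probably pass result.audio.tolist(), or forward each chunk straight to an audio device, rather than accumulating the whole utterance in memory.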

Available Voices

English

Voice             Style
casual_male       Casual
casual_female     Casual
cheerful_female   Cheerful
neutral_male      Neutral
neutral_female    Neutral

Multilingual

Voice                 Language
fr_male, fr_female    French
es_male, es_female    Spanish
de_male, de_female    German
it_male, it_female    Italian
pt_male, pt_female    Portuguese
nl_male, nl_female    Dutch
ar_male               Arabic
hi_male, hi_female    Hindi
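When picking a preset programmatically, the voice tables can be captured as a mapping from language to preset names. The sketch below is a hypothetical helper (VOICES and pick_voice are not part of mlx_audio); it prefers the requested gender suffix and falls back to the first listed preset, which matters for Arabic, where only ar_male is listed.

```python
from typing import Dict, List

# Voice presets from the English and Multilingual tables, grouped by language.
VOICES: Dict[str, List[str]] = {
    "English": ["casual_male", "casual_female", "cheerful_female",
                "neutral_male", "neutral_female"],
    "French": ["fr_male", "fr_female"],
    "Spanish": ["es_male", "es_female"],
    "German": ["de_male", "de_female"],
    "Italian": ["it_male", "it_female"],
    "Portuguese": ["pt_male", "pt_female"],
    "Dutch": ["nl_male", "nl_female"],
    "Arabic": ["ar_male"],
    "Hindi": ["hi_male", "hi_female"],
}


def pick_voice(language: str, gender: str = "female") -> str:
    """Return a preset for `language`, preferring names ending in `_<gender>`.

    Raises KeyError for languages not in VOICES.
    """
    presets = VOICES[language]
    for name in presets:
        if name.endswith("_" + gender):
            return name
    # No preset with that gender: fall back to the first listed one.
    return presets[0]
```

For example, pick_voice("German", "male") returns "de_male", which can be passed as the voice argument to model.generate().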

Supported Languages

English, French, Spanish, German, Italian, Portuguese, Dutch, Arabic, Hindi.

License

Voxtral TTS weights are released under CC-BY-NC (non-commercial use). Check the model card for full licensing details.