Higgs Audio v3 TTS
Higgs Audio v3 TTS is a Qwen3-backed conversational TTS model with fused multi-codebook audio token generation, inline control tokens, multilingual speech, and zero-shot voice cloning.
python -m mlx_audio.tts.generate \
--model bosonai/higgs-audio-v3-tts-4b \
--text "Hello from Higgs Audio v3 on MLX."
Voice cloning
Pass one or more reference clips with matching transcripts:
python -m mlx_audio.tts.generate \
--model bosonai/higgs-audio-v3-tts-4b \
--text "Have a nice day and enjoy the sunshine." \
--ref_audio reference.wav \
--ref_text "Reference transcript."
For multiple references, repeat the flags, giving one `--ref_audio`/`--ref_text` pair per clip:
python -m mlx_audio.tts.generate \
--model bosonai/higgs-audio-v3-tts-4b \
--text "Let's keep the same voice across this line." \
--ref_audio speaker_1.wav \
--ref_text "First reference transcript." \
--ref_audio speaker_2.wav \
--ref_text "Second reference transcript."
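If you drive the CLI from a script, building the argument list in code keeps each `--ref_audio` clip next to its `--ref_text` transcript and catches a missing transcript early. This is a minimal sketch, assuming the CLI matches clips to transcripts in the order given, as the example above suggests. `build_generate_command` is a hypothetical helper, not part of mlx-audio, and it uses only the flags shown on this page.

```python
import shlex
import sys
from typing import List, Optional, Sequence

MODEL_ID = "bosonai/higgs-audio-v3-tts-4b"


def build_generate_command(
    text: str,
    ref_audio: Sequence[str] = (),
    ref_text: Sequence[str] = (),
    model: str = MODEL_ID,
    python: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build argv for `python -m mlx_audio.tts.generate`."""
    if len(ref_audio) != len(ref_text):
        raise ValueError("each --ref_audio clip needs exactly one --ref_text transcript")
    cmd = [python or sys.executable, "-m", "mlx_audio.tts.generate",
           "--model", model, "--text", text]
    for audio_path, transcript in zip(ref_audio, ref_text):
        # Assumption: the CLI pairs clips and transcripts by order, so emit them adjacently.
        cmd += ["--ref_audio", audio_path, "--ref_text", transcript]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command(
        "Let's keep the same voice across this line.",
        ref_audio=["speaker_1.wav", "speaker_2.wav"],
        ref_text=["First reference transcript.", "Second reference transcript."],
    )
    print(shlex.join(cmd))  # shell-quoted command, handy for logging
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets `subprocess.run` pass transcripts containing spaces or quotes without any shell escaping.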
Python
from mlx_audio.tts.utils import load
from mlx_audio.audio_io import write as audio_write

model = load("bosonai/higgs-audio-v3-tts-4b")

for result in model.generate(
    text="Hello from Higgs Audio v3 on MLX.",
    ref_audio="reference.wav",
    ref_text="Reference transcript.",
    temperature=1.0,
    max_new_tokens=2048,
):
    audio_write("output.wav", result.audio, result.sample_rate)
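As written, the loop rewrites `output.wav` on every iteration, so if `generate` yields more than one result, only the last one is kept. The sketch below collects the segments and writes a single 16-bit PCM WAV with the standard-library `wave` module. It assumes each `result.audio` holds mono float samples in [-1, 1] that can be converted to a Python list (for example with `.tolist()`). `write_segments_wav` is a hypothetical helper, not an mlx-audio API.

```python
import array
import sys
import wave
from typing import Iterable, Sequence


def write_segments_wav(path: str, segments: Iterable[Sequence[float]], sample_rate: int) -> int:
    """Hypothetical helper: concatenate mono float segments into one 16-bit PCM WAV.

    Returns the number of frames written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for segment in segments:
        for sample in segment:
            # Clip to [-1, 1] before scaling so out-of-range samples don't wrap around.
            clipped = max(-1.0, min(1.0, float(sample)))
            pcm.append(int(round(clipped * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # assumption: the model returns mono audio
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return len(pcm)
```

To use it with the loop above, append `result.audio.tolist()` to a list on each iteration, remember `result.sample_rate`, and call `write_segments_wav("output.wav", segments, sample_rate)` once after the loop ends.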
Controls
Inline control tokens from the upstream model can be placed directly in the input text, for example:
For sound-effect tags, follow the upstream guidance and include matching written onomatopoeia after the tag.
Notes
- The model is released under the Boson Higgs Audio v3 Research and Non-Commercial License. See the original model card and license: https://huggingface.co/bosonai/higgs-audio-v3-tts-4b