Dia
Dia is a 1.6B-parameter, dialogue-focused text-to-speech (TTS) model. It natively supports multi-speaker conversations through [S1] and [S2] speaker tags, which makes it well suited to generating realistic two-speaker dialogue audio.
Model Variants
| Model | Format | HuggingFace |
|---|---|---|
| `mlx-community/Dia-1.6B-fp16` | float16 | Model Card |
Usage
Basic Dialogue Generation
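A minimal sketch of basic generation, assuming the same `load_model` and `model.generate(text=...)` interface used in the other examples on this page. The `synthesize` helper and its `model` argument are illustrative names, not part of mlx_audio; the import is done lazily so the helper can also be called with an already loaded model.

```python
def synthesize(text, model=None, model_id="mlx-community/Dia-1.6B-fp16"):
    """Generate audio for `text` and return a list of per-segment audio arrays."""
    if model is None:
        # Imported lazily so this helper can be defined without MLX installed.
        from mlx_audio.tts.utils import load_model

        model = load_model(model_id)
    # model.generate yields one result per generated segment.
    return [result.audio for result in model.generate(text=text)]


# Usage (loads the model, so it is left commented out here):
# segments = synthesize("[S1] Hello there! [S2] Hi, good to see you.")
```

Passing a preloaded model avoids reloading the weights when several texts are generated in one session.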
Multi-Turn Dialogue
Dia automatically splits text on [S1]/[S2] tags and generates each turn separately:
```python
from mlx_audio.tts.utils import load_model

model = load_model("mlx-community/Dia-1.6B-fp16")

dialogue = """[S1] Welcome to the show! Today we're talking about AI on Apple Silicon.
[S2] Thanks for having me. It's an exciting time for on-device inference.
[S1] Absolutely. What's been the biggest breakthrough?
[S2] I'd say the combination of unified memory and optimized frameworks like MLX."""

# generate() yields one result per generated segment.
for result in model.generate(text=dialogue):
    audio = result.audio  # audio for the current segment
```
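To make the idea of splitting on speaker tags concrete, here is a plain-Python sketch that breaks a script into (speaker, text) turns. It only illustrates the concept: it is not Dia's or mlx_audio's internal implementation, and `split_turns` is a hypothetical helper.

```python
import re

# A speaker tag followed by everything up to the next tag (or end of input).
TURN_RE = re.compile(r"\[(S[12])\]\s*(.*?)(?=\[S[12]\]|\Z)", re.DOTALL)


def split_turns(dialogue):
    """Return a list of (speaker, text) pairs, one per tagged turn."""
    return [(speaker, text.strip()) for speaker, text in TURN_RE.findall(dialogue)]


dialogue = """[S1] Welcome to the show!
[S2] Thanks for having me.
[S1] What's been the biggest breakthrough?"""

for speaker, text in split_turns(dialogue):
    print(f"{speaker}: {text}")
```

A helper like this can be handy for checking a script before synthesis, for example to confirm that every line carries a tag.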
With Reference Audio
```python
from mlx_audio.tts.utils import load_model

model = load_model("mlx-community/Dia-1.6B-fp16")

# ref_audio is the voice sample to clone; ref_text is its transcript.
for result in model.generate(
    text="[S1] Hello, this is a voice cloning test.",
    ref_audio="reference.wav",
    ref_text="This is a sample of my voice.",
):
    audio = result.audio
```
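Before cloning, it can help to confirm that the reference clip is readable and to check how long it is. The sketch below uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file; `wav_info` is a hypothetical helper, not part of mlx_audio, and this page does not state which clip lengths or sample rates work best.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate


# Usage (assumes reference.wav exists in the working directory):
# rate, channels, seconds = wav_info("reference.wav")
# print(f"{rate} Hz, {channels} channel(s), {seconds:.1f} s")
```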
Generation Parameters
| Parameter | Default | Description |
|---|---|---|
| `temperature` | `1.3` | Sampling temperature |
| `top_p` | `0.95` | Top-p (nucleus) sampling threshold |
| `split_pattern` | `"\n"` | Pattern to split text into segments |
| `max_tokens` | `None` | Maximum number of tokens to generate |
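The parameters in this table can be overridden per call. The sketch below, under the assumption that they are accepted as keyword arguments to `model.generate`, merges overrides onto the listed defaults and applies two conservative sanity checks. `generation_kwargs` and its checks are illustrative, not library behavior.

```python
# Defaults copied from the Generation Parameters table.
DEFAULTS = {
    "temperature": 1.3,
    "top_p": 0.95,
    "split_pattern": "\n",
    "max_tokens": None,
}


def generation_kwargs(**overrides):
    """Merge overrides onto the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown generation parameter(s): {sorted(unknown)}")
    kwargs = {**DEFAULTS, **overrides}
    # Assumed sanity checks; the library may accept a wider range.
    if kwargs["temperature"] < 0:
        raise ValueError("temperature must be non-negative")
    if not 0 < kwargs["top_p"] <= 1:
        raise ValueError("top_p must be in (0, 1]")
    return kwargs


# Usage with a loaded model (left commented out because it needs the weights):
# for result in model.generate(text=dialogue, **generation_kwargs(temperature=1.0)):
#     audio = result.audio
```

Lower temperature and lower `top_p` generally make sampling more conservative, while higher values allow more variation.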
Dialogue format
Use [S1] and [S2] tags at the start of each speaker's line. Dia will automatically separate turns and generate distinct voices for each speaker.
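When a conversation is held as structured data, the tagged text can be built programmatically. This sketch assumes turns are given as (speaker number, line) pairs and places one turn per line, which matches the newline default for `split_pattern`; `format_dialogue` is a hypothetical helper.

```python
def format_dialogue(turns):
    """Build tagged dialogue text from (speaker, line) pairs; speaker is 1 or 2."""
    lines = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        # Each turn starts with its speaker tag, as the format requires.
        lines.append(f"[S{speaker}] {line.strip()}")
    return "\n".join(lines)


script = format_dialogue([
    (1, "Welcome to the show!"),
    (2, "Thanks for having me."),
])
print(script)
```

The resulting string can be passed directly as the `text` argument in the examples on this page.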