KugelAudio
KugelAudio is an open-weight 7B text-to-speech model for 24 European languages. The MLX Audio integration runs the original weights directly and exposes the standard generate() interface.
Model
| Model | Precision | HuggingFace |
|---|---|---|
| kugelaudio/kugelaudio-0-open | bfloat16 | Model Card |
Usage
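A minimal sketch of a generation call, written against any object that exposes the `generate()` interface mentioned in the introduction. `KugelAudioParams` and `synthesize` are hypothetical helper names introduced for illustration. Passing `cfg_scale`, `ddpm_steps`, and `max_tokens` as keyword arguments, and treating `generate()` as something that yields results, are assumptions rather than confirmed API details.

```python
from dataclasses import asdict, dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class KugelAudioParams:
    """Hypothetical container; defaults mirror the documented parameter defaults."""

    cfg_scale: float = 3.0   # classifier-free guidance strength
    ddpm_steps: int = 10     # diffusion steps (quality vs speed)
    max_tokens: int = 2048   # maximum speech tokens to generate

    def __post_init__(self) -> None:
        # Basic sanity checks; the real model may accept a different range.
        if self.ddpm_steps < 1:
            raise ValueError("ddpm_steps must be at least 1")
        if self.max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")


def synthesize(
    model: Any, text: str, params: Optional[KugelAudioParams] = None
) -> List[Any]:
    """Call model.generate() and collect everything it yields.

    Assumption: generate() takes `text` plus the parameters above as
    keyword arguments and returns an iterable of result objects.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    params = params or KugelAudioParams()
    return list(model.generate(text=text, **asdict(params)))
```

With MLX Audio installed, `model` would typically come from `load_model("kugelaudio/kugelaudio-0-open")` in `mlx_audio.tts.utils`; treat that call as an assumption and check it against the installed version. Lowering `ddpm_steps`, for example `KugelAudioParams(ddpm_steps=5)`, is the obvious knob to try when speed matters more than quality.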
Parameters
| Parameter | Default | Description |
|---|---|---|
| cfg_scale | 3.0 | Classifier-free guidance strength |
| ddpm_steps | 10 | Diffusion steps for quality vs speed |
| max_tokens | 2048 | Maximum speech tokens to generate |
Languages
English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Czech, Romanian, Hungarian, Swedish, Danish, Finnish, Norwegian, Greek, Bulgarian, Slovak, Croatian, Serbian, and Turkish.
Notes
- KugelAudio uses a hybrid autoregressive plus diffusion pipeline derived from VibeVoice.
- The upstream release does not ship voice presets; generation uses the default voice path.
- Expect roughly 17 GB of unified memory for the full bfloat16 weights.
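
As a rough sanity check on the memory figure, the sketch below computes a weights-only footprint from a parameter count and a per-parameter size. bfloat16 stores each parameter in 2 bytes, so the nominal 7B count works out to about 14 GB. The roughly 17 GB quoted above is higher, and the difference is not broken down here. The function names are hypothetical, and the 17 GB threshold is simply the figure from the note.

```python
def weight_footprint_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Weights-only size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def fits_in_memory(available_gb: float, required_gb: float = 17.0) -> bool:
    """True if the available unified memory covers the quoted requirement."""
    return available_gb >= required_gb


nominal_gb = weight_footprint_gb(7e9)  # 7e9 params * 2 bytes = 14.0 GB
```

Under these numbers, `fits_in_memory(16)` is false and `fits_in_memory(24)` is true, although in practice the operating system and other applications also need a share of unified memory.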