API Reference
This section documents the public Python API for MLX Audio. The library is organized into several top-level modules:
| Module | Description |
|---|---|
| `mlx_audio.tts` | Text-to-Speech -- model loading, generation, and utilities |
| `mlx_audio.stt` | Speech-to-Text -- model loading and transcription |
| `mlx_audio.audio_io` | Audio I/O -- reading and writing audio files |
Quick Links
Loading Models
```python
# TTS
from mlx_audio.tts.utils import load, load_model

# STT (same names as the TTS helpers; alias one set if you import both in one module)
from mlx_audio.stt.utils import load, load_model
```
Generating Audio
Reading and Writing Files
Conventions
- All models are loaded from HuggingFace repos or local paths via `load()`/`load_model()`.
- TTS `generate()` methods return iterators of `GenerationResult` dataclass objects.
- Audio waveforms are represented as `mx.array` (MLX) or `np.ndarray` (NumPy) depending on the context.
- Sample rates are always in Hz and accompany the audio data in result objects.
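
The iterator and sample-rate conventions above can be illustrated with a small, library-free sketch. It does not call MLX Audio: `FakeResult`, `fake_generate`, and `collect` are hypothetical stand-ins, and the field names `audio` and `sample_rate` are assumptions about what a result object carries. Plain Python lists take the place of `mx.array` or `np.ndarray` so the example runs anywhere.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class FakeResult:
    """Hypothetical stand-in for a TTS result; not the library's GenerationResult."""
    audio: List[float]  # waveform samples (an mx.array or np.ndarray in practice)
    sample_rate: int    # always in Hz


def fake_generate(n_chunks: int = 3, sample_rate: int = 24_000) -> Iterator[FakeResult]:
    """Yield one result per chunk, mimicking an iterator-returning generate()."""
    for _ in range(n_chunks):
        # Half a second of silence per chunk.
        yield FakeResult(audio=[0.0] * (sample_rate // 2), sample_rate=sample_rate)


def collect(results: Iterator[FakeResult]) -> Tuple[List[float], int]:
    """Concatenate all chunks and return (samples, sample_rate)."""
    samples: List[float] = []
    sample_rate = None
    for result in results:
        if sample_rate is None:
            sample_rate = result.sample_rate
        elif result.sample_rate != sample_rate:
            raise ValueError("chunks disagree on sample rate")
        samples.extend(result.audio)
    if sample_rate is None:
        raise ValueError("generator yielded no results")
    return samples, sample_rate


samples, sample_rate = collect(fake_generate())
duration_s = len(samples) / sample_rate  # samples / Hz = seconds
print(f"{len(samples)} samples at {sample_rate} Hz = {duration_s:.2f} s")
```

Because each result carries its own sample rate, the consumer can compute durations or write files without hard-coding a rate. With a real model, the same loop shape would apply to whatever `generate()` yields.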