MLX Audio
Fast, efficient audio processing on Apple Silicon.
MLX Audio is an audio library built on Apple's MLX framework that runs text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) models efficiently on Apple Silicon (M-series) chips.
-
Text-to-Speech
Generate natural speech with models like Kokoro, Qwen3-TTS, Voxtral TTS, CSM, Dia, and more. Multilingual support, voice cloning, and speed control.
-
Speech-to-Text
Transcribe audio with Whisper, Parakeet, Voxtral Realtime, Qwen3-ASR, VibeVoice, and more. Streaming support and word-level timestamps.
-
Speech-to-Speech
Source separation with SAM-Audio, speech enhancement with MossFormer2 and DeepFilterNet, and conversational AI with Liquid2.5-Audio.
-
Optimized for Apple Silicon
Built on MLX for native M1/M2/M3/M4 acceleration. Quantization support (3-bit to 8-bit) for smaller models and faster inference.
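To see why lower bit widths shrink a model, you can estimate weight storage directly. The sketch below assumes MLX-style affine group-wise quantization: each group of `group_size` weights (assumed 64) stores one scale and one bias, each assumed 16 bits wide. Under that assumption every quantized weight costs `bits + 2 * 16 / 64 = bits + 0.5` bits. The function names and the 82M parameter count (taken from the Kokoro-82M model name) are illustrative only, and the estimate ignores any layers left unquantized.

```python
def unquantized_weight_bytes(num_params: int, bits: int = 16) -> float:
    """Storage for plain weights at a fixed width (e.g. float16/bfloat16)."""
    return num_params * bits / 8


def quantized_weight_bytes(
    num_params: int, bits: int, group_size: int = 64, scale_bits: int = 16
) -> float:
    """Approximate storage for affine group-wise quantized weights.

    Assumption: each group of `group_size` weights carries one scale and
    one bias of `scale_bits` each, adding 2 * scale_bits / group_size bits
    of overhead per weight on top of the quantized value itself.
    """
    if bits < 1 or group_size < 1:
        raise ValueError("bits and group_size must be positive")
    effective_bits = bits + 2 * scale_bits / group_size
    return num_params * effective_bits / 8


if __name__ == "__main__":
    params = 82_000_000  # hypothetical: parameter count implied by "Kokoro-82M"
    baseline = unquantized_weight_bytes(params)
    print(f"16-bit baseline: {baseline / 1e6:.1f} MB")
    for b in (3, 4, 6, 8):
        size = quantized_weight_bytes(params, b)
        print(f"{b}-bit: {size / 1e6:.1f} MB ({size / baseline:.0%} of baseline)")
```

Smaller groups track the weight distribution more closely but raise the per-weight overhead, so group size trades accuracy against memory.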
-
OpenAI-Compatible API
Drop-in REST API server with a modern web UI featuring 3D audio visualization. Compatible with existing OpenAI client libraries.
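Because the server follows the OpenAI API shape, a speech request is an HTTP `POST` to `/v1/audio/speech` with a JSON body carrying `model`, `input`, `voice`, and optional `response_format` and `speed`, as in OpenAI's own speech endpoint. The standard-library sketch below assumes the server listens at `http://localhost:8000` and serves that path; the address, the model id, and the `af_heart` Kokoro voice are placeholders to adjust for your setup, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions: adjust to wherever your server runs and which model it serves.
BASE_URL = "http://localhost:8000"  # hypothetical local address
MODEL_ID = "mlx-community/Kokoro-82M-bf16"  # hypothetical model id


def build_speech_request(
    text: str,
    voice: str = "af_heart",  # assumed Kokoro voice name
    response_format: str = "wav",
    speed: float = 1.0,
    base_url: str = BASE_URL,
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, **kwargs) -> None:
    """Send the request and write the returned audio bytes to `out_path`."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With a server running, `synthesize("Hello from MLX Audio!", "hello.wav")` saves the response to disk. The official `openai` Python client should work the same way by pointing `base_url` at the server's `/v1` prefix and calling `client.audio.speech.create(...)`; whether the server checks the API key is an assumption to verify, so pass a placeholder if it does not.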
-
Swift / iOS Support
On-device TTS for macOS and iOS via the companion mlx-audio-swift package.
Get Started
- Installation -- pip, uv, and development setup
- CLI Quick Start -- Generate and transcribe from the command line
- Python Quick Start -- Use mlx-audio in your Python projects
- All Models -- Browse every supported model