MLX Audio

Fast, efficient audio processing on Apple Silicon.

MLX Audio is an audio library built on Apple's MLX framework. It provides fast text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) on Apple Silicon (M-series) chips.

Text-to-Speech

Generate natural-sounding speech with Kokoro, Qwen3-TTS, Voxtral TTS, CSM, Dia, and other models. Multilingual support, voice cloning, and speed control, depending on the model.

TTS Models

Speech-to-Text

Transcribe audio with Whisper, Parakeet, Voxtral Realtime, Qwen3-ASR, VibeVoice, and more. Streaming support and word-level timestamps.

STT Models

Speech-to-Speech

Source separation with SAM-Audio, speech enhancement with MossFormer2 and DeepFilterNet, and conversational AI with Liquid2.5-Audio.

STS Models

Optimized for Apple Silicon

Built on MLX for native acceleration on M1, M2, M3, and M4 chips. Quantization from 3-bit to 8-bit gives smaller models and faster inference.
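To show roughly why lower bit widths shrink a model, the sketch below estimates weight storage as parameters × bits ÷ 8. It assumes MLX-style affine quantization where each group of 64 weights shares one fp16 scale and one fp16 bias; the group size and the overhead model are assumptions, so check the Quantization Guide for the settings MLX Audio actually uses. Unquantized layers and file metadata are ignored, and the 82M parameter count is only an example figure.

```python
def estimated_weight_bytes(num_params: int, bits: int, group_size: int = 64) -> float:
    """Approximate storage for quantized weights (a rough estimate, not exact).

    Assumes each group of `group_size` weights shares one fp16 scale and one
    fp16 bias, i.e. 32 extra bits per group. Ignores unquantized layers,
    metadata, and padding.
    """
    if not 1 <= bits <= 16:
        raise ValueError("bits must be between 1 and 16")
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    overhead_bits_per_weight = 32 / group_size  # scale + bias, amortized per weight
    return num_params * (bits + overhead_bits_per_weight) / 8


def fp16_weight_bytes(num_params: int) -> float:
    """Baseline: every weight stored as a 16-bit float (2 bytes)."""
    return num_params * 2


if __name__ == "__main__":
    params = 82_000_000  # example size, roughly that of Kokoro-82M
    baseline = fp16_weight_bytes(params)
    print(f"fp16: {baseline / 1e6:.1f} MB")
    for bits in (8, 6, 4, 3):
        size = estimated_weight_bytes(params, bits)
        print(f"{bits}-bit: {size / 1e6:.1f} MB ({baseline / size:.1f}x smaller)")
```

Under these assumptions, 4-bit weights for an 82M-parameter model come to about 46 MB, versus 164 MB at fp16. Smaller weights also mean less memory traffic per generated token, which is one reason quantized models tend to run faster.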

Quantization Guide

OpenAI-Compatible API

Drop-in REST API server, compatible with existing OpenAI client libraries, plus a web UI with 3D audio visualization.
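Because the server follows the OpenAI API shape, a plain HTTP request should work as well as an OpenAI client. The sketch below builds, but does not send, a request shaped like OpenAI's `/v1/audio/speech` call using only the Python standard library. The base URL and port, the model ID, and the voice name are placeholders assumed for illustration, and it is an assumption that the server honours every field (such as `speed`) for every model; see the Web UI & API Guide for the actual address and supported models.

```python
import json
import urllib.request

# Placeholder address (assumption): point this at your running MLX Audio server.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text: str, model: str, voice: str,
                         response_format: str = "wav",
                         speed: float = 1.0) -> urllib.request.Request:
    """Build a POST request with OpenAI speech-API field names.

    The request is only constructed here; sending it requires a live server.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical model ID and voice: substitute ones listed in the docs.
    req = build_speech_request("Hello from MLX Audio!",
                               model="your-tts-model-id", voice="your-voice")
    print(req.get_method(), req.full_url)
    # To actually send it (needs a running server):
    # with urllib.request.urlopen(req) as resp, open("speech.wav", "wb") as f:
    #     f.write(resp.read())
```

An OpenAI client library configured with the same base URL should send an equivalent request, which is what "drop-in" compatibility means in practice.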

Web UI & API Guide

Swift / iOS Support

On-device TTS for macOS and iOS via the companion mlx-audio-swift package.

Swift Package

Get Started

Installation -- pip, uv, and development setup
CLI Quick Start -- Generate and transcribe from the command line
Python Quick Start -- Use mlx-audio in your Python projects
All Models -- Browse every supported model