
MLX Audio

Fast, efficient audio processing on Apple Silicon.

MLX Audio is an audio library built on Apple's MLX framework. It provides fast text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS) on Apple Silicon (M-series) chips.


  • Text-to-Speech


    Generate natural speech with models like Kokoro, Qwen3-TTS, Voxtral TTS, CSM, Dia, and more. Multilingual support, voice cloning, and speed control.

    TTS Models

  • Speech-to-Text


    Transcribe audio with Whisper, Parakeet, Voxtral Realtime, Qwen3-ASR, VibeVoice, and more. Streaming support and word-level timestamps.

    STT Models

  • Speech-to-Speech


    Source separation with SAM-Audio, speech enhancement with MossFormer2 and DeepFilterNet, and conversational AI with Liquid2.5-Audio.

    STS Models

  • Optimized for Apple Silicon


    Built on MLX for native M1/M2/M3/M4 acceleration. Quantization support (3-bit to 8-bit) for smaller models and faster inference.

    Quantization Guide

  • OpenAI-Compatible API


    Drop-in REST API server with a modern web UI featuring 3D audio visualization. Compatible with existing OpenAI client libraries.

    Web UI & API Guide

  • Swift / iOS Support


    On-device TTS for macOS and iOS via the companion mlx-audio-swift package.

    Swift Package
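Because the server follows OpenAI's API shape, a plain HTTP client should be enough to request speech from it. The sketch below builds an OpenAI-style `POST /v1/audio/speech` request using only the Python standard library. The server address (`http://localhost:8000`), the model ID, and the voice name are assumptions for illustration. Check the Web UI & API Guide for the real values, and note that a given server or model may ignore optional fields such as `speed` or `response_format`.

```python
import json
import urllib.request

# Assumed address of a locally running MLX Audio server; adjust to your setup.
BASE_URL = "http://localhost:8000"


def build_speech_request(
    text: str,
    model: str,
    voice: str,
    base_url: str = BASE_URL,
    response_format: str = "wav",
    speed: float = 1.0,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,
        "input": text,  # OpenAI's speech endpoint names the text field "input"
        "voice": voice,
        "response_format": response_format,
        "speed": speed,  # optional; may be ignored by some models
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(req: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)


# Usage (requires a running server; model ID and voice are hypothetical):
# req = build_speech_request("Hello from MLX Audio!",
#                            model="prince-canuma/Kokoro-82M", voice="af_heart")
# synthesize(req, "hello.wav")
```

Keeping request construction separate from sending makes the payload easy to inspect. Existing OpenAI client libraries should also work by pointing their base URL at the server.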
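To show roughly why 3-bit to 8-bit quantization shrinks models, here is a plain-Python illustration of group-wise affine quantization. In this scheme, each group of weights is stored as small integer codes plus one scale and one bias. MLX exposes a scheme of this kind through `mx.quantize`, but the code below is not MLX's implementation. The 64-weight group size and the 16-bit scale and bias are assumptions chosen for the size estimate.

```python
from typing import List, Tuple


def quantize_group(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Affine-quantize one group of weights to unsigned `bits`-bit codes.

    Returns (codes, scale, bias) such that value ~= code * scale + bias.
    """
    levels = (1 << bits) - 1  # e.g. 7 for 3-bit, 15 for 4-bit, 255 for 8-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Round to the nearest code and clamp into the valid range.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def quantized_size_bytes(
    num_params: int, bits: int, group_size: int = 64, meta_bits: int = 16
) -> float:
    """Estimate storage: `bits` per weight plus one scale and one bias per group."""
    bits_per_weight = bits + 2 * meta_bits / group_size
    return num_params * bits_per_weight / 8
```

Under these assumptions, 4-bit weights cost 4.5 bits each. An 82M-parameter model would then take about 46 MB, compared with about 164 MB at 16 bits. The per-weight reconstruction error stays within half a quantization step (`scale / 2`), and fewer bits make that step coarser.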


Get Started