TTS API Reference

Model Loading

The primary entry points for loading TTS models.

`mlx_audio.tts.utils`

Example:

from mlx_audio.tts import load

model = load("mlx-community/outetts-0.3-500M-bf16")
audio = model.generate("Hello world!")

`mlx_audio.tts.utils.load(model_path, lazy=False, strict=True, **kwargs)`

Load a text-to-speech model from a local path or Hugging Face repository.

This is the main entry point for loading TTS models. It automatically detects the model type and initializes the appropriate model class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_path` | `Union[str, Path]` | The local path or Hugging Face repo ID to load from. | required |
| `lazy` | `bool` | If `False`, evaluate model parameters immediately. | `False` |
| `strict` | `bool` | If `True`, raise an error if any weights are missing. | `True` |
| `**kwargs` | `Any` | Additional keyword arguments such as `revision` and `force_download`. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `nn.Module` | The loaded and initialized model. |

`mlx_audio.tts.utils.load_model(model_path, lazy=False, strict=True, **kwargs)`

Load and initialize the model from a given path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_path` | `Path` | The path to load the model from. | required |
| `lazy` | `bool` | If `False`, evaluate the model parameters so that they are loaded into memory before returning; otherwise they are loaded when needed. | `False` |

Returns:

| Type | Description |
| --- | --- |
| `nn.Module` | The loaded and initialized model. |

Raises:

| Type | Description |
| --- | --- |
| `FileNotFoundError` | If the weight files (`.safetensors`) are not found. |
| `ValueError` | If the model class or args class is not found or cannot be instantiated. |
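The `FileNotFoundError` condition above can be pictured as a check that runs before any weights are read. The following is a minimal sketch of such a check, not the library's code: `find_weight_files` is a hypothetical helper, and the assumption that weights sit directly in the model directory as `*.safetensors` files is ours.

```python
import glob
from pathlib import Path
from typing import List


def find_weight_files(model_dir: Path) -> List[str]:
    """Return the sorted .safetensors paths in model_dir, or raise if none exist.

    Hypothetical helper: it only illustrates the documented error condition.
    """
    weight_files = sorted(glob.glob(str(model_dir / "*.safetensors")))
    if not weight_files:
        # Mirrors the documented behavior: missing weights are a hard error.
        raise FileNotFoundError(f"No .safetensors files found in {model_dir}")
    return weight_files
```

A caller could run this kind of check on a downloaded snapshot directory to fail early with a clear message.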

`mlx_audio.tts.utils.get_available_models()`

Get a list of all available TTS model types by scanning the models directory.

Returns:

| Type | Description |
| --- | --- |
| `List[str]` | A list of available model type names. |
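To make "scanning the models directory" concrete, here is a small sketch of a directory scan that returns one name per subdirectory. It is an illustration under assumptions: `list_model_types` is a hypothetical function, and skipping entries that start with `_` or `.` (for example `__pycache__`) is our choice, not documented behavior.

```python
from pathlib import Path
from typing import List


def list_model_types(models_dir: Path) -> List[str]:
    """Return the names of model subdirectories found in models_dir.

    Hypothetical stand-in for a models-directory scan. Plain files and
    private or hidden directories are ignored (an assumption).
    """
    return sorted(
        entry.name
        for entry in models_dir.iterdir()
        if entry.is_dir() and not entry.name.startswith(("_", "."))
    )
```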

`mlx_audio.tts.utils.get_model_and_args(model_type, model_name)`

Retrieve the model architecture module based on the model type and name.

This function attempts to find the appropriate model architecture by:

1. Checking if the `model_type` is directly in the `MODEL_REMAPPING` dictionary.
2. Looking for partial matches in segments of the `model_name`.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_type` | `str` | The type of model to load (e.g., "outetts"). | required |
| `model_name` | `List[str]` | List of model name components that might contain remapping information. | required |

Returns:

| Type | Description |
| --- | --- |
| `Tuple[Any, str]` | A tuple containing the imported architecture module and the resolved `model_type` string after remapping. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the model type is not supported (module import fails). |
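The two lookup steps and the import failure described above can be sketched as follows. This is a hedged illustration rather than the library's implementation: the contents of `MODEL_REMAPPING` shown here are made up, `resolve_model_type` and `import_architecture` are hypothetical names, and matching remapping keys as case-insensitive substrings of each name segment is an assumption about what "partial match" means.

```python
import importlib
from types import ModuleType
from typing import Dict, List

# Made-up remapping table for illustration; the real contents are not shown here.
MODEL_REMAPPING: Dict[str, str] = {"outetts": "outetts", "my_alias": "outetts"}


def resolve_model_type(model_type: str, model_name: List[str]) -> str:
    """Resolve model_type using the two documented lookup steps."""
    # Step 1: direct lookup of the model type in the remapping table.
    if model_type in MODEL_REMAPPING:
        return MODEL_REMAPPING[model_type]
    # Step 2: look for a remapping key inside any segment of the model name.
    for part in model_name:
        for key, target in MODEL_REMAPPING.items():
            if key in part.lower():
                return target
    # No match: keep the type as given and let the import decide.
    return model_type


def import_architecture(model_type: str, package: str) -> ModuleType:
    """Import package.model_type, converting import failures to ValueError."""
    try:
        return importlib.import_module(f"{package}.{model_type}")
    except ImportError as exc:
        raise ValueError(f"Model type {model_type} not supported.") from exc
```

Passing the package name as an argument keeps the sketch independent of any particular installation.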

`mlx_audio.tts.utils.fetch_from_hub(model_path, lazy=False, **kwargs)`

`mlx_audio.tts.utils.upload_to_hub(path, upload_repo, hf_path)`

Upload the model to the Hugging Face Hub.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `path` | `str` | Local path to the model. | required |
| `upload_repo` | `str` | Name of the Hugging Face repo to upload to. | required |
| `hf_path` | `str` | Path to the original Hugging Face model. | required |

Audio Generation

`mlx_audio.tts.generate`

`mlx_audio.tts.generate.generate_audio(text, model=None, max_tokens=1200, voice='af_heart', prompt=None, instruct=None, speed=1.0, lang_code='en', cfg_scale=None, ddpm_steps=None, sigma=None, ref_audio=None, ref_text=None, stt_model='mlx-community/whisper-large-v3-turbo-asr-fp16', output_path=None, file_prefix='audio', audio_format='wav', join_audio=False, play=False, verbose=True, temperature=0.7, stream=False, streaming_interval=2.0, save=False, use_zero_spk_emb=False, **kwargs)`

Generates audio from text using a specified TTS model.

Parameters:

- `text` (`str`): The input text to be converted to speech.
- `model` (`str` or object): The TTS model to use, or an already loaded model.
- `voice` (`str`): The voice style to use (also used as the speaker for Qwen3-TTS models).
- `instruct` (`str`): Instruction for emotion/style (CustomVoice) or voice description (VoiceDesign).
- `temperature` (`float`): The sampling temperature for the model.
- `speed` (`float`): Playback speed multiplier.
- `lang_code` (`str`): The language code.
- `ref_audio` (`mx.array`): Reference audio to clone the voice from.
- `ref_text` (`str`): Caption for the reference audio.
- `stt_model` (`str` or object): An MLX Whisper model to use for transcription, or an already loaded STT model.
- `output_path` (`str`): Directory path where audio files will be saved.
- `file_prefix` (`str`): The output file path without extension.
- `audio_format` (`str`): Output audio format (e.g., "wav", "flac").
- `join_audio` (`bool`): Whether to join multiple audio files into one.
- `play` (`bool`): Whether to play the generated audio.
- `verbose` (`bool`): Whether to print status messages.
- `save` (`bool`): Whether to save streamed audio to a file when using stream mode.

Returns:

- `None`: The function writes the generated audio to a file when not streaming, or when streaming with saving enabled.
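Because `generate_audio` takes many keyword arguments, it can help to collect them in one place before calling it. The sketch below assumes only the parameter names from the signature above. `build_generate_kwargs` and `synthesize` are hypothetical helpers, and the specific values chosen (joined output, no playback, quiet mode) are illustrative, not recommendations from the library.

```python
from typing import Any, Dict


def build_generate_kwargs(
    text: str, model: str, output_dir: str, **overrides: Any
) -> Dict[str, Any]:
    """Collect keyword arguments for generate_audio in one dictionary.

    Only parameter names from the documented signature are used here.
    """
    kwargs: Dict[str, Any] = {
        "text": text,
        "model": model,
        "output_path": output_dir,  # directory where audio files are saved
        "file_prefix": "audio",
        "audio_format": "wav",
        "join_audio": True,   # one output file instead of one per segment
        "play": False,
        "verbose": False,
    }
    kwargs.update(overrides)
    return kwargs


def synthesize(text: str, model: str, output_dir: str, **overrides: Any) -> None:
    """Call generate_audio with the collected arguments."""
    # Deferred import so this sketch can be loaded without mlx_audio installed.
    from mlx_audio.tts.generate import generate_audio

    generate_audio(**build_generate_kwargs(text, model, output_dir, **overrides))
```

For example, `synthesize("Hello world!", "mlx-community/outetts-0.3-500M-bf16", "out", audio_format="flac")` would request FLAC output in the `out` directory.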

Data Classes

`mlx_audio.tts.models.base`

`mlx_audio.tts.models.base.GenerationResult` `dataclass`

`audio` `instance-attribute`

`samples` `instance-attribute`

`sample_rate` `instance-attribute`

`segment_idx` `instance-attribute`

`token_count` `instance-attribute`

`audio_duration` `instance-attribute`

`real_time_factor` `instance-attribute`

`processing_time_seconds` `instance-attribute`

`peak_memory_usage` `instance-attribute`

`is_streaming_chunk = False` `class-attribute` `instance-attribute`

`is_final_chunk = False` `class-attribute` `instance-attribute`
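The attribute list above gives field names but not types, so the following sketch uses a stand-in dataclass to show how a consumer might read results. `ResultStandIn` and `collect_audio` are hypothetical, every type annotation is a guess, and the rule of stopping after a chunk that is both a streaming chunk and the final chunk is an assumption about how the two flags are meant to be used.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class ResultStandIn:
    """Stand-in with the same field names as GenerationResult (types are guesses)."""

    audio: Any
    samples: int
    sample_rate: int
    segment_idx: int
    token_count: int
    audio_duration: Any
    real_time_factor: float
    processing_time_seconds: float
    peak_memory_usage: float
    is_streaming_chunk: bool = False
    is_final_chunk: bool = False


def collect_audio(results: Iterable[ResultStandIn]) -> List[Any]:
    """Gather audio in order, stopping after the final streaming chunk."""
    collected: List[Any] = []
    for result in results:
        collected.append(result.audio)
        if result.is_streaming_chunk and result.is_final_chunk:
            break
    return collected
```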

`mlx_audio.tts.models.base.BatchGenerationResult` `dataclass`

`mlx_audio.tts.models.base.BaseModelArgs` `dataclass`