Qwen2-Audio
Qwen2-Audio is a multimodal audio-language model that handles more than transcription. In MLX Audio it can be used for ASR, translation, captioning, emotion recognition, and general audio understanding through text prompts.
Models
| Model | Quantization | HuggingFace |
|---|---|---|
| Qwen/Qwen2-Audio-7B-Instruct | bf16 | Model Card |
| mlx-community/Qwen2-Audio-7B-Instruct-4bit | 4-bit | Model Card |
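To make the difference between the two quantization levels concrete, the sketch below estimates the size of the weights alone from the bits stored per parameter. It is a back-of-the-envelope calculation, not a measurement: `approx_weight_gb` and `NOMINAL_PARAMS` are hypothetical names, the parameter count is simply the nominal "7B" from the model name (the true count may differ), and real 4-bit checkpoints also store quantization scales and may keep some layers at higher precision, so actual files are likely somewhat larger.

```python
# Bits stored per weight for each quantization level in the table above.
BITS_PER_WEIGHT = {"bf16": 16, "4-bit": 4}

# Assumption: use the nominal "7B" from the model name as the parameter count.
NOMINAL_PARAMS = 7e9


def approx_weight_gb(num_params: float, quantization: str) -> float:
    """Rough size of the weights in gigabytes (1 GB = 1e9 bytes).

    Ignores quantization scales, unquantized layers, activations, and caches.
    """
    bits = BITS_PER_WEIGHT[quantization]
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    for name in BITS_PER_WEIGHT:
        print(f"{name}: ~{approx_weight_gb(NOMINAL_PARAMS, name):.1f} GB")
```

Under these assumptions the 4-bit variant needs roughly a quarter of the weight memory of the bf16 one, which is usually the deciding factor on machines with limited unified memory.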
Usage
Typical Prompts
- Transcribe the audio.
- Translate the speech to French.
- What emotion is the speaker expressing?
- Describe the environmental sounds in this clip.
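Because the task is selected entirely by the text prompt, it can help to keep the prompts in one place and build them by task name. The following is a minimal sketch of that idea; `PROMPT_TEMPLATES` and `build_prompt` are hypothetical helpers for illustration, not part of MLX Audio, and the task keys are assumptions. The resulting string is what you would pass as the text prompt alongside the audio.

```python
from typing import Optional

# Hypothetical task names mapped to the prompt wording listed above.
PROMPT_TEMPLATES = {
    "transcribe": "Transcribe the audio.",
    "translate": "Translate the speech to {language}.",
    "emotion": "What emotion is the speaker expressing?",
    "caption": "Describe the environmental sounds in this clip.",
}


def build_prompt(task: str, language: Optional[str] = None) -> str:
    """Return the text prompt for a task, filling in the target language if needed."""
    if task not in PROMPT_TEMPLATES:
        known = ", ".join(sorted(PROMPT_TEMPLATES))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}")
    template = PROMPT_TEMPLATES[task]
    if "{language}" in template:
        if not language:
            raise ValueError(f"task {task!r} needs a target language")
        return template.format(language=language)
    return template


if __name__ == "__main__":
    print(build_prompt("transcribe"))
    print(build_prompt("translate", language="French"))
```

Keeping the wording in a single table makes it easy to adjust a prompt once and have every call site pick up the change.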
Capabilities
- Speech transcription
- Speech translation
- Audio captioning
- Emotion and sentiment analysis
- Environmental sound classification
- General audio understanding