Here we track the latest audio AI agents, covering speech, music, sound effects, and more.
| Date (DD.MM) | Source | Description | Paper | Code | Trained Model |
|---|---|---|---|---|---|
| 06.12 | JAMMIN-GPT | JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live | arXiv | GitHub | - |
| 19.11 | M2UGen | M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models | arXiv | - | - |
| 14.11 | Qwen-Audio | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | arXiv | GitHub | - |
| 02.11 | FLAP | FLAP: Fast Language-Audio Pre-training | arXiv | - | - |
| 29.10 | JEN-1 Composer | JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation | arXiv | - | - |
| 20.10 | SALMONN | SALMONN: Towards Generic Hearing Abilities for Large Language Models | arXiv | GitHub | Hugging Face |
| 19.10 | Loop Copilot | Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing | arXiv | - | - |
| 18.10 | MusicAgent | MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | arXiv | GitHub | - |
| 11.10 | LLark | LLark: A Multimodal Foundation Model for Music | arXiv | GitHub | - |
| 01.10 | UniAudio | UniAudio: An Audio Foundation Model Toward Universal Audio Generation | arXiv | GitHub | - |
| 18.09 | Dynamic-SUPERB | Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech | arXiv | GitHub | - |