Project Page | arXiv
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation
Xu Guo*, Fulong Ye*, Qichao Sun*†, Liyang Chen, Bingchuan Li†, Pengze Zhang, Jiawei Liu, Songtao Zhao§, Qian He, Xiangwang Hou§
* Equal contribution, † Project lead, § Corresponding author
Tsinghua University | Intelligent Creation Team, ByteDance
DreamID-Omni is a unified framework designed for high-fidelity human-centric generation. It seamlessly integrates three core capabilities into a single model:
- R2AV (Generation): Generate synchronized video and audio from a reference image and a reference voice timbre.
- RV2AV (Editing): Edit the identity and voice of a source video using a reference image and voice timbre.
- RA2V (Animation): Animate a reference identity from an audio input with precise lip sync.
demo.mp4
We are making final preparations for the open-source release. Pending internal approval, we aim to release v1 in March. Please stay tuned!
