LTX 2.0 Vocoder Implementation
LTX 2.0 Video VAE Implementation
* Initial implementation of LTX 2.0 latent upsampling pipeline
* Add new LTX 2.0 spatial latent upsampler logic
* Add test script for LTX 2.0 latent upsampling
* Add option to enable VAE tiling in upsampling test script
* Get latent upsampler working with video latents
* Fix typo in BlurDownsample
* Add latent upsample pipeline docstring and example
* Remove deprecated pipeline VAE slicing/tiling methods
* make style and make quality
* When returning latents, return unpacked and denormalized latents for T2V and I2V
* Add model_cpu_offload_seq for latent upsampling pipeline

Co-authored-by: Daniel Gu <dgu8957@gmail.com>
Merging as the CI failures are unrelated.
In this example, an `image` is not being passed to the pipeline. Should be:
What does this PR do?
This PR adds pipelines for the LTX 2.0 video generation model (code, weights). LTX 2.0 is an audio-video foundation model that generates videos with synced audio; it supports generation tasks such as text-to-video (T2V), text-image-to-video (TI2V), and more.
An example usage script for I2V is as follows:
Note that LTX 2.0 video generation uses a lot of memory; CPU offloading is necessary even on an A100 with 80 GB of VRAM (assuming no memory optimizations other than `bf16` inference are used).

Here is an I2V sample from the above:
ltx2_i2v_sample.mp4
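To put the memory note above in perspective, here is a rough back-of-the-envelope sketch of how much VRAM the weights alone occupy in `bf16` (2 bytes per parameter). The helper `weight_memory_gib` and the parameter counts below are hypothetical placeholders for illustration and are not taken from this PR; activations, latents, and the VAE decode step add to whatever the weights need.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Return the memory needed to hold model weights alone, in GiB.

    bf16 stores each parameter in 2 bytes, which is the default here.
    """
    return num_params * bytes_per_param / 1024**3


# Hypothetical component sizes (assumptions, not figures from the PR).
hypothetical_components = {
    "transformer": 20_000_000_000,
    "text_encoder": 10_000_000_000,
}

# Sum the per-component weight footprint to see how close it gets to 80 GB.
total_gib = sum(weight_memory_gib(n) for n in hypothetical_components.values())

for name, n in hypothetical_components.items():
    print(f"{name}: {weight_memory_gib(n):.1f} GiB in bf16")
print(f"total weights: {total_gib:.1f} GiB")
```

With placeholder sizes like these, the weights alone take up a large share of an 80 GB card before any activations are allocated, which is consistent with the need for CPU offloading described above.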
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu
@sayakpaul
@ofirbb