Add LTX 2.0 Video Pipelines by dg845 · Pull Request #12915 · huggingface/diffusers

dg845 · 2026-01-06T06:28:57Z

What does this PR do?

This PR adds pipelines for the LTX 2.0 video generation model (code, weights). LTX 2.0 is an audio-video foundation model that generates videos with synced audio; it supports generation tasks such as text-to-video (T2V), text-image-to-video (TI2V), and more.

An example usage script for I2V is as follows:

import torch
from diffusers.pipelines.ltx2 import LTX2ImageToVideoPipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.utils import load_image

pipe = LTX2ImageToVideoPipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)
prompt = "An astronaut hatches from a fragile egg on the surface of the Moon, the shell cracking and peeling apart in gentle low-gravity motion. Fine lunar dust lifts and drifts outward with each movement, floating in slow arcs before settling back onto the ground. The astronaut pushes free in a deliberate, weightless motion, small fragments of the egg tumbling and spinning through the air. In the background, the deep darkness of space subtly shifts as stars glide with the camera's movement, emphasizing vast depth and scale. The camera performs a smooth, cinematic slow push-in, with natural parallax between the foreground dust, the astronaut, and the distant starfield. Ultra-realistic detail, physically accurate low-gravity motion, cinematic lighting, and a breath-taking, movie-like shot."
negative_prompt = "shaky, glitchy, low quality, worst quality, deformed, distorted, disfigured, motion smear, motion artifacts, fused fingers, bad anatomy, weird hand, ugly, transition, static."

frame_rate = 24.0
video, audio = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=40,
    guidance_scale=4.0,
    output_type="np",
    return_dict=False,
)
video = (video * 255).round().astype("uint8")
video = torch.from_numpy(video)

encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="ltx2_sample.mp4",
)
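As a rough sanity check on the settings above, the clip length follows directly from `num_frames` and `frame_rate`. The sketch below is plain arithmetic and not part of the pipeline; the helper names and the 24 kHz sample rate in the usage lines are illustrative assumptions (the real rate comes from `pipe.vocoder.config.output_sampling_rate`).

```python
def clip_duration_seconds(num_frames: int, frame_rate: float) -> float:
    """Length of the generated clip in seconds."""
    if num_frames <= 0 or frame_rate <= 0:
        raise ValueError("num_frames and frame_rate must be positive")
    return num_frames / frame_rate


def expected_audio_samples(num_frames: int, frame_rate: float, sample_rate: int) -> int:
    """Approximate number of audio samples needed to cover the whole clip."""
    return round(clip_duration_seconds(num_frames, frame_rate) * sample_rate)


if __name__ == "__main__":
    # Values from the example script: 121 frames at 24 fps.
    print(f"{clip_duration_seconds(121, 24.0):.2f} s")  # 5.04 s
    # 24_000 is an assumed sample rate, used only for illustration.
    print(expected_audio_samples(121, 24.0, 24_000))  # 121000
```

With the example settings this gives a clip of roughly five seconds, which is a quick way to confirm that the encoded video and audio tracks should end up the same length.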

Note that LTX 2.0 video generation uses a lot of memory; CPU offloading is necessary even on an A100 with 80 GB of VRAM (assuming no memory optimizations other than bf16 inference are used).
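To get a feel for how dtype affects the weight footprint, a back-of-the-envelope estimate can help. The sketch below is arithmetic only; the parameter count in the usage lines is a made-up placeholder, not a figure from this PR, and activations plus the other pipeline components (text encoder, VAEs, vocoder) add memory on top of whatever it reports.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str = "bfloat16") -> float:
    """Memory needed to hold `num_params` weights in `dtype`, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Placeholder value, NOT the actual LTX 2.0 parameter count.
    hypothetical_params = 10_000_000_000
    for dtype in ("float32", "bfloat16"):
        print(f"{dtype}: {weight_memory_gib(hypothetical_params, dtype):.1f} GiB")
```

Halving the bytes per parameter halves the weight memory, but it does not shrink activations for long, high-resolution clips, which is consistent with offloading still being needed.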

Here is an I2V sample from the above:

ltx2_i2v_sample.mp4

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@yiyixuxu
@sayakpaul
@ofirbb

* Initial implementation of LTX 2.0 latent upsampling pipeline
* Add new LTX 2.0 spatial latent upsampler logic
* Add test script for LTX 2.0 latent upsampling
* Add option to enable VAE tiling in upsampling test script
* Get latent upsampler working with video latents
* Fix typo in BlurDownsample
* Add latent upsample pipeline docstring and example
* Remove deprecated pipeline VAE slicing/tiling methods
* make style and make quality
* When returning latents, return unpacked and denormalized latents for T2V and I2V
* Add model_cpu_offload_seq for latent upsampling pipeline

---------

Co-authored-by: Daniel Gu <dgu8957@gmail.com>

yiyixuxu

thanks!

dg845 · 2026-01-08T05:24:09Z

Merging as the CI failures are unrelated.

hannalaguilar · 2026-01-08T12:07:34Z

What does this PR do?

This PR adds pipelines for the LTX 2.0 video generation model (code, weights). LTX 2.0 is an audio-video foundation model that generates videos with synced audio; it supports generation tasks such as text-to-video (T2V), text-image-to-video (TI2V), and more.

An example usage script for I2V is as follows:

import torch
from diffusers.pipelines.ltx2 import LTX2ImageToVideoPipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.utils import load_image

pipe = LTX2ImageToVideoPipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = "An astronaut hatches from a fragile egg on the surface of the Moon, the shell cracking and peeling apart in gentle low-gravity motion. Fine lunar dust lifts and drifts outward with each movement, floating in slow arcs before settling back onto the ground. The astronaut pushes free in a deliberate, weightless motion, small fragments of the egg tumbling and spinning through the air. In the background, the deep darkness of space subtly shifts as stars glide with the camera's movement, emphasizing vast depth and scale. The camera performs a smooth, cinematic slow push-in, with natural parallax between the foreground dust, the astronaut, and the distant starfield. Ultra-realistic detail, physically accurate low-gravity motion, cinematic lighting, and a breath-taking, movie-like shot."
negative_prompt = "shaky, glitchy, low quality, worst quality, deformed, distorted, disfigured, motion smear, motion artifacts, fused fingers, bad anatomy, weird hand, ugly, transition, static."

frame_rate = 24.0
video, audio = pipe(
	prompt=prompt,
	negative_prompt=negative_prompt,
	width=768,
	height=512,
	num_frames=121,
	frame_rate=frame_rate,
	num_inference_steps=40,
	guidance_scale=4.0,
	output_type="np",
	return_dict=False,
)
video = (video * 255).round().astype("uint8")
video = torch.from_numpy(video)

encode_video(
	video[0],
	fps=frame_rate,
	audio=audio[0].float().cpu(),
	audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
	output_path="ltx2_sample.mp4",
)

Note that LTX 2.0 video generation uses a lot of memory; it is necessary to use CPU offloading even for an A100 which has 80 GB VRAM (assuming no other memory optimizations other than bf16 inference are used).

Here is an I2V sample from the above:

ltx2_i2v_sample.mp4

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@yiyixuxu @sayakpaul @ofirbb

In this example, an `image` is not being passed to the pipeline. It should be:

image = load_image("image.png")
video, audio = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=40,
    guidance_scale=4.0,
    output_type="np",
    return_dict=False,
)
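One way to catch this kind of omission early is to check the call arguments before invoking the pipeline. The helper below is a hypothetical sketch and not part of diffusers; the name `check_i2v_kwargs` and the set of required keys are assumptions made for illustration.

```python
# Assumed minimum set of inputs for an image-to-video call (illustrative).
REQUIRED_I2V_KEYS = ("image", "prompt")


def check_i2v_kwargs(kwargs: dict) -> dict:
    """Return kwargs unchanged, or raise if a required I2V input is missing."""
    missing = [key for key in REQUIRED_I2V_KEYS if kwargs.get(key) is None]
    if missing:
        raise ValueError(f"missing required I2V argument(s): {', '.join(missing)}")
    return kwargs


if __name__ == "__main__":
    # Mirrors the quoted example, where `image` was left out.
    call_kwargs = {"prompt": "An astronaut hatches from an egg", "width": 768, "height": 512}
    try:
        check_i2v_kwargs(call_kwargs)
    except ValueError as err:
        print(err)  # missing required I2V argument(s): image
```

In a script, the call would then read `pipe(**check_i2v_kwargs(call_kwargs))`, so a missing `image` fails before any model weights are moved or any denoising steps run.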

dg845 and others added 30 commits December 12, 2025 07:52

Initial LTX 2.0 transformer implementation

aa602ac

Add tests for LTX 2 transformer model

b3096c3

Get LTX 2 transformer tests working

980591d

Rename LTX 2 compile test class to have LTX2

e100b8f

Remove RoPE debug print statements

780fb61

Get LTX 2 transformer compile tests passing

5765759

Fix LTX 2 transformer shape errors

aeecc4d

Initial script to convert LTX 2 transformer to diffusers

a5f2d2d

Add more LTX 2 transformer audio arguments

d86f89d

Allow LTX 2 transformer to be loaded from local path for conversion

57a8b9c

Improve dummy inputs and add test for LTX 2 transformer consistency

a7bc052

Fix LTX 2 transformer bugs so consistency test passes

bda3ff1

Initial implementation of LTX 2.0 video VAE

269cf7b

Explicitly specify temporal and spatial VAE scale factors when converting

baf23e2

Add initial LTX 2.0 video VAE tests

5b950d6

Add initial LTX 2.0 video VAE tests (part 2)

491aae0

Get diffusers implementation on par with official LTX 2.0 video VAE implementation

a748975

Initial LTX 2.0 vocoder implementation

c6a11a5

Merge pull request #3 from huggingface/ltx-2-vocoder

8bfeb4a

LTX 2.0 Vocoder Implementation

Merge pull request #2 from huggingface/ltx-2-video-vae

b1cf6ff

LTX 2.0 Video VAE Implementation

Use RMSNorm implementation closer to original for LTX 2.0 video VAE

6c56954

start audio decoder.

b34ddb1

init registration.

f4c2435

up

e54cd6b

simplify and clean up

907896d

up

4904fd6

Initial LTX 2.0 text encoder implementation

0028955

Rough initial LTX 2.0 pipeline implementation

d0f9cda

up

5f0f2a0

up

58257eb

dg845 and others added 3 commits January 7, 2026 06:37

Remove print statement in audio VAE

964f106

up

4dfe509

Merge branch 'main' into ltx-2-transformer

249ae1f

yiyixuxu mentioned this pull request Jan 7, 2026

Any plan on LTX-2? #12920

Closed

Fix bug when calculating audio RoPE coords

040c118

sayakpaul requested a review from yiyixuxu January 7, 2026 12:13

sayakpaul and others added 9 commits January 7, 2026 15:46

Fix latent upsampler filename in LTX 2 conversion script

5e50046

Add latent upsample pipeline to LTX 2 docs

2b85b93

Add dummy objects for LTX 2 latent upsample pipeline

40ee3e3

Set default FPS to official LTX 2 ckpt default of 24.0

99ff722

Set default CFG scale to official LTX 2 ckpt default of 4.0

165b945

Update LTX 2 pipeline example docstrings

1a4ae58

make style and make quality

b4d33df

Remove LTX 2 test scripts

724afee

yiyixuxu approved these changes Jan 8, 2026

dg845 and others added 4 commits January 8, 2026 04:51

Fix LTX 2 upsample pipeline example docstring

d24faa7

Add logic to convert and save a LTX 2 upsampling pipeline

353f0db

Merge branch 'main' into ltx-2-transformer

0c9e4e2

Document LTX2VideoTransformer3DModel forward pass

f85b969

dg845 merged commit c10bdd9 into main Jan 8, 2026
10 of 12 checks passed

This was referenced Jan 8, 2026

LTX-2 distilled checkpoint support #12925

Closed

LTX-2 condition pipeline #12926

Open

dg845 deleted the ltx-2-transformer branch January 8, 2026 22:01

sayakpaul added the roadmap label (Add to current release roadmap) Jan 14, 2026

github-project-automation bot added this to Diffusers Roadmap 0.37 Jan 14, 2026

github-project-automation bot moved this to Done in Diffusers Roadmap 0.37 Jan 14, 2026

david6666666 mentioned this pull request Jan 27, 2026

[Model] support Ltx2 text-to-video image-to-video vllm-project/vllm-omni#841

Open

19 tasks

dg845 merged 103 commits into main from ltx-2-transformer


7 participants
