Add QKV fusion to the Hunyuan Video transformer #10407


Closed
wants to merge 3 commits

Conversation

@Ednaordinary (Contributor)

What does this PR do?

This adds QKV fusion to Hunyuan Video. At the moment, this gives minimal/no improvement:

             QKV      No QKV
Time (sec)   522.18   547.21
VRAM (GiB)   4.17     3.88
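For context, fusing QKV replaces the three separate projection matmuls with one wider matmul, trading three kernel launches for one. A minimal illustrative sketch of the idea (not the actual diffusers implementation; the class and attribute names here are hypothetical):

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Illustrative QKV fusion: three projections become one wider linear.

    Assumes equal q/k/v output widths, as in most attention blocks.
    """

    def __init__(self, to_q: nn.Linear, to_k: nn.Linear, to_v: nn.Linear):
        super().__init__()
        in_dim = to_q.in_features
        out_dim = to_q.out_features + to_k.out_features + to_v.out_features
        self.to_qkv = nn.Linear(in_dim, out_dim, bias=to_q.bias is not None)
        with torch.no_grad():
            # This concatenation is the step that fails for quantized
            # weights (see the torchao and bitsandbytes errors below).
            self.to_qkv.weight.copy_(
                torch.cat([to_q.weight, to_k.weight, to_v.weight], dim=0)
            )
            if to_q.bias is not None:
                self.to_qkv.bias.copy_(
                    torch.cat([to_q.bias, to_k.bias, to_v.bias], dim=0)
                )

    def forward(self, x: torch.Tensor):
        # One matmul, then split the output back into q, k, v.
        return self.to_qkv(x).chunk(3, dim=-1)

# Usage:
q, k, v = nn.Linear(128, 128), nn.Linear(128, 128), nn.Linear(128, 128)
xq, xk, xv = FusedQKV(q, k, v)(torch.randn(2, 16, 128))
```

The win, when there is one, comes from fewer kernel launches and better GEMM utilization; as the numbers above show, for this model it is small.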

The biggest improvement is expected in combination with torchao, though that currently errors out because torchao tensors cannot be concatenated:

NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.cat', overload='default')>, types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), arg_types=(<class 'list'>,), kwarg_types={}
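The failure should be reproducible outside the pipeline; a hypothetical minimal repro, assuming torchao's `quantize_` / `int8_weight_only` API and that torchao is installed:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

lin = torch.nn.Linear(64, 64)
quantize_(lin, int8_weight_only())  # weight becomes an AffineQuantizedTensor

# aten.cat is not implemented for the quantized tensor subclass, so this
# should raise the NotImplementedError quoted above.
torch.cat([lin.weight, lin.weight], dim=0)
```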

BitsAndBytes also errors out (relevant but somewhat dated discussion):

RuntimeError: Only Tensors of floating point and complex dtype can require gradients

There's a slight hack in HunyuanVideoIndividualTokenRefinerBlock, since with QKV fusion the attention call seems to return a tuple (tensor, None) instead of a plain tensor.
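A self-contained sketch of what that workaround amounts to (hypothetical helper name; the actual change lives in the block's forward):

```python
import torch

def unpack_attn_output(out):
    # With fused QKV the attention call appears to return (tensor, None)
    # instead of a bare tensor; normalize both cases.
    return out[0] if isinstance(out, tuple) else out

# Both output shapes come out the same:
t = torch.randn(1, 4, 8)
assert unpack_attn_output(t) is t
assert unpack_attn_output((t, None)) is t
```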

Reproducible script

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
import imageio as iio
import numpy as np
import io
import time

torch.manual_seed(42)

def export_to_video_bytes(fps, frames):
    # Encode frames to an in-memory MP4 via imageio's PyAV plugin.
    request = iio.core.Request("<bytes>", mode="w", extension=".mp4")
    pyavobject = iio.plugins.pyav.PyAVPlugin(request)
    if isinstance(frames, np.ndarray):
        # Float arrays in [0, 1] -> uint8 in [0, 255]
        frames = (np.array(frames) * 255).astype("uint8")
    else:
        frames = np.array(frames)
    new_bytes = pyavobject.write(frames, codec="libx264", fps=fps)
    return io.BytesIO(new_bytes)

def export_to_video(frames, path, fps):
    video_bytes = export_to_video_bytes(fps, frames)
    video_bytes.seek(0)
    with open(path, "wb") as f:
        f.write(video_bytes.getbuffer())

model_id = "tencent/HunyuanVideo"

print("Loading transformer")
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16, revision="refs/pr/18"
)
transformer.fuse_qkv_projections()

pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16, revision="refs/pr/18")
pipe.scheduler._shift = 7.0
pipe.vae.enable_tiling()
#pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()

start_time = time.perf_counter()
output = pipe(
    prompt="a cat walks along the sidewalk of a city. The camera follows the cat at knee level. The city has many people and cars moving around, with advertisement billboards in the background",
    height=544,
    width=960,
    num_frames=45,
    num_inference_steps=20,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
print("Time:", round(time.perf_counter() - start_time, 2), "seconds")
print("Max vram:", round(torch.cuda.max_memory_allocated(device="cuda") / 1024 ** 3, 3), "GiB")

Comparison

QKV fusion:

output_qkv.mp4

No fusion:

output.mp4

Results are different but comparable.

Who can review?

@a-r-r-o-w @DN6

@a-r-r-o-w self-requested a review December 30, 2024 11:38
@a-r-r-o-w (Member) left a comment


In my experience, QKV fusion does not really help much with either time or memory requirements, even with quantization. In fact, there are even slowdowns at times depending on the quantization technique applied.

Not sure if it would be beneficial to add, but since we do support it for some other models, it makes sense to do so in the interest of consistency. Will ask @yiyixuxu to make the final call.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label Jan 31, 2025
@yiyixuxu removed the stale label Jan 31, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label Feb 25, 2025
@yiyixuxu removed the stale label Mar 4, 2025
@yiyixuxu (Collaborator) commented Mar 4, 2025

@Ednaordinary thanks for the PR!
I think we should not add it here since it does not show an improvement.
Sorry if we wasted your time 😟

@Ednaordinary (Contributor, Author)

That's fair enough! I'll revisit to see if it shows an improvement when/if bnb or torchao gain concatenation support (or another quant system with concatenation support is added).

@Ednaordinary (Contributor, Author)

(I'll reopen or make a new PR if something changes.)
