Add QKV fusion to the Hunyuan Video transformer #10407


Closed
wants to merge 3 commits

Conversation

@Ednaordinary (Contributor)

What does this PR do?

This adds QKV fusion to Hunyuan Video. At the moment, this gives minimal/no improvement:

             QKV      No QKV
Time (sec)   522.18   547.21
VRAM (GiB)   4.17     3.88
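For context, fusing QKV replaces the three separate projection matmuls with one wider matmul, trading three kernel launches for one. A minimal illustrative sketch of the idea (not the actual diffusers implementation; the class and attribute names here are hypothetical):

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Illustrative QKV fusion: three projections become one wider linear.

    Assumes equal q/k/v output widths, as in most attention blocks.
    """

    def __init__(self, to_q: nn.Linear, to_k: nn.Linear, to_v: nn.Linear):
        super().__init__()
        in_dim = to_q.in_features
        out_dim = to_q.out_features + to_k.out_features + to_v.out_features
        self.to_qkv = nn.Linear(in_dim, out_dim, bias=to_q.bias is not None)
        with torch.no_grad():
            # This concatenation is the step that fails for quantized
            # weights (see the torchao and bitsandbytes errors below).
            self.to_qkv.weight.copy_(
                torch.cat([to_q.weight, to_k.weight, to_v.weight], dim=0)
            )
            if to_q.bias is not None:
                self.to_qkv.bias.copy_(
                    torch.cat([to_q.bias, to_k.bias, to_v.bias], dim=0)
                )

    def forward(self, x: torch.Tensor):
        # One matmul, then split the output back into q, k, v.
        return self.to_qkv(x).chunk(3, dim=-1)

# Usage:
q, k, v = nn.Linear(128, 128), nn.Linear(128, 128), nn.Linear(128, 128)
xq, xk, xv = FusedQKV(q, k, v)(torch.randn(2, 16, 128))
```

The win, when there is one, comes from fewer kernel launches and better GEMM utilization; as the numbers above show, for this model it is small.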

The biggest improvement is expected in combination with torchao, though that currently errors out because torchao tensors cannot be concatenated:

NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.cat', overload='default')>, types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), arg_types=(<class 'list'>,), kwarg_types={}
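The failure should be reproducible outside the pipeline; a hypothetical minimal repro, assuming torchao's `quantize_` / `int8_weight_only` API and that torchao is installed:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

lin = torch.nn.Linear(64, 64)
quantize_(lin, int8_weight_only())  # weight becomes an AffineQuantizedTensor

# aten.cat is not implemented for the quantized tensor subclass, so this
# should raise the NotImplementedError quoted above.
torch.cat([lin.weight, lin.weight], dim=0)
```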

BitsAndBytes also errors out (relevant but somewhat dated discussion):

RuntimeError: Only Tensors of floating point and complex dtype can require gradients

There's a slight hack in HunyuanVideoIndividualTokenRefinerBlock, since with QKV fusion the attention call seems to return a tuple (tensor, None) instead of a plain tensor.
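A self-contained sketch of what that workaround amounts to (hypothetical helper name; the actual change lives in the block's forward):

```python
import torch

def unpack_attn_output(out):
    # With fused QKV the attention call appears to return (tensor, None)
    # instead of a bare tensor; normalize both cases.
    return out[0] if isinstance(out, tuple) else out

# Both output shapes come out the same:
t = torch.randn(1, 4, 8)
assert unpack_attn_output(t) is t
assert unpack_attn_output((t, None)) is t
```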

Reproducible script

import torch
from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
import imageio as iio
import numpy as np
import io
import time

torch.manual_seed(42)

def export_to_video_bytes(fps, frames):
    # Encode frames to an in-memory MP4 via imageio's PyAV plugin.
    request = iio.core.Request("<bytes>", mode="w", extension=".mp4")
    pyavobject = iio.plugins.pyav.PyAVPlugin(request)
    if isinstance(frames, np.ndarray):
        # Float arrays in [0, 1] -> uint8 in [0, 255]
        frames = (np.array(frames) * 255).astype("uint8")
    else:
        frames = np.array(frames)
    new_bytes = pyavobject.write(frames, codec="libx264", fps=fps)
    return io.BytesIO(new_bytes)

def export_to_video(frames, path, fps):
    video_bytes = export_to_video_bytes(fps, frames)
    video_bytes.seek(0)
    with open(path, "wb") as f:
        f.write(video_bytes.getbuffer())

model_id = "tencent/HunyuanVideo"

print("Loading transformer")
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16, revision="refs/pr/18"
)
transformer.fuse_qkv_projections()

pipe = HunyuanVideoPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.float16, revision="refs/pr/18")
pipe.scheduler._shift = 7.0
pipe.vae.enable_tiling()
#pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()

start_time = time.perf_counter()
output = pipe(
    prompt="a cat walks along the sidewalk of a city. The camera follows the cat at knee level. The city has many people and cars moving around, with advertisement billboards in the background",
    height=544,
    width=960,
    num_frames=45,
    num_inference_steps=20,
).frames[0]
export_to_video(output, "output.mp4", fps=15)
print("Time:", round(time.perf_counter() - start_time, 2), "seconds")
print("Max vram:", round(torch.cuda.max_memory_allocated(device="cuda") / 1024 ** 3, 3), "GiB")

Comparison

QKV fusion:

output_qkv.mp4

No fusion:

output.mp4

Results are different but comparable.

Who can review?

@a-r-r-o-w @DN6

@a-r-r-o-w self-requested a review December 30, 2024 11:38
@a-r-r-o-w (Member) left a comment


In my experience, QKV fusion does not really help much with either time or memory requirements, even with quantization. In fact, there are even slowdowns at times depending on the quantization technique applied.

Not sure if it would be beneficial to add, but since we do support it for some other models, it makes sense to do so in the interest of consistency. Will ask @yiyixuxu to make the final call.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label Jan 31, 2025
@yiyixuxu removed the stale label Jan 31, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label Feb 25, 2025
@yiyixuxu removed the stale label Mar 4, 2025
@yiyixuxu (Collaborator) commented Mar 4, 2025

@Ednaordinary thanks for the PR!
I think we should not add it here since it does not show an improvement.
Sorry if we wasted your time 😟

@Ednaordinary (Contributor, Author)

That's fair enough! I'll revisit to see if it shows an improvement when/if bnb or torchao gain concatenation support (or another quant system with concatenation support is added).

@Ednaordinary (Contributor, Author)

(I'll reopen or make a new PR if something changes.)
