Using lpw_stable_diffusion with sequential_cpu_offload gives 'NotImplementedError: Cannot copy out of meta tensor; no data!' #2531


Closed
d1g1t opened this issue Mar 2, 2023 · 11 comments
Labels: bug (Something isn't working), stale (Issues that haven't received updates)

Comments


d1g1t commented Mar 2, 2023

Describe the bug

I'm trying to use the example code from https://huggingface.co/docs/diffusers/using-diffusers/custom_pipeline_examples
with CPU offloading as suggested at https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings.
I have removed the pipe.to("cuda") call as suggested.

Reproduction

#!/usr/bin/python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", custom_pipeline="lpw_stable_diffusion", torch_dtype=torch.float16
)
#pipe = pipe.to("cuda")

prompt = "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes happy hood japanese_clothes kimono long_sleeves red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms"
neg_prompt = "lowres, bad_anatomy, error_body, error_hair, error_arm, error_hands, bad_hands, error_fingers, bad_fingers, missing_fingers, error_legs, bad_legs, multiple_legs, missing_legs, error_lighting, error_shadow, error_reflection, text, error, extra_digit, fewer_digits, cropped, worst_quality, low_quality, normal_quality, jpeg_artifacts, signature, watermark, username, blurry"

pipe.enable_sequential_cpu_offload()

pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0].save('mem_test.jpg')

Logs

No response

System Info

  • huggingface_hub version: 0.12.1
  • Platform: Linux-5.15.0-60-generic-x86_64-with-glibc2.35
  • Python version: 3.10.6
  • Running in iPython ?: No
  • Running in notebook ?: No
  • Running in Google Colab ?: No
  • Token path ?: /home/d1g1t/.cache/huggingface/token
  • Has saved token ?: True
  • Who am I ?: d1g1t
  • Configured git credential helpers:
  • FastAI: N/A
  • Tensorflow: N/A
  • Torch: 1.13.1
  • Jinja2: 3.0.3
  • Graphviz: N/A
  • Pydot: N/A
  • Pillow: 9.4.0
  • hf_transfer: N/A
  • ENDPOINT: https://huggingface.co
  • HUGGINGFACE_HUB_CACHE: /home/d1g1t/.cache/huggingface/hub
  • HUGGINGFACE_ASSETS_CACHE: /home/d1g1t/.cache/huggingface/assets
  • HF_HUB_OFFLINE: False
  • HF_TOKEN_PATH: /home/d1g1t/.cache/huggingface/token
  • HF_HUB_DISABLE_PROGRESS_BARS: None
  • HF_HUB_DISABLE_SYMLINKS_WARNING: False
  • HF_HUB_DISABLE_IMPLICIT_TOKEN: False
  • HF_HUB_ENABLE_HF_TRANSFER: False
d1g1t added the bug label Mar 2, 2023

d1g1t commented Mar 2, 2023

Is this because I'm using DiffusionPipeline and not StableDiffusionPipeline?
Is CPU offloading currently not possible with semantic/long prompts?
It appears to be supported as of a480229.


d1g1t commented Mar 2, 2023

$  pip show accelerate
Name: accelerate
Version: 0.16.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: [email protected]
License: Apache
Location: /home/d1g1t/.local/lib/python3.10/site-packages
Requires: numpy, packaging, psutil, pyyaml, torch
Required-by: k-diffusion

In case it's relevant due to #2500: accelerate is installed, but my huggingface-cli env output does not show it for some reason.

@patrickvonplaten

Hey @d1g1t,

pipe.enable_sequential_cpu_offload() is not (yet) implemented for the lpw pipeline. The method needs to be added directly to:
https://github.com/huggingface/diffusers/blob/main/examples/community/lpw_stable_diffusion.py
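
For reference, a minimal sketch of what such a method could look like, modeled on the stock StableDiffusionPipeline of the same era; the exact set of submodules to offload in the lpw pipeline is an assumption:

import torch
from accelerate import cpu_offload

def enable_sequential_cpu_offload(self, gpu_id=0):
    # Keep each submodule's weights on CPU and stream them to the GPU
    # only for the duration of that submodule's forward pass.
    device = torch.device(f"cuda:{gpu_id}")
    for model in [self.unet, self.text_encoder, self.vae]:  # assumed module list
        if model is not None:
            cpu_offload(model, device)
    if self.safety_checker is not None:
        cpu_offload(self.safety_checker, execution_device=device, offload_buffers=True)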


d1g1t commented Mar 3, 2023

Ah, the Mega Pipeline throws
AttributeError: 'StableDiffusionMegaPipeline' object has no attribute 'enable_sequential_cpu_offload'
when attempting to enable CPU offloading; lpw didn't, so I assumed it might be supported.

Looks like enable_sequential_cpu_offload was removed from the lpw pipeline in commit 4eb9ad0


Skquark commented Mar 3, 2023

I too kept running into this error a couple of months ago when trying to enable CPU offload, but I thought it was something else I was doing wrong, not the lpw pipeline itself, which has been my primary one. I ended up disabling the offloading option everywhere and gave up on it. I hope it's easy to implement in the LPW pipeline and any of the other pipes that are missing it; that would make a difference for us. Every bit helps on low VRAM.


d1g1t commented Mar 3, 2023

I'm not fluent with the code base, but the errors seem to happen because a few tensors are created on the meta device in get_weighted_text_embeddings(), which means they carry no data.

I've got both enable_sequential_cpu_offload() and enable_model_cpu_offload() working in a very hacky way by creating those tensors on 'cpu' when the unet is on 'meta', and moving a couple of others as needed (see the sketch below).
(https://github.com/d1g1t/diffusers/blob/main/examples/community/lpw_stable_diffusion.py)
Hope this helps as a quick workaround until someone more familiar with the code can fix it the right way.
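
The gist of the workaround is roughly the following; safe_device is an illustrative name, not the actual code in the linked file:

import torch

def safe_device(pipe):
    # Under sequential CPU offload, accelerate replaces the unet's weights
    # with "meta" tensors, which carry no data. Creating intermediate
    # tensors there loses their values, so fall back to "cpu".
    device = pipe.unet.device
    return torch.device("cpu") if device.type == "meta" else device

# Inside get_weighted_text_embeddings(), tensors are then created with
# device=safe_device(pipe) and moved to the execution device only when needed.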

On my 1060 card with 6 GB of VRAM, max_memory_reserved() is consistently ~400 MB lower with model offload, and not much lower with sequential offload. I'm also getting tiny differences in the final image with sequential offload when using the same seed, so I may be doing something wrong.


Skquark commented Mar 4, 2023

I applied your hacks to my lpw, plus the empty_cache() part that Patrick just added, and so far so good with sequential_cpu_offload enabled: I'm not getting that meta tensor error anymore. Still testing, but memory usage was much lower. I'm wondering if it's also safe to add enable_vae_slicing and enable_vae_tiling to lpw, since they just call self.vae functions...


d1g1t commented Mar 4, 2023

I suspect they don't need to be added, because StableDiffusionLongPromptWeightingPipeline inherits from StableDiffusionPipeline. It looks like my copy-pasting of enable_sequential_cpu_offload() and enable_model_cpu_offload() was unnecessary and can be removed, which makes sense given that they were removed in an earlier commit.
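
If that's right, the inherited methods should work on the lpw pipeline without any copy-pasting (assuming the installed diffusers version already has enable_vae_tiling on the parent class):

# Inherited from StableDiffusionPipeline, so callable on the lpw pipeline as-is:
pipe.enable_vae_slicing()             # decode latents one image at a time
pipe.enable_sequential_cpu_offload()  # works once the meta-tensor fix above is in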


d1g1t commented Mar 5, 2023

It looks like you also cannot pass a list of generators as shown in https://huggingface.co/docs/diffusers/using-diffusers/reusing_seeds:

TypeError: randn() received an invalid combination of arguments - got (tuple, dtype=torch.dtype, device=torch.device, generator=list), but expected one of:
 * (tuple of ints size, *, torch.Generator generator, tuple of names names, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of ints size, *, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of ints size, *, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of ints size, *, tuple of names names, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

The StableDiffusionPipeline calls randn_tensor() from utils to generate the latents, while lpw calls torch.randn() directly, which expects a single generator object rather than a list.
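
A sketch of what the fix could look like, using the same helper the stock pipeline uses (the import path is what diffusers exposed at the time; the variable names stand in for the values lpw currently passes to torch.randn):

from diffusers.utils import randn_tensor

# Unlike torch.randn, randn_tensor accepts either a single torch.Generator
# or a list of generators (one per batch item) for reproducible batches.
latents = randn_tensor(latents_shape, generator=generator, device=device, dtype=dtype)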

Previously, I had been passing latents as shown in this Colab notebook (I think it used to be linked in the docs) and was getting inconsistent images with the same seed, but only when using the Euler Ancestral and KDPM2 Ancestral schedulers.
I don't have that issue now that I pass a generator with a manual seed instead of latents.
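
Concretely, a minimal sketch of the pattern that avoids the inconsistency (seed value arbitrary):

generator = torch.Generator(device="cuda").manual_seed(1234)  # any fixed seed
image = pipe.text2img(
    prompt, negative_prompt=neg_prompt, generator=generator,
    width=512, height=512, max_embeddings_multiples=3,
).images[0]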


d1g1t commented Mar 21, 2023

In case anyone ends up on this page: I found a much simpler way to use lpw, which I've posted at #2668 (comment). Instead of loading it as a custom pipeline, simply generate the embeddings using a method in the lpw file and pass those to the pipeline.
Incidentally, it's the same method that throws the NotImplementedError: Cannot copy out of meta tensor; no data! error, so the file may still need updating to create tensors on the correct device.
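
A sketch of that approach, with two assumptions: lpw_stable_diffusion.py has been copied locally so get_weighted_text_embeddings can be imported from it, and the installed diffusers version accepts prompt_embeds arguments. Computing the embeddings before enabling offload sidesteps the meta-tensor issue mentioned above:

import torch
from diffusers import StableDiffusionPipeline
from lpw_stable_diffusion import get_weighted_text_embeddings  # local copy of the community file

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# prompt and neg_prompt as in the reproduction above.
prompt_embeds, negative_embeds = get_weighted_text_embeddings(
    pipe, prompt, neg_prompt, max_embeddings_multiples=3
)

pipe.enable_sequential_cpu_offload()
image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds).images[0]
image.save('mem_test.jpg')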

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale label Apr 15, 2023