
Error "NotImplementedError: Cannot copy out of meta tensor; no data!" on some tests #2500


Closed
antoche opened this issue Feb 26, 2023 · 7 comments · Fixed by #2517
Labels
bug Something isn't working

Comments


antoche commented Feb 26, 2023

### Describe the bug

I am hitting this exception in various tests:

```
self = <tests.pipelines.unclip.test_unclip.UnCLIPPipelineFastTests testMethod=test_cpu_offload_forward_pass>

    @unittest.skipIf(
        torch_device != "cuda" or not is_accelerate_available(),
        reason="CPU offload is only available with CUDA and `accelerate` installed",
    )
    def test_cpu_offload_forward_pass(self):
        if not self.test_cpu_offload:
            return
    
        components = self.get_dummy_components()
        pipe = self.pipeline_class(**components)
        pipe.to(torch_device)
        pipe.set_progress_bar_config(disable=None)
    
        inputs = self.get_dummy_inputs(torch_device)
        output_without_offload = pipe(**inputs)[0]
    
        pipe.enable_sequential_cpu_offload()
        inputs = self.get_dummy_inputs(torch_device)
>       output_with_offload = pipe(**inputs)[0]

tests/test_pipelines_common.py:494: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/vol/apps/python/3.9/ext_modules/pytorch/1.12.1/cuda/11.4/torch/autograd/grad_mode.py:27: in decorate_context
    return func(*args, **kwargs)
src/diffusers/pipelines/unclip/pipeline_unclip.py:397: in __call__
    text_encoder_hidden_states, additive_clip_time_embeddings = self.text_proj(
/vol/apps/python/3.9/ext_modules/pytorch/1.12.1/cuda/11.4/torch/nn/modules/module.py:1130: in _call_impl
    return forward_call(*input, **kwargs)
/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:148: in new_forward
    output = old_forward(*args, **kwargs)
src/diffusers/pipelines/unclip/text_proj.py:73: in forward
    time_projected_image_embeddings = self.clip_image_embeddings_project_to_time_embeddings(image_embeddings)
/vol/apps/python/3.9/ext_modules/pytorch/1.12.1/cuda/11.4/torch/nn/modules/module.py:1130: in _call_impl
    return forward_call(*input, **kwargs)
/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:143: in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/hooks.py:252: in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(kwargs, self.execution_device)
/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/utils/operations.py:126: in send_to_device
    return recursively_apply(_send_to_device, tensor, device, test_type=_has_to_method)
/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/utils/operations.py:78: in recursively_apply
    return honor_type(
/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/utils/operations.py:49: in honor_type
    return type(obj)(generator)
/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/utils/operations.py:81: in <genexpr>
    recursively_apply(
/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/utils/operations.py:97: in recursively_apply
    return func(data, *args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

t = tensor(..., device='meta', size=(2, 32)), device = device(type='cuda', index=0)

    def _send_to_device(t, device):
>       return t.to(device)
E       NotImplementedError: Cannot copy out of meta tensor; no data!

/vol/apps/python/3.9/ext_modules/pyaccelerate/0.13.1/accelerate/utils/operations.py:121: NotImplementedError
```

Specifically, on:
- tests/pipelines/paint_by_example/test_paint_by_example.py::PaintByExamplePipelineFastTests::test_cpu_offload_forward_pass
- tests/pipelines/unclip/test_unclip.py::UnCLIPPipelineFastTests::test_cpu_offload_forward_pass
- tests/pipelines/unclip/test_unclip_image_variation.py::UnCLIPImageVariationPipelineFastTests::test_cpu_offload_forward_pass
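
The failure is easy to reproduce in isolation: a module's weights were left on the meta device, and `send_to_device` then tried to copy a tensor that has no backing storage. A minimal sketch (the exact wording of the message varies across PyTorch versions):

```python
import torch

# A "meta" tensor carries shape and dtype metadata but allocates no storage.
t = torch.empty(2, 32, device="meta")
print(t.shape)  # torch.Size([2, 32])

# Copying it to a real device fails, since there is no data to copy:
try:
    t.to("cpu")
except NotImplementedError as e:
    print(e)  # message mentions the meta tensor having no data
```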


### Reproduction

Simply running the tests on the repo.

### Logs

```shell
See above.
```
### System Info

Tried on multiple linux machines with various Nvidia GPUs.

Running from branch v0.13.1

  • diffusers version: 0.13.1
  • Platform: Linux-4.14.240-weta-20210804-x86_64-with-glibc2.27
  • Python version: 3.9.10
  • PyTorch version (GPU?): 1.12.0a0+git664058f (True)
  • Huggingface_hub version: 0.11.1
  • Transformers version: 4.26.0
  • Accelerate version: 0.13.1
  • xFormers version: 0.0.14.dev
@antoche antoche added the bug Something isn't working label Feb 26, 2023

pcuenca commented Feb 27, 2023

I could reproduce this with accelerate version 0.13.1. I'm currently not aware of a limitation of accelerate regarding CPU offloading, but if that's the case we should verify the minimum version before use.

@patrickvonplaten @muellerzr do you happen to have any insight here?
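
A minimum-version gate of the kind suggested above could be sketched like this. This is a hypothetical helper, not diffusers code; a real implementation would use `packaging.version` (which handles pre-release tags like `0.0.14.dev`), while this simplified version assumes plain `x.y.z` strings. The `0.14.0` floor matches the accelerate release later reported as carrying the fix:

```python
# Hypothetical minimum-version check (simplified: plain "x.y.z" only).
def version_tuple(version: str) -> tuple:
    return tuple(int(part) for part in version.split(".")[:3])

def offload_supported(installed: str, minimum: str = "0.14.0") -> bool:
    return version_tuple(installed) >= version_tuple(minimum)

print(offload_supported("0.13.1"))  # False: the release reported in this issue
print(offload_supported("0.14.0"))  # True
```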

muellerzr (Contributor) commented:

cc @sgugger


sgugger commented Feb 28, 2023

There are no limitations per se, but bugs were fixed since 0.13.1. Is this an issue that is still ongoing?


pcuenca commented Feb 28, 2023

> There are no limitations per se, but bugs were fixed since 0.13.1. Is this an issue that is still ongoing?

No, it was fixed soon after, in 0.14.0. This is the first report we have received; the OP found it while running the tests. Occurrence was rare because it required offloading, an old version of accelerate, and the safety checker being enabled.
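
For context on where the failure bites: sequential CPU offload keeps weights on CPU and moves each submodule to the execution device only around its forward call (the failing `pre_forward` in the traceback is the step that moves the inputs). A rough standalone sketch of that mechanism, not accelerate's actual implementation, with `"cpu"` standing in for the GPU so it runs anywhere:

```python
import torch
from torch import nn

execution_device = "cpu"  # would be "cuda" in a real offload setup

def offload_wrap(module: nn.Module) -> nn.Module:
    # Sketch of sequential offload: swap in a forward that loads the
    # module's weights onto the execution device, runs, then releases them.
    old_forward = module.forward

    def new_forward(*args, **kwargs):
        module.to(execution_device)  # load weights for this call
        output = old_forward(*args, **kwargs)
        module.to("cpu")             # move them back afterwards
        return output

    module.forward = new_forward
    return module

layer = offload_wrap(nn.Linear(32, 2))
out = layer(torch.randn(2, 32))
print(out.shape)  # torch.Size([2, 2])
```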

gabgiani commented:

Can someone who has the training process running confirm the versions of transformers, torch, and accelerate they used, to avoid this "cannot copy out of meta tensor" error?

Thanks.

lianming03 commented:

Hello, it is possible that the error was caused by transformers==4.26.0. Please try installing version 4.28.0 and check if it resolves the issue.

pcdilley commented:

I'm getting this error with accelerate==0.29.2 and transformers==4.39.3.
