Training example of ControlNet yields error #3101

svjack · 2023-04-14T05:07:32Z

Describe the bug

I am trying to train ControlNet on my dataset, https://huggingface.co/datasets/svjack/diffusiondb_100_canny_zh,
with the low-GPU-memory configuration below.

Reproduction

export MODEL_DIR="IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1"
export OUTPUT_DIR="TSD_save"

accelerate launch train_controlnet.py \
 --pretrained_model_name_or_path=$MODEL_DIR \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=svjack/diffusiondb_100_canny_zh \
 --resolution=512 \
 --learning_rate=1e-5 \
 --train_batch_size=1 \
 --gradient_accumulation_steps=1 \
 --gradient_checkpointing \
 --use_8bit_adam \
 --tracker_project_name canny  \
 --set_grads_to_none \
 --conditioning_image_column guide \
 --caption_column zh_text \
 --mixed_precision fp16

Logs

Traceback (most recent call last):
  File "train_controlnet.py", line 1051, in <module>
    main(args)
  File "train_controlnet.py", line 970, in main
    return_dict=False,
  File "/environment/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/environment/miniconda3/lib/python3.7/site-packages/accelerate/utils/operations.py", line 495, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/environment/miniconda3/lib/python3.7/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/environment/miniconda3/lib/python3.7/site-packages/diffusers/models/controlnet.py", line 519, in forward
    sample += controlnet_cond
RuntimeError: The size of tensor a (85) must match the size of tensor b (86) at non-singleton dimension 2
Steps:  88%|██████████████████████████████████████████████████████████████████████████████████████████▋            | 88/100 [00:44<00:06,  2.00it/s, loss=0.013, lr=1e-5]
Traceback (most recent call last):
  File "/environment/miniconda3/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/environment/miniconda3/lib/python3.7/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/environment/miniconda3/lib/python3.7/site-packages/accelerate/commands/launch.py", line 923, in launch_command
    simple_launcher(args)
  File "/environment/miniconda3/lib/python3.7/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/environment/miniconda3/bin/python', 'train_controlnet.py', '--pretrained_model_name_or_path=Taiyi-Stable-Diffusion-1B-Chinese-v0.1', '--output_dir=TSD_save', '--dataset_name=svjack/diffusiondb_100_canny_zh', '--resolution=512', '--learning_rate=1e-5', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--tracker_project_name', 'canny', '--set_grads_to_none', '--mixed_precision', 'fp16']' returned non-zero exit status 1.

System Info

Latest version of diffusers, Python 3.7, on an A4000 GPU.

sayakpaul · 2023-04-17T12:23:35Z

If the same training script runs with runwayml/stable-diffusion-v1-5 as the base model, I suspect the model you are providing to pretrained_model_name_or_path is causing this issue.

Ccing @williamberman and @yiyixuxu.

williamberman · 2023-04-17T19:33:10Z

I believe the default resizing code in the training script does not resize to a multiple of 8, which causes the encoded image to have different height/width dimensions than the encoded conditioning image (which uses a separate encoder that's part of the controlnet model).
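A minimal sketch of the arithmetic behind this explanation, in plain Python. The layer settings are assumptions, not taken from this thread: both encoders are modeled as three stride-2, kernel-3 convolutions, with the image encoder padding one pixel in total per stage and the conditioning encoder padding two. The height of 682 is a hypothetical value, chosen only because it is not a multiple of 8.

```python
def conv_out(size, kernel=3, stride=2, pad_total=2):
    """Standard convolution output-size formula along one dimension."""
    return (size + pad_total - kernel) // stride + 1


def encoded_size(size, pad_total, stages=3):
    """Apply `stages` stride-2 convolutions and return the final size."""
    for _ in range(stages):
        size = conv_out(size, pad_total=pad_total)
    return size


def round_down_to_multiple(size, multiple=8):
    """Largest multiple of `multiple` that does not exceed `size`."""
    return (size // multiple) * multiple


height = 682  # hypothetical image height, not a multiple of 8

# Assumed image encoder: one pixel of padding per stage, so each stage floors.
image_latent = encoded_size(height, pad_total=1)
# Assumed conditioning encoder: symmetric padding of 1, so each stage rounds up.
cond_latent = encoded_size(height, pad_total=2)
print(image_latent, cond_latent)  # the two sizes disagree by one

# After rounding the height down to a multiple of 8, both paths agree.
safe_height = round_down_to_multiple(height)
print(encoded_size(safe_height, pad_total=1), encoded_size(safe_height, pad_total=2))
```

Under these assumptions the two paths round in opposite directions whenever the input size is not divisible by 8, which is consistent with the 85 vs 86 mismatch in the traceback.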

svjack · 2023-04-18T00:05:07Z

> I believe the default resizing code in the training script does not resize to a multiple of 8, which causes the encoded image to have different height/width dimensions than the encoded conditioning image (which uses a separate encoder that's part of the controlnet model).

I will check this.

williamberman · 2023-04-18T20:51:03Z

@svjack I think you took your dataset off the hub so I can't test 😁

svjack · 2023-04-19T00:31:58Z

> @svjack I think you took your dataset off the hub so I can't test 😁

I tried the code from your fork of the diffusers main branch, and it failed.
However, after I resized all my images to (512, 512), it works.
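A rough illustration of why forcing every image to (512, 512) avoids the problem, assuming the script's default transform scales the shorter side to the target resolution and keeps the aspect ratio (an assumption, not confirmed in this thread). The helper names and the 768x1024 source size are hypothetical.

```python
def aspect_preserving_size(width, height, shorter_side=512):
    """Size after scaling so the shorter side equals `shorter_side`."""
    if width <= height:
        return shorter_side, int(shorter_side * height / width)
    return int(shorter_side * width / height), shorter_side


def is_multiple_of(size, multiple=8):
    """True if every side of `size` is divisible by `multiple`."""
    return all(side % multiple == 0 for side in size)


original = (768, 1024)  # hypothetical portrait image (width, height)

kept_aspect = aspect_preserving_size(*original)
forced_square = (512, 512)

print(kept_aspect, is_multiple_of(kept_aspect))
print(forced_square, is_multiple_of(forced_square))
```

With an aspect-preserving resize, the longer side of a non-square image can land on a value that is not divisible by 8, whereas a fixed 512x512 size always is.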

williamberman · 2023-04-19T17:47:51Z

hey @svjack could you elaborate on what went wrong?

svjack · 2023-04-19T23:35:59Z

> hey @svjack could you elaborate on what went wrong?

It may be the image_transforms part, as you said.

github-actions · 2023-05-14T15:02:53Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

svjack added the bug Something isn't working label Apr 14, 2023

williamberman self-assigned this Apr 17, 2023

williamberman mentioned this issue Apr 17, 2023

controlnet training resize inputs to multiple of 8 #3135

Merged

github-actions bot added the stale Issues that haven't received updates label May 14, 2023

github-actions bot closed this as completed May 22, 2023
