
Commit 4e89856

revert automatic chunking (#3934)

* revert automatic chunking
* Apply suggestions from code review
* revert automatic chunking

1 parent 332d2bb commit 4e89856

File tree

docs/source/en/api/pipelines/text_to_video.mdx
src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py
src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py

3 files changed: +27 -7 lines changed

docs/source/en/api/pipelines/text_to_video.mdx

Lines changed: 27 additions & 1 deletion
@@ -138,6 +138,7 @@ pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dt
 pipe.enable_model_cpu_offload()
 
 # memory optimization
+pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
 pipe.enable_vae_slicing()
 
 prompt = "Darth Vader surfing a wave"
@@ -150,10 +151,13 @@ Now the video can be upscaled:
 
 ```py
 pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_XL", torch_dtype=torch.float16)
-pipe.vae.enable_slicing()
 pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
 pipe.enable_model_cpu_offload()
 
+# memory optimization
+pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
+pipe.enable_vae_slicing()
+
 video = [Image.fromarray(frame).resize((1024, 576)) for frame in video_frames]
 
 video_frames = pipe(prompt, video=video, strength=0.6).frames
@@ -175,6 +179,28 @@ Here are some sample outputs:
 </tr>
 </table>
 
+### Memory optimizations
+
+Text-guided video generation with [`~TextToVideoSDPipeline`] and [`~VideoToVideoSDPipeline`] is very memory intensive both
+when denoising with [`~UNet3DConditionModel`] and when decoding with [`~AutoencoderKL`]. It is possible, though, to reduce
+memory usage at the cost of increased runtime and still achieve the exact same result. To do so, it is recommended to enable
+**forward chunking** and **vae slicing**:
+
+Forward chunking via [`~UNet3DConditionModel.enable_forward_chunking`] is explained in [this blog post](https://huggingface.co/blog/reformer#2-chunked-feed-forward-layers) and
+allows you to significantly reduce the required memory for the unet. You can chunk the feed forward layer over the `num_frames`
+dimension by doing:
+
+```py
+pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
+```
+
+Vae slicing via [`~TextToVideoSDPipeline.enable_vae_slicing`] and [`~VideoToVideoSDPipeline.enable_vae_slicing`] also
+gives significant memory savings, since the two pipelines otherwise decode all image frames at once.
+
+```py
+pipe.enable_vae_slicing()
+```
+
 ## Available checkpoints
 
 * [damo-vilab/text-to-video-ms-1.7b](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b/)
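
Read together, the two snippets this commit edits describe an opt-in workflow: generate at low resolution, then upscale, enabling forward chunking and VAE slicing by hand on each pipeline. Below is a minimal end-to-end sketch of that workflow, assuming the checkpoints and method calls shown in the diff above plus standard imports (`torch`, `PIL`, `diffusers`); the exact generation arguments are elided from this diff, so the calls here are illustrative.

```py
import torch
from PIL import Image
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

# 1. Low-resolution text-to-video generation
pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

# memory optimization: now explicit, since the pipelines no longer chunk automatically
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

prompt = "Darth Vader surfing a wave"
video_frames = pipe(prompt).frames  # generation arguments follow the surrounding doc

# 2. Upscale the generated frames with the XL checkpoint
pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_XL", torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# memory optimization
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

video = [Image.fromarray(frame).resize((1024, 576)) for frame in video_frames]
video_frames = pipe(prompt, video=video, strength=0.6).frames
```

Per the new doc section, both `enable_*` calls leave the output unchanged; they only trade runtime for peak memory.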

src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py

Lines changed: 0 additions & 3 deletions
@@ -634,9 +634,6 @@ def __call__(
         # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
         extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
 
-        # 6.1 Chunk feed-forward computation to save memory
-        self.unet.enable_forward_chunking(chunk_size=1, dim=1)
-
         # 7. Denoising loop
         num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
         with self.progress_bar(total=num_inference_steps) as progress_bar:

src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py

Lines changed: 0 additions & 3 deletions
@@ -709,9 +709,6 @@ def __call__(
         # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
         extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
 
-        # 6.1 Chunk feed-forward computation to save memory
-        self.unet.enable_forward_chunking(chunk_size=1, dim=1)
-
         # 7. Denoising loop
         num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
         with self.progress_bar(total=num_inference_steps) as progress_bar:
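
With the automatic call removed from `__call__` in both pipeline files, chunking becomes an explicit, user-facing step. A minimal sketch of the opt-in usage, assuming the `damo-vilab/text-to-video-ms-1.7b` checkpoint listed in the doc and fp16 weights:

```py
import torch
from diffusers import TextToVideoSDPipeline

pipe = TextToVideoSDPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# Opt in to what step 6.1 used to do unconditionally inside the pipeline.
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

video_frames = pipe("Darth Vader surfing a wave").frames
```

The same applies to [`VideoToVideoSDPipeline`]: nothing is enabled automatically anymore, so omitting these calls gives the faster but more memory-hungry behaviour.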

0 commit comments