</tr>
</table>
### Memory optimizations
Text-guided video generation with [`~TextToVideoSDPipeline`] and [`~VideoToVideoSDPipeline`] is very memory intensive both
when denoising with [`~UNet3DConditionModel`] and when decoding with [`~AutoencoderKL`]. It is possible though to reduce
memory usage at the cost of increased runtime to achieve the exact same result. To do so, it is recommended to enable
**forward chunking** and **vae slicing**:
Forward chunking via [`~UNet3DConditionModel.enable_forward_chunking`] is explained in [this blog post](https://huggingface.co/blog/reformer#2-chunked-feed-forward-layers) and
allows you to significantly reduce the memory required by the UNet. You can chunk the feed forward layer over the `num_frames`