@@ -37,9 +37,12 @@ Resources:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [TextToVideoSDPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth.py) | *Text-to-Video Generation* | [🤗 Spaces](https://huggingface.co/spaces/damo-vilab/modelscope-text-to-video-synthesis)
+ | [VideoToVideoSDPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/text_to_video_synthesis/pipeline_text_to_video_synth_img2img.py) | *Text-Guided Video-to-Video Generation* | [(TODO)🤗 Spaces]()
## Usage example

+ ### `text-to-video-ms-1.7b`
+
Let's start by generating a short video with the default length of 16 frames (2s at 8 fps):

```python
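# Illustrative sketch: the prompt and the CPU-offloading setup here are example choices.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "Spiderman is surfing"
# num_frames defaults to 16, i.e. 2 seconds at 8 fps
video_frames = pipe(prompt).frames
video_path = export_to_video(video_frames)
video_path
```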
@@ -119,12 +122,72 @@ Here are some sample outputs:
</tr>
</table>

+ ### `cerspense/zeroscope_v2_576w` & `cerspense/zeroscope_v2_XL`
+
+ The Zeroscope checkpoints are watermark-free models trained on specific resolutions such as `576x320` and `1024x576`.
+ One should first generate a video with the lower-resolution checkpoint [`cerspense/zeroscope_v2_576w`](https://huggingface.co/cerspense/zeroscope_v2_576w) and [`TextToVideoSDPipeline`],
+ which can then be upscaled using [`VideoToVideoSDPipeline`] and [`cerspense/zeroscope_v2_XL`](https://huggingface.co/cerspense/zeroscope_v2_XL).
+
+ ```py
+ import torch
+ from diffusers import DiffusionPipeline
+ from diffusers.utils import export_to_video
+
+ pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
+ pipe.enable_model_cpu_offload()
+
+ # memory optimization: decode the VAE one slice at a time
+ pipe.enable_vae_slicing()
+
+ prompt = "Darth Vader surfing a wave"
+ video_frames = pipe(prompt, num_frames=24).frames
+ video_path = export_to_video(video_frames)
+ video_path
+ ```
+
+ Now the video can be upscaled:
+
+ ```py
+ from diffusers import DPMSolverMultistepScheduler
+ from PIL import Image
+
+ pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_XL", torch_dtype=torch.float16)
+ # memory optimization: decode the VAE one slice at a time
+ pipe.vae.enable_slicing()
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
+ pipe.enable_model_cpu_offload()
+
+ # resize the frames from the first pass to the resolution the XL checkpoint was trained on
+ video = [Image.fromarray(frame).resize((1024, 576)) for frame in video_frames]
+
+ video_frames = pipe(prompt, video=video, strength=0.6).frames
+ video_path = export_to_video(video_frames)
+ video_path
+ ```
+
+ Here are some sample outputs:
+
+ <table>
+     <tr>
+         <td><center>
+         Darth Vader surfing in waves.
+         <br>
+         <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/darthvader_cerpense.gif"
+             alt="Darth Vader surfing in waves."
+             style="width: 576px;" />
+         </center></td>
+     </tr>
+ </table>
+

## Available checkpoints

* [damo-vilab/text-to-video-ms-1.7b](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b/)
* [damo-vilab/text-to-video-ms-1.7b-legacy](https://huggingface.co/damo-vilab/text-to-video-ms-1.7b-legacy)
+ * [cerspense/zeroscope_v2_576w](https://huggingface.co/cerspense/zeroscope_v2_576w)
+ * [cerspense/zeroscope_v2_XL](https://huggingface.co/cerspense/zeroscope_v2_XL)
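+
+ Any of these checkpoints can be loaded by its repository id with `DiffusionPipeline.from_pretrained`, as in the examples above; a minimal sketch (the checkpoint and dtype chosen here are only illustrative):
+
+ ```py
+ import torch
+ from diffusers import DiffusionPipeline
+
+ # substitute any checkpoint from the list above
+ pipe = DiffusionPipeline.from_pretrained(
+     "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
+ )
+ pipe.enable_model_cpu_offload()
+ ```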

## TextToVideoSDPipeline
[[autodoc]] TextToVideoSDPipeline
- all
- __call__
+
+ ## VideoToVideoSDPipeline
+ [[autodoc]] VideoToVideoSDPipeline
+ - all
+ - __call__