Introduction

Preliminary steps

LoRA Fine-tuning of CogVideoX Text-to-Video:

Run the following command in a terminal to launch training.
```
bash shscripts/train_cogvideox_t2v_lora.sh
```
After training, run the following command to run inference with your personalized model.
```
bash shscripts/inference_cogvideo_t2v_lora.sh
```
- Set the ckpt argument in the above shell script to the path of your trained checkpoint.
Note:
- The training and inference use the default model config from configs/004_cogvideox/cogvideo5b.yaml
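
Before launching, it can help to confirm that the script and the config it relies on are present in the working tree. The following is a minimal sketch; `preflight` is a hypothetical helper name, not part of the repository, and the paths in the usage comment are the ones named above.

```shell
# Hypothetical helper (not part of the repository): check that every
# path given as an argument is an existing regular file.
preflight() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Returns 0 when all files exist, 1 otherwise.
    return "$missing"
}

# Usage (run from the repository root):
#   preflight shscripts/train_cogvideox_t2v_lora.sh \
#             configs/004_cogvideox/cogvideo5b.yaml \
#     && bash shscripts/train_cogvideox_t2v_lora.sh
```

The same check can be reused for the other training and inference scripts by passing their script and config paths.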

LoRA Fine-tuning of CogVideoX Image-to-Video:

Run the following command in a terminal to launch training.
```
bash shscripts/train_cogvideox_i2v_lora.sh
```
After training, run the following command to run inference with your personalized model.
```
bash shscripts/inference_cogvideo_i2v_lora.sh
```
- Set the ckpt argument in the above shell script to the path of your trained checkpoint.
Note:
- The training and inference use the default model config from configs/004_cogvideox/cogvideo5b-i2v.yaml

Full Fine-tuning of CogVideoX Text-to-Video:

Run the following command in a terminal to launch training.
```
bash shscripts/train_cogvideox_t2v_fullft.sh
```
We tested on 4 H800 GPUs. Training requires 68 GB of GPU memory.
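
The memory requirement above can be checked against the local machine before launching. The sketch below assumes `nvidia-smi` is installed and that the 68 GB figure applies to each GPU, which the text does not state explicitly. `check_gpu_mem` is a hypothetical helper, and the 69632 MiB threshold (68 × 1024) is an assumed conversion.

```shell
# Hypothetical helper: read one "memory.total" value in MiB per line on
# stdin and succeed only if at least one GPU is listed and every GPU has
# at least the required amount of memory.
check_gpu_mem() {
    required_mib="$1"
    awk -v req="$required_mib" '
        BEGIN { bad = 0 }
        { if ($1 + 0 < req) bad = 1 }      # flag any GPU below the threshold
        END { exit (bad || NR == 0) }      # also fail when no GPU was listed
    '
}

# Usage (threshold of 68 * 1024 MiB is an assumption):
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
#     | check_gpu_mem 69632 && echo "GPU memory looks sufficient"
```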
After training, run the following command to run inference with your personalized model.
```
bash shscripts/inference_cogvideo_t2v_fullft.sh
```
- Set the ckpt argument in the above shell script to the path of your trained checkpoint. Because full fine-tuning uses DeepSpeed to reduce GPU memory usage, the checkpoint path looks like ${exp_save_dir}/checkpoints/trainstep_checkpoints/epoch=xxxxxx-step=xxxxxxxxx.ckpt/checkpoint/mp_rank_00_model_states.pt
Note:
- The training and inference use the default model config from configs/004_cogvideox/cogvideo5b-i2v-fullft.yaml
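
Locating the DeepSpeed model-state file by hand can be tedious when several checkpoints exist. The following sketch prints the most recently modified one, assuming the directory layout described above. `find_fullft_ckpt` is a hypothetical helper, and selecting by modification time is an assumption about which checkpoint you want.

```shell
# Hypothetical helper: print the newest mp_rank_00_model_states.pt under
# the given experiment directory, or nothing if none is found.
find_fullft_ckpt() {
    exp_save_dir="$1"
    # ls -t lists the matching files newest first; head keeps the first one.
    # Paths containing newlines are not handled by this sketch.
    ls -t "$exp_save_dir"/checkpoints/trainstep_checkpoints/*.ckpt/checkpoint/mp_rank_00_model_states.pt 2>/dev/null | head -n 1
}

# Usage: print the path, then copy it into the ckpt argument of the
# inference script.
#   find_fullft_ckpt "${exp_save_dir}"
```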

Full Fine-tuning of CogVideoX Image-to-Video:

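The procedure is the same as for full fine-tuning of text-to-video above. Run the following command in a terminal to launch training.
```
bash shscripts/train_cogvideox_i2v_fullft.sh
```
After training, run the following command to run inference with your personalized model.
```
bash shscripts/inference_cogvideo_i2v_fullft.sh
```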