Skip to content

kandinskylab/kvae-1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Shows an illustrated sun in light mode and a moon with stars in dark mode.
Habr | Project Page | Technical Report (soon) | 🤗 KVAE-3D / KVAE-2D

KVAE 1.0: Video and Image tokenizers

In this repository, we provide tokenizers for image and video diffusion models: KVAE-2D and KVAE-3D.

KVAE-2D model has compression 8x8 and 16 latent channels.

KVAE-3D model has time compression 4, spacial compression 8x8 and 16 latent channels

Evaluation results

KVAE-2D

Reconstructions comparison of KVAE-2D and Flux:

Evaluation results of KVAE-2D model on Imagenet-256 (valid) and DIV2K (valid, high-resolution). All compared models perform 8x8 compression with 16 latent channels:

Dataset Model PSNR SSIM LPIPS rFID
ImageNet (256, val) Wan-2.1 29.03 0.85 0.069 0.62
ImageNet (256, val) Flux 31.11 0.91 0.041 0.11
ImageNet (256, val) KVAE 2D 31.71 0.91 0.054 0.46
DIV2K Wan-2.1 31.87 0.89 0.069 -
DIV2K Flux 32.64 0.91 0.061 -
DIV2K KVAE 2D 33.67 0.92 0.060 -

DiT training metrics comparison (blue — DiT+Flux, green and red — two versions of DiT+KVAE-2D):

KVAE-3D

Reconstructions comparison of KVAE-3D and Hunyuan:

Evaluation results of KVAE-3D model on MCL-JCV dataset. All compared models perform 4x8x8 compression with 16 latent channels:

Model PSNR SSIM LPIPS
Wan-2.1 33.75 0.90 0.089
HunyuanVideo 33.91 0.91 0.103
KVAE-3D 35.63 0.92 0.088

Inference examples

Setup

Install requirements:

pip install -r requirements.txt

KVAE-2D inference

from kvae_2d.model import KVAE2D

model = KVAE2D.from_pretrained("kandinskylab/KVAE-2D-1.0").eval()
latent = model.encode(image)['y_hat']
rec = model.decode(latent)

More detailed example is presented in inference_2d.ipynb

KVAE-3D inference

For simple test, go to kvae_3d folder and run

python inference.py --frames 999

It will save reconstructions to output folder at repository root.

To use optimized compiled encoder version, run (max duration 257 frames):

python inference.py --frames 257 --optim

Citation

@misc{kvae_v1_2025,
    author = {Kirill Chernyshev, Andrey Shutkin, Ilia Vasiliev,
              Denis Parkhomenko, Ivan Kirillov,
              Dmitrii Mikhailov, Denis Dimitrov},
    title = {KVAE 1.0: 2D and 3D tokenizers for Image & Video generation models},
    howpublished = {\url{https://github.com/kandinskylab/kvae-1}},
    year = 2025
}

About

KVAE 1.0

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •