KVAE 1.0: Video and Image tokenizers

Habr | Project Page | Technical Report (soon) | 🤗 KVAE-3D / KVAE-2D

In this repository, we provide tokenizers for image and video diffusion models: KVAE-2D and KVAE-3D.

The KVAE-2D model performs 8x8 spatial compression and has 16 latent channels.

The KVAE-3D model performs 4x temporal compression and 8x8 spatial compression, with 16 latent channels.
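As a worked example of these compression factors, the sketch below computes latent shapes. The helper names are hypothetical and not part of the kvae_2d or kvae_3d packages. The KVAE-3D frame rule (the first frame gets its own latent frame, then one latent frame per 4 input frames, so valid lengths are 4k + 1) is an assumption borrowed from causal video VAEs such as those of Wan-2.1 and HunyuanVideo; it is consistent with the 257-frame limit of the optimized encoder but is not confirmed by this README.

```python
import math


def kvae_2d_latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Latent (C, H, W) for KVAE-2D: 16 channels, 8x8 spatial compression.

    Assumes height and width are divisible by 8.
    """
    assert height % 8 == 0 and width % 8 == 0, "spatial dims must be multiples of 8"
    return (16, height // 8, width // 8)


def kvae_3d_latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Latent (C, T, H, W) for KVAE-3D: 16 channels, 4x temporal, 8x8 spatial.

    ASSUMPTION: causal temporal compression, i.e. the first frame maps to its
    own latent frame and every following group of 4 frames maps to one latent
    frame, so valid clip lengths are 4k + 1 (e.g. 257 = 4 * 64 + 1).
    """
    assert (frames - 1) % 4 == 0, "assumed valid lengths are 4k + 1"
    assert height % 8 == 0 and width % 8 == 0, "spatial dims must be multiples of 8"
    return (16, 1 + (frames - 1) // 4, height // 8, width // 8)


def values_ratio(pixel_shape, latent_shape) -> float:
    """How many times fewer scalar values the latent holds than the RGB input."""
    return math.prod(pixel_shape) / math.prod(latent_shape)


print(kvae_2d_latent_shape(256, 256))        # (16, 32, 32)
print(kvae_3d_latent_shape(257, 480, 848))   # (16, 65, 60, 106)
print(values_ratio((3, 256, 256), kvae_2d_latent_shape(256, 256)))  # 12.0
```

Under the same assumptions, the reduction in stored values is 8 × 8 × 3 / 16 = 12x for KVAE-2D, and for long clips it approaches 4 × 8 × 8 × 3 / 16 = 48x for KVAE-3D.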

Evaluation results

KVAE-2D

Reconstruction comparison of KVAE-2D and Flux:

Evaluation results of the KVAE-2D model on ImageNet-256 (val) and DIV2K (val, high-resolution). All compared models perform 8x8 compression with 16 latent channels:

Dataset	Model	PSNR ↑	SSIM ↑	LPIPS ↓	rFID ↓
ImageNet (256, val)	Wan-2.1	29.03	0.85	0.069	0.62
ImageNet (256, val)	Flux	31.11	0.91	0.041	0.11
ImageNet (256, val)	KVAE-2D	31.71	0.91	0.054	0.46
DIV2K (val)	Wan-2.1	31.87	0.89	0.069	-
DIV2K (val)	Flux	32.64	0.91	0.061	-
DIV2K (val)	KVAE-2D	33.67	0.92	0.060	-
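For reference, PSNR is defined as 10 · log10(MAX² / MSE), where MAX is the value range of the images. The README does not state the exact evaluation protocol behind these numbers (value range, color space, per-image vs. dataset-level averaging), so the sketch below only illustrates the standard definition, assuming flattened images scaled to [0, 1]; the `psnr` helper is hypothetical and not part of this repository.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         data_range: float = 1.0) -> float:
    """PSNR in dB between two equally sized, flattened images.

    PSNR = 10 * log10(data_range**2 / MSE); returns inf for identical inputs.
    """
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


ref = [0.0, 0.0, 0.0, 0.0]
rec = [0.1, 0.1, 0.1, 0.1]
print(round(psnr(ref, rec), 6))  # 20.0 (MSE = 0.01)
```

Because PSNR is logarithmic in MSE, the roughly 1 dB gap between KVAE-2D and Wan-2.1 on DIV2K corresponds to about a 20% lower mean squared error.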

DiT training metrics comparison (blue: DiT+Flux; green and red: two versions of DiT+KVAE-2D):

KVAE-3D

Reconstruction comparison of KVAE-3D and HunyuanVideo:

Evaluation results of the KVAE-3D model on the MCL-JCV dataset. All compared models perform 4x8x8 compression with 16 latent channels:

Model	PSNR ↑	SSIM ↑	LPIPS ↓
Wan-2.1	33.75	0.90	0.089
HunyuanVideo	33.91	0.91	0.103
KVAE-3D	35.63	0.92	0.088

Inference examples

Setup

Install requirements:

pip install -r requirements.txt

KVAE-2D inference

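import torch
from kvae_2d.model import KVAE2D

model = KVAE2D.from_pretrained("kandinskylab/KVAE-2D-1.0").eval()
# `image`: assumed float tensor of shape (B, 3, H, W); see inference_2d.ipynb for the exact preprocessing
with torch.no_grad():
    latent = model.encode(image)['y_hat']  # 16 channels at 1/8 spatial resolution
    rec = model.decode(latent)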

A more detailed example is provided in inference_2d.ipynb.
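Since KVAE-2D compresses each spatial dimension by 8, input sizes that are not multiples of 8 may need padding. This README does not say whether the encoder requires it, so treat that as an assumption and check inference_2d.ipynb. The hypothetical helper below computes right/bottom padding in the (left, right, top, bottom) order that `torch.nn.functional.pad` uses for the last two dimensions.

```python
def pad_to_multiple(height: int, width: int, multiple: int = 8) -> tuple[int, int, int, int]:
    """Right/bottom padding that makes (height, width) divisible by `multiple`.

    Returned as (left, right, top, bottom), the order torch.nn.functional.pad
    expects when padding the last two dimensions of a (B, C, H, W) tensor.
    """
    pad_h = -height % multiple  # rows to add at the bottom
    pad_w = -width % multiple   # columns to add on the right
    return (0, pad_w, 0, pad_h)


print(pad_to_multiple(250, 257))  # (0, 7, 0, 6)
print(pad_to_multiple(256, 256))  # (0, 0, 0, 0)
```

With a (B, 3, H, W) tensor, `F.pad(image, pad_to_multiple(H, W), mode="replicate")` pads the input, and the reconstruction can be cropped back with `rec[..., :H, :W]`. Replicate padding avoids introducing hard black borders that the autoencoder would otherwise have to reconstruct.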

KVAE-3D inference

For a quick test, go to the kvae_3d folder and run:

python inference.py --frames 999

Reconstructions are saved to the output folder at the repository root.

To use the optimized, compiled encoder (maximum duration: 257 frames), run:

python inference.py --frames 257 --optim

Citation

@misc{kvae_v1_2025,
    author = {Kirill Chernyshev and Andrey Shutkin and Ilia Vasiliev and
              Denis Parkhomenko and Ivan Kirillov and
              Dmitrii Mikhailov and Denis Dimitrov},
    title = {KVAE 1.0: 2D and 3D tokenizers for Image & Video generation models},
    howpublished = {\url{https://github.com/kandinskylab/kvae-1}},
    year = 2025
}
