 # 474_Gaze-LLE-DINOv3
 
+
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17413165.svg)](https://doi.org/10.5281/zenodo.17413165) ![GitHub License](https://img.shields.io/github/license/pinto0309/gazelle-dinov3)
 
 
 > [!Note]
+> **October 26, 2025:** Checkpoint files for `Atto`, `Femto`, `Pico`, and `N` containing the `GazeFollow` and `VideoAttentionTarget` trained weights and statistical information have been released.
+>
 > **October 23, 2025:** A checkpoint file `.pt` containing `VideoAttentionTarget`'s trained weights and statistical information has been released.
 >
 > **October 22, 2025:** A checkpoint file `.pt` containing `GazeFollow`'s trained weights and statistical information has been released.
@@ -444,6 +447,98 @@ is set to a positive value, and a teacher network is constructed with a separate
 get_gazelle_model call.
 
 ```
+############################################# Atto
+### distillation - GH200
+uv run python scripts/train_vat.py \
+--data_path data/videoattentiontarget \
+--model_name gazelle_hgnetv2_atto_inout \
+--exp_name gazelle_hgnetv2_atto_inout_distill \
+--init_ckpt ckpts/gazelle_hgnetv2_atto_distill.pt \
+--frame_sample_every 6 \
+--log_iter 50 \
+--max_epochs 65 \
+--batch_size 128 \
+--n_workers 60 \
+--lr_non_inout 1e-5 \
+--lr_inout 1e-2 \
+--inout_loss_lambda 1.0 \
+--use_amp \
+--grad_clip_norm 1.0 \
+--disable_sigmoid \
+--disable_progressive_unfreeze \
+--distill_teacher gazelle_dinov3_vitb16_inout \
+--distill_weight 0.3 \
+--distill_temp_end 4.0
+
+############################################# Femto
+### distillation - GH200
+uv run python scripts/train_vat.py \
+--data_path data/videoattentiontarget \
+--model_name gazelle_hgnetv2_femto_inout \
+--exp_name gazelle_hgnetv2_femto_inout_distill \
+--init_ckpt ckpts/gazelle_hgnetv2_femto_distill.pt \
+--frame_sample_every 6 \
+--log_iter 50 \
+--max_epochs 60 \
+--batch_size 128 \
+--n_workers 60 \
+--lr_non_inout 1e-5 \
+--lr_inout 1e-2 \
+--inout_loss_lambda 1.0 \
+--use_amp \
+--grad_clip_norm 1.0 \
+--disable_sigmoid \
+--disable_progressive_unfreeze \
+--distill_teacher gazelle_dinov3_vitb16_inout \
+--distill_weight 0.3 \
+--distill_temp_end 4.0
+
+############################################# Pico
+### distillation - GH200
+uv run python scripts/train_vat.py \
+--data_path data/videoattentiontarget \
+--model_name gazelle_hgnetv2_pico_inout \
+--exp_name gazelle_hgnetv2_pico_inout_distill \
+--init_ckpt ckpts/gazelle_hgnetv2_pico_distill.pt \
+--frame_sample_every 6 \
+--log_iter 50 \
+--max_epochs 50 \
+--batch_size 128 \
+--n_workers 60 \
+--lr_non_inout 1e-5 \
+--lr_inout 1e-2 \
+--inout_loss_lambda 1.0 \
+--use_amp \
+--grad_clip_norm 1.0 \
+--disable_sigmoid \
+--disable_progressive_unfreeze \
+--distill_teacher gazelle_dinov3_vitb16_inout \
+--distill_weight 0.3 \
+--distill_temp_end 4.0
+
+############################################# N
+### distillation - GH200
+uv run python scripts/train_vat.py \
+--data_path data/videoattentiontarget \
+--model_name gazelle_hgnetv2_n_inout \
+--exp_name gazelle_hgnetv2_n_inout_distill \
+--init_ckpt ckpts/gazelle_hgnetv2_n_distill.pt \
+--frame_sample_every 6 \
+--log_iter 50 \
+--max_epochs 50 \
+--batch_size 128 \
+--n_workers 60 \
+--lr_non_inout 1e-5 \
+--lr_inout 1e-2 \
+--inout_loss_lambda 1.0 \
+--use_amp \
+--grad_clip_norm 1.0 \
+--disable_sigmoid \
+--disable_progressive_unfreeze \
+--distill_teacher gazelle_dinov3_vitb16_inout \
+--distill_weight 0.3 \
+--distill_temp_end 4.0
+
 ############################################# S
 ### distillation - GH200
 uv run python scripts/train_vat.py \
@@ -595,10 +690,10 @@ High accuracy is not important to me at all. I'm only interested in whether the
 |:-:|:-:|-:|-:|-:|:-:|:-:|
 |[Gaze-LLE (ViT-B)](https://arxiv.org/pdf/2412.09586)|88.80 M|0.9560|0.0450|0.1040|[Download](https://github.com/fkryan/gazelle/releases/download/v1.0.0/gazelle_dinov2_vitb14.pt)|---|
 |[Gaze-LLE (ViT-L)](https://arxiv.org/pdf/2412.09586)|302.90 M|0.9580|0.0410|0.0990|[Download](https://github.com/fkryan/gazelle/releases/download/v1.0.0/gazelle_dinov2_vitl14.pt)|---|
-|Atto-distillation|2.93 M||||Download|Download|
-|Femto-distillation|3.15 M||||Download|Download|
-|Pico-distillation|3.51 M||||Download|Download|
-|N-distillation|4.61 M||||Download|Download|
+|Atto-distillation|2.93 M|0.9267|0.0826|0.1482|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_atto_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_atto_distill_1x3x320x320_1xNx4.onnx)|
+|Femto-distillation|3.15 M|0.9391|0.0656|0.1289|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_femto_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_femto_distill_1x3x416x416_1xNx4.onnx)|
+|Pico-distillation|3.51 M|0.9491|0.0544|0.1149|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_pico_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_pico_distill_1x3x640x640_1xNx4.onnx)|
+|N-distillation|4.61 M|0.9481|0.0549|0.1158|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_n_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_n_distill_1x3x640x640_1xNx4.onnx)|
 |S-distillation|8.17 M|0.9545|0.0484|0.1118|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tiny.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tiny_1x3x640x640_1xNx4.onnx)|
 |M-distillation|12.37 M|0.9564|0.0462|0.1042|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tinyplus.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tinyplus_1x3x640x640_1xNx4.onnx)|
 |L-distillation|24.33 M|0.9593|0.0418|0.0992|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vits16.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vits16_1x3x640x640_1xNx4.onnx)|
@@ -617,17 +712,26 @@ High accuracy is not important to me at all. I'm only interested in whether the
 |:-:|:-:|
 |<img width="1280" height="800" alt="benchmark_times_gazelle_dinov3_vits16_1x3x640x640_1xNx4" src="https://github.com/user-attachments/assets/c51e3c81-65ba-4216-8907-087d505eeaea" />|<img width="1280" height="800" alt="benchmark_times_gazelle_dinov3_vits16plus_1x3x640x640_1xNx4" src="https://github.com/user-attachments/assets/e59b053f-10e8-4b59-abe7-76b8858fc14f" />|
 
+<img width="700" alt="benchmark_times_combined_2" src="https://github.com/user-attachments/assets/cb876564-f776-43c4-9547-6c2de220c2e1" />
+
+|N|Pico|
+|:-:|:-:|
+|<img width="1280" height="800" alt="benchmark_times_gazelle_hgnetv2_n_distill_1x3x640x640_1xNx4" src="https://github.com/user-attachments/assets/cbef40a6-937f-4213-89b4-6403d9dd4b27" />|<img width="1280" height="800" alt="benchmark_times_gazelle_hgnetv2_pico_distill_1x3x640x640_1xNx4" src="https://github.com/user-attachments/assets/f5ddf1e5-25b2-4589-9cdb-727a59120620" />|
+
+|Femto|Atto|
+|:-:|:-:|
+|<img width="1280" height="800" alt="benchmark_times_gazelle_hgnetv2_femto_distill_1x3x416x416_1xNx4" src="https://github.com/user-attachments/assets/233239dc-c35f-4285-bfed-f02a51fe511c" />|<img width="1280" height="800" alt="benchmark_times_gazelle_hgnetv2_atto_distill_1x3x320x320_1xNx4" src="https://github.com/user-attachments/assets/137a961b-6027-4ddc-88c8-25f8b74c55fa" />|
 
 - VideoAttentionTarget
 
 |Variant|Param<br>(Backbone+Head)|AUC ⬆️|Avg L2 ⬇️|AP IN/OUT ⬆️|Weight|ONNX|
 |:-:|:-:|-:|-:|-:|:-:|:-:|
 |[Gaze-LLE (ViT-B)](https://arxiv.org/pdf/2412.09586)|88.80 M|0.9330|0.1070|0.8970|[Download](https://github.com/fkryan/gazelle/releases/download/v1.0.0/gazelle_dinov2_vitb14_inout.pt)|---|
 |[Gaze-LLE (ViT-L)](https://arxiv.org/pdf/2412.09586)|302.90 M|0.9370|0.1030|0.9030|[Download](https://github.com/fkryan/gazelle/releases/download/v1.0.0/gazelle_dinov2_vitl14_inout.pt)|---|
-|Atto-distillation|2.93 M||||Download|Download|
-|Femto-distillation|3.15 M||||Download|Download|
-|Pico-distillation|3.51 M||||Download|Download|
-|N-distillation|4.61 M||||Download|Download|
+|Atto-distillation|2.93 M|0.9055|0.1523|0.8749|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_atto_inout_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_atto_inout_distill_1x3x320x320_1xNx4.onnx)|
+|Femto-distillation|3.15 M|0.9166|0.1372|0.8779|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_femto_inout_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_femto_inout_distill_1x3x416x416_1xNx4.onnx)|
+|Pico-distillation|3.51 M|0.9247|0.1245|0.8861|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_pico_inout_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_pico_inout_distill_1x3x640x640_1xNx4.onnx)|
+|N-distillation|4.61 M|0.9218|0.1258|0.9012|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_n_inout_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_n_inout_distill_1x3x640x640_1xNx4.onnx)|
 |S-distillation|8.17 M|0.9286|0.1155|0.8945|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tiny_inout.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tiny_inout_1x3x640x640_1xNx4.onnx)|
 |M-distillation|12.37 M|0.9325|0.1133|0.8953|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tinyplus_inout.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tinyplus_inout_1x3x640x640_1xNx4.onnx)|
 |L-distillation|24.33 M|0.9347|0.1026|0.9011|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vits16_inout.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vits16_inout_1x3x640x640_1xNx4.onnx)|
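The four HGNetV2 VideoAttentionTarget distillation commands added in this commit are identical except for the variant name (which determines `--model_name`, `--exp_name` and `--init_ckpt`) and `--max_epochs` (65 for Atto, 60 for Femto, 50 for Pico and N). The sketch below folds them into one template. It is a hypothetical helper, not part of the repository: the function name `run_vat_distill` and the `RUN` switch are assumptions, it presumes you run it from the repository root with the `uv` environment already set up, and it expects the GazeFollow-stage checkpoints `ckpts/gazelle_hgnetv2_<variant>_distill.pt` to exist, since `--init_ckpt` points at them.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): rebuilds the four
# HGNetV2 VideoAttentionTarget distillation commands from one template.
# Prints each command by default; export RUN=1 to execute them in sequence.
set -eu

run_vat_distill() {
  variant="$1"  # atto | femto | pico | n
  epochs="$2"   # the only setting that differs beyond the variant name
  # Reuse the positional parameters as the argument list so the dry run
  # and the real run share a single command definition.
  set -- uv run python scripts/train_vat.py \
    --data_path data/videoattentiontarget \
    --model_name "gazelle_hgnetv2_${variant}_inout" \
    --exp_name "gazelle_hgnetv2_${variant}_inout_distill" \
    --init_ckpt "ckpts/gazelle_hgnetv2_${variant}_distill.pt" \
    --frame_sample_every 6 \
    --log_iter 50 \
    --max_epochs "$epochs" \
    --batch_size 128 \
    --n_workers 60 \
    --lr_non_inout 1e-5 \
    --lr_inout 1e-2 \
    --inout_loss_lambda 1.0 \
    --use_amp \
    --grad_clip_norm 1.0 \
    --disable_sigmoid \
    --disable_progressive_unfreeze \
    --distill_teacher gazelle_dinov3_vitb16_inout \
    --distill_weight 0.3 \
    --distill_temp_end 4.0
  if [ "${RUN:-0}" = "1" ]; then
    "$@"        # run the training command
  else
    echo "$*"   # dry run: print the command on one line
  fi
}

# variant:max_epochs pairs, as used in the per-variant commands
for spec in atto:65 femto:60 pico:50 n:50; do
  run_vat_distill "${spec%%:*}" "${spec##*:}"
done
```

The runs execute one after another on a single machine; whether the author trained the variants sequentially or in parallel is not stated, so treat the loop as a convenience, not a record of the original procedure.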
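The new release assets follow a regular naming scheme: `gazelle_hgnetv2_<variant>_distill` for the GazeFollow weights, `gazelle_hgnetv2_<variant>_inout_distill` for the VideoAttentionTarget weights, and the ONNX names additionally encode the input image shape (`1x3x320x320` for Atto, `1x3x416x416` for Femto, `1x3x640x640` for Pico and N) followed by `_1xNx4`. A minimal sketch that lists, or optionally fetches, all sixteen files is shown below; the function name `hgnetv2_urls` and the `RUN` switch are hypothetical, and it assumes `curl` is available when downloading.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): lists the HGNetV2
# distilled checkpoints and ONNX exports from the release tables.
# Prints the URLs by default; export RUN=1 to download them with curl.
set -eu

BASE_URL="https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights"

hgnetv2_urls() {
  variant="$1"  # atto | femto | pico | n
  res="$2"      # square input resolution encoded in the ONNX file name
  for suffix in distill inout_distill; do  # GazeFollow, VideoAttentionTarget
    stem="gazelle_hgnetv2_${variant}_${suffix}"
    echo "${BASE_URL}/${stem}.pt"
    echo "${BASE_URL}/${stem}_1x3x${res}x${res}_1xNx4.onnx"
  done
}

# variant:resolution pairs read off the ONNX file names in the tables
for spec in atto:320 femto:416 pico:640 n:640; do
  for url in $(hgnetv2_urls "${spec%%:*}" "${spec##*:}"); do
    if [ "${RUN:-0}" = "1" ]; then
      curl -fLO "$url"  # -f: fail on HTTP errors, -L: follow redirects, -O: keep remote name
    else
      echo "$url"
    fi
  done
done
```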