Commit 562f47d

Update README.md
1 parent fe74d80 commit 562f47d

1 file changed

Lines changed: 112 additions & 8 deletions

474_Gaze-LLE-DINOv3/README.md

@@ -1,9 +1,12 @@
 # 474_Gaze-LLE-DINOv3

+
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.17413165.svg)](https://doi.org/10.5281/zenodo.17413165) ![GitHub License](https://img.shields.io/github/license/pinto0309/gazelle-dinov3)


 > [!Note]
+> **October 26, 2025 :** A checkpoint file `Atto`, `Femto`, `Pico`, `N` containing `GazeFollow`, `VideoAttentionTarget` trained weights and statistical information has been released.
+>
 > **October 23, 2025 :** A checkpoint file `.pt` containing `VideoAttentionTarget`'s trained weights and statistical information has been released.
 >
 > **October 22, 2025 :** A checkpoint file `.pt` containing `GazeFollow`'s trained weights and statistical information has been released.
@@ -444,6 +447,98 @@ is set to a positive value, and a teacher network is constructed with a separate
 get_gazelle_model call.

 ```
+############################################# Atto
+### distillation - GH200
+uv run python scripts/train_vat.py \
+--data_path data/videoattentiontarget \
+--model_name gazelle_hgnetv2_atto_inout \
+--exp_name gazelle_hgnetv2_atto_inout_distill \
+--init_ckpt ckpts/gazelle_hgnetv2_atto_distill.pt \
+--frame_sample_every 6 \
+--log_iter 50 \
+--max_epochs 65 \
+--batch_size 128 \
+--n_workers 60 \
+--lr_non_inout 1e-5 \
+--lr_inout 1e-2 \
+--inout_loss_lambda 1.0 \
+--use_amp \
+--grad_clip_norm 1.0 \
+--disable_sigmoid \
+--disable_progressive_unfreeze \
+--distill_teacher gazelle_dinov3_vitb16_inout \
+--distill_weight 0.3 \
+--distill_temp_end 4.0
+
+############################################# Femto
+### distillation - GH200
+uv run python scripts/train_vat.py \
+--data_path data/videoattentiontarget \
+--model_name gazelle_hgnetv2_femto_inout \
+--exp_name gazelle_hgnetv2_femto_inout_distill \
+--init_ckpt ckpts/gazelle_hgnetv2_femto_distill.pt \
+--frame_sample_every 6 \
+--log_iter 50 \
+--max_epochs 60 \
+--batch_size 128 \
+--n_workers 60 \
+--lr_non_inout 1e-5 \
+--lr_inout 1e-2 \
+--inout_loss_lambda 1.0 \
+--use_amp \
+--grad_clip_norm 1.0 \
+--disable_sigmoid \
+--disable_progressive_unfreeze \
+--distill_teacher gazelle_dinov3_vitb16_inout \
+--distill_weight 0.3 \
+--distill_temp_end 4.0
+
+############################################# Pico
+### distillation - GH200
+uv run python scripts/train_vat.py \
+--data_path data/videoattentiontarget \
+--model_name gazelle_hgnetv2_pico_inout \
+--exp_name gazelle_hgnetv2_pico_inout_distill \
+--init_ckpt ckpts/gazelle_hgnetv2_pico_distill.pt \
+--frame_sample_every 6 \
+--log_iter 50 \
+--max_epochs 50 \
+--batch_size 128 \
+--n_workers 60 \
+--lr_non_inout 1e-5 \
+--lr_inout 1e-2 \
+--inout_loss_lambda 1.0 \
+--use_amp \
+--grad_clip_norm 1.0 \
+--disable_sigmoid \
+--disable_progressive_unfreeze \
+--distill_teacher gazelle_dinov3_vitb16_inout \
+--distill_weight 0.3 \
+--distill_temp_end 4.0
+
+############################################# N
+### distillation - GH200
+uv run python scripts/train_vat.py \
+--data_path data/videoattentiontarget \
+--model_name gazelle_hgnetv2_n_inout \
+--exp_name gazelle_hgnetv2_n_inout_distill \
+--init_ckpt ckpts/gazelle_hgnetv2_n_distill.pt \
+--frame_sample_every 6 \
+--log_iter 50 \
+--max_epochs 50 \
+--batch_size 128 \
+--n_workers 60 \
+--lr_non_inout 1e-5 \
+--lr_inout 1e-2 \
+--inout_loss_lambda 1.0 \
+--use_amp \
+--grad_clip_norm 1.0 \
+--disable_sigmoid \
+--disable_progressive_unfreeze \
+--distill_teacher gazelle_dinov3_vitb16_inout \
+--distill_weight 0.3 \
+--distill_temp_end 4.0
+
 ############################################# S
 ### distillation - GH200
 uv run python scripts/train_vat.py \
@@ -595,10 +690,10 @@ High accuracy is not important to me at all. I'm only interested in whether the
 |:-:|:-:|-:|-:|-:|:-:|:-:|
 |[Gaze-LLE (ViT-B)](https://arxiv.org/pdf/2412.09586)|88.80 M|0.9560|0.0450|0.1040|[Download](https://github.com/fkryan/gazelle/releases/download/v1.0.0/gazelle_dinov2_vitb14.pt)|---|
 |[Gaze-LLE (ViT-L)](https://arxiv.org/pdf/2412.09586)|302.90 M|0.9580|0.0410|0.0990|[Download](https://github.com/fkryan/gazelle/releases/download/v1.0.0/gazelle_dinov2_vitl14.pt)|---|
-|Atto-distillation|2.93 M||||Download|Download|
-|Femto-distillation|3.15 M||||Download|Download|
-|Pico-distillation|3.51 M||||Download|Download|
-|N-distillation|4.61 M||||Download|Download|
+|Atto-distillation|2.93 M|0.9267|0.0826|0.1482|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_atto_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_atto_distill_1x3x320x320_1xNx4.onnx)|
+|Femto-distillation|3.15 M|0.9391|0.0656|0.1289|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_femto_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_femto_distill_1x3x416x416_1xNx4.onnx)|
+|Pico-distillation|3.51 M|0.9491|0.0544|0.1149|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_pico_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_pico_distill_1x3x640x640_1xNx4.onnx)|
+|N-distillation|4.61 M|0.9481|0.0549|0.1158|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_n_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_n_distill_1x3x640x640_1xNx4.onnx)|
 |S-distillation|8.17 M|0.9545|0.0484|0.1118|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tiny.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tiny_1x3x640x640_1xNx4.onnx)|
 |M-distillation|12.37 M|0.9564|0.0462|0.1042|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tinyplus.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tinyplus_1x3x640x640_1xNx4.onnx)|
 |L-distillation|24.33 M|0.9593|0.0418|0.0992|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vits16.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vits16_1x3x640x640_1xNx4.onnx)|
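
The ONNX file names linked in the table appear to encode tensor shapes as suffixes, for example `_1x3x320x320_1xNx4`. The sketch below pulls those groups out of a file name. It assumes the first group is an NCHW image input and the second is a set of N boxes with 4 coordinates each; the diff itself does not state this. `parse_onnx_shapes` is a hypothetical helper introduced for illustration, not part of the repository.

```python
import re

# Matches shape groups such as "1x3x320x320" or "1xNx4" that follow "_"
# and are terminated by "_" or "." in the file name.
_SHAPE = re.compile(r"(?<=_)(?:\d+|N)(?:x(?:\d+|N))+(?=[_.])")


def parse_onnx_shapes(filename):
    """Return the shape groups embedded in an ONNX file name as lists.

    Numeric dimensions become ints; the symbolic dimension "N" stays a string.
    """
    shapes = []
    for group in _SHAPE.findall(filename):
        shapes.append([int(d) if d.isdigit() else d for d in group.split("x")])
    return shapes


name = "gazelle_hgnetv2_atto_distill_1x3x320x320_1xNx4.onnx"
image_shape, boxes_shape = parse_onnx_shapes(name)
print(image_shape)  # [1, 3, 320, 320]
print(boxes_shape)  # [1, 'N', 4]
```

Read this way, the Atto and Femto exports use smaller inputs (320x320 and 416x416) than the 640x640 used by the other variants, which matters when preparing input tensors for a given file.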
@@ -617,17 +712,26 @@ High accuracy is not important to me at all. I'm only interested in whether the
 |:-:|:-:|
 |<img width="1280" height="800" alt="benchmark_times_gazelle_dinov3_vits16_1x3x640x640_1xNx4" src="https://github.com/user-attachments/assets/c51e3c81-65ba-4216-8907-087d505eeaea" />|<img width="1280" height="800" alt="benchmark_times_gazelle_dinov3_vits16plus_1x3x640x640_1xNx4" src="https://github.com/user-attachments/assets/e59b053f-10e8-4b59-abe7-76b8858fc14f" />|

+<img width="700" alt="benchmark_times_combined_2" src="https://github.com/user-attachments/assets/cb876564-f776-43c4-9547-6c2de220c2e1" />
+
+|N|Pico|
+|:-:|:-:|
+|<img width="1280" height="800" alt="benchmark_times_gazelle_hgnetv2_n_distill_1x3x640x640_1xNx4" src="https://github.com/user-attachments/assets/cbef40a6-937f-4213-89b4-6403d9dd4b27" />|<img width="1280" height="800" alt="benchmark_times_gazelle_hgnetv2_pico_distill_1x3x640x640_1xNx4" src="https://github.com/user-attachments/assets/f5ddf1e5-25b2-4589-9cdb-727a59120620" />|
+
+|Femto|Atto|
+|:-:|:-:|
+|<img width="1280" height="800" alt="benchmark_times_gazelle_hgnetv2_femto_distill_1x3x416x416_1xNx4" src="https://github.com/user-attachments/assets/233239dc-c35f-4285-bfed-f02a51fe511c" />|<img width="1280" height="800" alt="benchmark_times_gazelle_hgnetv2_atto_distill_1x3x320x320_1xNx4" src="https://github.com/user-attachments/assets/137a961b-6027-4ddc-88c8-25f8b74c55fa" />|

 - VideoAttentionTarget

 |Variant|Param<br>(Backbone+Head)|AUC ⬆️|Avg L2 ⬇️|AP IN/OUT ⬆️|Weight|ONNX|
 |:-:|:-:|-:|-:|-:|:-:|:-:|
 |[Gaze-LLE (ViT-B)](https://arxiv.org/pdf/2412.09586)|88.80 M|0.9330|0.1070|0.8970|[Download](https://github.com/fkryan/gazelle/releases/download/v1.0.0/gazelle_dinov2_vitb14_inout.pt)|---|
 |[Gaze-LLE (ViT-L)](https://arxiv.org/pdf/2412.09586)|302.90 M|0.9370|0.1030|0.9030|[Download](https://github.com/fkryan/gazelle/releases/download/v1.0.0/gazelle_dinov2_vitl14_inout.pt)|---|
-|Atto-distillation|2.93 M||||Download|Download|
-|Femto-distillation|3.15 M||||Download|Download|
-|Pico-distillation|3.51 M||||Download|Download|
-|N-distillation|4.61 M||||Download|Download|
+|Atto-distillation|2.93 M|0.9055|0.1523|0.8749|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_atto_inout_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_atto_inout_distill_1x3x320x320_1xNx4.onnx)|
+|Femto-distillation|3.15 M|0.9166|0.1372|0.8779|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_femto_inout_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_femto_inout_distill_1x3x416x416_1xNx4.onnx)|
+|Pico-distillation|3.51 M|0.9247|0.1245|0.8861|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_pico_inout_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_pico_inout_distill_1x3x640x640_1xNx4.onnx)|
+|N-distillation|4.61 M|0.9218|0.1258|0.9012|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_n_inout_distill.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_hgnetv2_n_inout_distill_1x3x640x640_1xNx4.onnx)|
 |S-distillation|8.17 M|0.9286|0.1155|0.8945|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tiny_inout.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tiny_inout_1x3x640x640_1xNx4.onnx)|
 |M-distillation|12.37 M|0.9325|0.1133|0.8953|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tinyplus_inout.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vit_tinyplus_inout_1x3x640x640_1xNx4.onnx)|
 |L-distillation|24.33 M|0.9347|0.1026|0.9011|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vits16_inout.pt)|[Download](https://github.com/PINTO0309/gazelle-dinov3/releases/download/weights/gazelle_dinov3_vits16_inout_1x3x640x640_1xNx4.onnx)|
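
The four `train_vat.py` commands added in this commit (Atto, Femto, Pico, N) differ only in the variant name and in `--max_epochs` (65, 60, 50, and 50 respectively). As a rough sketch, they could be generated from one small table instead of being copied four times. `MAX_EPOCHS` and `build_vat_command` are hypothetical names introduced here for illustration; every flag and value is copied from the commands in the diff, and nothing below is part of the repository.

```python
import shlex

# Per-variant epoch counts, taken from the commands added in this commit.
MAX_EPOCHS = {"atto": 65, "femto": 60, "pico": 50, "n": 50}


def build_vat_command(variant):
    """Build the VideoAttentionTarget distillation command for one variant."""
    model = f"gazelle_hgnetv2_{variant}_inout"
    args = [
        "uv", "run", "python", "scripts/train_vat.py",
        "--data_path", "data/videoattentiontarget",
        "--model_name", model,
        "--exp_name", f"{model}_distill",
        # Each run starts from the variant's GazeFollow distillation checkpoint.
        "--init_ckpt", f"ckpts/gazelle_hgnetv2_{variant}_distill.pt",
        "--frame_sample_every", "6",
        "--log_iter", "50",
        "--max_epochs", str(MAX_EPOCHS[variant]),
        "--batch_size", "128",
        "--n_workers", "60",
        "--lr_non_inout", "1e-5",
        "--lr_inout", "1e-2",
        "--inout_loss_lambda", "1.0",
        "--use_amp",
        "--grad_clip_norm", "1.0",
        "--disable_sigmoid",
        "--disable_progressive_unfreeze",
        "--distill_teacher", "gazelle_dinov3_vitb16_inout",
        "--distill_weight", "0.3",
        "--distill_temp_end", "4.0",
    ]
    # shlex.join quotes arguments where needed and returns one shell line.
    return shlex.join(args)


for variant in MAX_EPOCHS:
    print(build_vat_command(variant))
```

The commands are labelled `distillation - GH200`, so `--batch_size 128` and `--n_workers 60` were presumably chosen for that machine and may need lowering on smaller hardware.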
