GenPT is the first generative point tracker: unlike conventional discriminative models, which struggle to capture multi-modality, it directly models the multi-modality inherent to point tracking through Flow Matching.
By modifying flow matching with iterative refinement, a window-dependent prior, and a specialized variance schedule, GenPT effectively captures uncertainty, particularly behind occlusions. Because GenPT can sample trajectories from plausible modes of the solution space, a best-first search over generated samples, guided by the model's confidence, can further improve tracking accuracy. GenPT achieves its state-of-the-art performance with fewer parameters and faster inference than baseline methods.
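For intuition, here is a minimal, self-contained sketch of the two ideas above: sampling a trajectory by integrating a learned velocity field from a prior (flow matching with Euler steps), and keeping the most confident of N samples. All names (`velocity_model`, `confidence_model`, etc.) are hypothetical stand-ins, not the repo's API; see the evaluation section below for the supported way to enable best-of-N search.

```python
import torch

def sample_trajectory(velocity_model, prior_traj, num_steps=10):
    """Euler-integrate a learned velocity field starting from a noisy draw
    around a (window-dependent) prior trajectory. Purely illustrative."""
    x = prior_traj + torch.randn_like(prior_traj)  # noisy draw around the prior
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((1,), i * dt)       # flow-matching time in [0, 1)
        x = x + velocity_model(x, t) * dt  # x <- x + v(x, t) * dt
    return x

def best_of_n(velocity_model, confidence_model, prior_traj, n=5):
    """Sample N trajectories and keep the one the model is most confident in."""
    samples = [sample_trajectory(velocity_model, prior_traj) for _ in range(n)]
    scores = torch.stack([confidence_model(s).mean() for s in samples])
    return samples[int(scores.argmax())]

# Toy usage with stand-in models (for illustration only):
v = lambda x, t: -x            # dummy velocity field
c = lambda x: -x.norm(dim=-1)  # dummy per-point confidence
traj = best_of_n(v, c, prior_traj=torch.zeros(8, 2), n=5)  # 8 timesteps, 2D points
```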
- [Oct 21, 2025] Initial release. We are actively adding to this repo, so please ping or open an issue if you notice something missing or broken!
Install Miniconda:

```bash
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
source ~/miniconda3/bin/activate
conda init
```
Clone the repo and set up a new conda environment for GenPT:
```bash
git clone https://github.com/tesfaldet/genpt
cd genpt
conda env create -f environment.yaml
conda activate genpt
```
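Optionally, you can sanity-check that the new environment resolved a working PyTorch build (and that CUDA is visible) before continuing:

```python
# optional sanity check for the freshly created environment
import torch
print(torch.__version__)
print(torch.cuda.is_available())  # should print True on a GPU machine
```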
| Dataset | Version | Link | Notes |
|---|---|---|---|
| PointOdyssey | 1.4 | download | v1.4 contains some bug fixes, but these shouldn't dramatically change results. |
| CoTracker3_Kubric | N/A | website | For training only. |
| Dynamic Replica | N/A | website | For testing only. |
| TAP-Vid DAVIS | N/A | website | For testing only. |
| TAP-Vid Kinetics | N/A | website | For testing only. |
| TAP-Vid RGB-S | N/A | website | For testing only. |
| TAP-Vid RoboTAP | N/A | website | For testing only. |
Datasets should be downloaded to `./data/datasets`. Please make sure the paths match what's shown in `./configs/local/default.yaml`.
Make sure to modify the `PROJECT_SCRATCH` env var in the `.env` file to the appropriate path, e.g., `PROJECT_SCRATCH=/path/to/scratch`.
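If you are unsure whether the paths line up, a generic YAML read can help; the exact keys depend on the repo's Hydra config, so this simply dumps the file:

```python
# dump the local config to double-check dataset paths
# (the key names depend on the repo's Hydra config)
import yaml
with open("configs/local/default.yaml") as f:
    print(yaml.safe_load(f))
```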
| Model | Version | Trained on | Checkpoint |
|---|---|---|---|
| GenPT | 1.0.0 | PointOdyssey | download |
| GenPT | 1.0.0 | CoTracker3_Kubric | download |
Checkpoints should be downloaded to `./data/checkpoints`.
Model checkpoints for ablated models and the discriminative variant will be made available at a later time.
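To verify a download, a checkpoint can be inspected as an ordinary PyTorch file. This uses only standard `torch.load`; the key names mentioned in the comments are an assumption based on typical Lightning checkpoints:

```python
# inspect a downloaded checkpoint (Lightning-style .ckpt files are torch pickles)
import torch
ckpt = torch.load("./data/checkpoints/genpt_fm_podv14.ckpt",
                  map_location="cpu", weights_only=False)  # trusted local file
print(ckpt.keys())  # typically includes "state_dict" for Lightning checkpoints
```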
To evaluate GenPT on PointOdyssey with a single GPU, execute the following:

```bash
python src/evaluate.py trainer=gpu trainer.devices=1 experiment=evaluate_genpt_fm_pointodyssey ckpt_path=<PATH>
```

where `<PATH>` is set to a downloaded GenPT checkpoint, e.g., `./data/checkpoints/genpt_fm_podv14.ckpt`.
To evaluate GenPT on PointOdyssey with 4 GPUs, execute the following:

```bash
python src/evaluate.py trainer=ddp trainer.devices=4 experiment=evaluate_genpt_fm_pointodyssey ckpt_path=<PATH>
```

with `<PATH>` set as above.
To evaluate on PointOdyssey, set `experiment=evaluate_genpt_fm_pointodyssey`. To evaluate on Dynamic Replica, set `experiment=evaluate_genpt_fm_dynamic_replica`.
To evaluate on DAVIS, Kinetics, RGB-S, or RoboTAP, set `experiment=evaluate_genpt_fm_tapvid_<NAME>`, where `<NAME>` is one of `[davis, kinetics, rgbs, robotap]`.
To enable the sliding occluder while evaluating on any of the TAP-Vid datasets mentioned above, set `data.test_datasets.0.occluder_direction=<DIRECTION>`, where `<DIRECTION>` is one of `[lr, rl, tb, bt]` (corresponding to left-to-right, right-to-left, top-to-bottom, and bottom-to-top, respectively).
To pick the best of N generated trajectories (within each window, in a causal fashion), guided by GenPT's confidence estimates, you will need to modify the `test_step()` function in `./src/models/genpt_fm_lightning_module.py`. Specifically, modify the following variables as follows:

```python
num_samples = <N>  # <N> is some integer value, like 5
search_mode = "greedy"
```

Beam search can be used instead of a simple greedy search:

```python
num_samples = <N>  # <N> is some integer value, like 5
search_mode = "beam"
beam_width = <W>   # respect the inequality 1 <= beam_width <= num_samples
```
The user-friendliness of this functionality will be improved in a future update.
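For intuition, the sketch below shows what a causal, per-window beam search over sampled trajectories might look like. It is a hypothetical reimplementation, not the code in `test_step()`; `sample_fn` and `confidence_fn` are stand-ins for the model's windowed sampler and confidence head.

```python
def beam_search_windows(sample_fn, confidence_fn, windows, num_samples=5, beam_width=2):
    """Causal beam search over per-window trajectory samples (illustrative only).

    sample_fn(window, prev_traj) -> one sampled trajectory for this window
    confidence_fn(traj)          -> a scalar confidence score for a trajectory
    """
    beams = [([], 0.0)]  # (per-window trajectories so far, cumulative confidence)
    for window in windows:
        candidates = []
        for trajs, score in beams:
            prev = trajs[-1] if trajs else None  # condition on the previous window
            for _ in range(num_samples):
                traj = sample_fn(window, prev)
                candidates.append((trajs + [traj], score + float(confidence_fn(traj))))
        # prune to the top-scoring beams before moving to the next window
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]  # highest-confidence sequence of window trajectories
```

With `beam_width=1` this reduces to the greedy search above, since only the single best candidate survives each window.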
Below are examples of training GenPT on PointOdyssey in a variety of compute setups. For the paper, we used 4 GPUs in total, with everything else set to default. To train on CoTracker3_Kubric, set `experiment=train_tracker_tapvid_kubric`.
To train GenPT on PointOdyssey with a single GPU on a single node, execute the following:

```bash
python src/train.py experiment=train_tracker_pointodyssey trainer=gpu model=genpt_fm
```
To train with local DDP on a single node and 4 GPUs, with TensorBoard logging, execute the following:

```bash
python src/train.py experiment=train_tracker_pointodyssey trainer=ddp trainer.devices=4 trainer.num_nodes=1 logger=tensorboard model=genpt_fm
```
To train on a remote SLURM cluster using 2 nodes with 2 GPUs each, with TensorBoard logging, execute the following:

```bash
python src/train.py --multirun experiment=train_tracker_pointodyssey trainer=ddp trainer.devices=2 trainer.num_nodes=2 logger=tensorboard hydra=submitit_remote_launcher model=genpt_fm
```

This requires the `CLUSTER_PARTITION` and `CLUSTER_GPU_TYPE` env vars in `.env` to be modified accordingly.
An example of local single-GPU training of GenPT with no logging:

```bash
python src/train.py experiment=train_tracker_pointodyssey trainer=gpu logger=null callbacks.learning_rate_monitor=null model=genpt_fm
```
We welcome contributions to this project. If you would like to contribute, please open a pull request and Mattie will take a look as soon as possible.
To report issues, such as bugs or missing information, please open a new issue.
If you use this code for your research, please consider giving it a star ⭐️ and citing its research paper:
Mattie Tesfaldet, Adam W. Harley, Konstantinos G. Derpanis, Derek Nowrouzezahrai, and Christopher Pal. Generative Point Tracking with Flow Matching. arXiv preprint, 2025.
BibTeX format:

```bibtex
@article{tesfaldet2025,
  title = {{Generative Point Tracking with Flow Matching}},
  author = {Tesfaldet, Mattie and Harley, Adam W and Derpanis, Konstantinos G and Nowrouzezahrai, Derek and Pal, Christopher},
  journal = {arXiv preprint},
  year = {2025}
}
```

