Pulkit Kumar1 · Namitha Padmanabhan1 · Luke Luo1 · Sai Saketh Rambhatla1,2 · Abhinav Shrivastava1
1University of Maryland, College Park 2GenAI, Meta
ECCV 2024
This repository contains the code for our paper "Trajectory-aligned Space-time Tokens for Few-shot Action Recognition".
- Create a conda environment using the provided environment file:
conda env create -f environment.yml
- Activate the environment:
conda activate tats
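As a quick optional sanity check, confirm that PyTorch imports inside the environment and can see your GPUs:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"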
Please refer to:
- slowfast/datasets/DATASET.md for dataset preparation instructions
- point_tracking/README.md for point extraction details
The few-shot split information for all datasets can be downloaded from here.
Note: While preparing the code release, we thoroughly tested the codebase on the SSv2 and Kinetics datasets. The other datasets should work as expected, but if you encounter any issues with them, please raise an issue in the repository.
Before launching training, set the following environment variables:
# Path to store PyTorch models and weights
export TORCH_HOME=/path/to/torch/home
# Path to dataset directory containing:
# - Videos
# - Dataset splits
# - Point tracking data
export DATA_DIR=/path/to/data
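As a purely illustrative sketch of what $DATA_DIR holds (the directory names below are placeholders; the actual layout is defined by the dataset preparation and point extraction steps linked above):
$DATA_DIR/
├── videos/        # dataset videos
├── splits/        # few-shot split files
└── point_tracks/  # precomputed point tracking data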
To train the model on SSv2 small, you can use the following command:
torchrun --nproc_per_node=$NUM_GPUS --master_port=$PORT tools/run_net.py \
--init_method env:// \
--new_dist_init \
--cfg configs/TaTs/ssv2_longer_steps.yaml \
MASTER_PORT $PORT \
OUTPUT_DIR $OUTPUT_DIR \
NUM_GPUS $NUM_GPUS \
DATA_LOADER.NUM_WORKERS $NUM_WORKERS \
DATA.PATH_TO_DATA_DIR $DATA_DIR \
DATA.USE_RAND_AUGMENT True \
POINT_INFO.NAME cotracker2_16_uniform_8_corrected \
MODEL.FEAT_EXTRACTOR dino \
MODEL.DINO_CONFIG dinov2_vitb14 \
FEW_SHOT.TRAIN_EPISODES $TRAIN_EPISODES \
FEW_SHOT.K_SHOT $K_SHOT \
FEW_SHOT.TRAIN_QUERY_PER_CLASS $TRAIN_QUERY_PER_CLASS \
FEW_SHOT.N_WAY $N_WAY \
WANDB_STUFF.WANDB_ID $WANDB_ID \
WANDB_STUFF.EXP_NAME $EXP_NAME \
SSV2.SPLIT ssv2_small_molo
Key parameters:
- NUM_GPUS: Number of GPUs to use (e.g., 4)
- NUM_WORKERS: Number of data loader workers (e.g., 16)
- K_SHOT: Number of support examples per class (e.g., 1)
- N_WAY: Number of classes per episode (e.g., 5)
- TRAIN_EPISODES: Number of training episodes (e.g., 400)
- TRAIN_QUERY_PER_CLASS: Number of query examples per class (e.g., 6)
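For reference, here is a minimal shell setup that plugs the example values above into the training command; the port, output path, and wandb fields are placeholders to replace with your own:
export NUM_GPUS=4
export NUM_WORKERS=16
export K_SHOT=1
export N_WAY=5
export TRAIN_EPISODES=400
export TRAIN_QUERY_PER_CLASS=6
export PORT=29500                      # any free port
export OUTPUT_DIR=/path/to/output      # placeholder
export WANDB_ID=your_wandb_id          # placeholder
export EXP_NAME=tats_ssv2_small_5w1s   # placeholder
With these values, each training episode draws N_WAY * K_SHOT = 5 support videos and N_WAY * TRAIN_QUERY_PER_CLASS = 30 query videos.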
This codebase is under active development. If you encounter any issues or have questions, please feel free to:
- Open an issue in this repository
- Contact Pulkit at pulkit[at]umd[dot]edu
This codebase is built upon two excellent repositories:
- ORViT: Object-Region Video Transformers
- MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition
We thank the authors for making their code publicly available.
If you find this code and our paper useful for your research, please cite our paper:
@inproceedings{kumar2024trajectory,
title={Trajectory-aligned Space-time Tokens for Few-shot Action Recognition},
author={Kumar, Pulkit and Padmanabhan, Namitha and Luo, Luke and Rambhatla, Sai Saketh and Shrivastava, Abhinav},
booktitle={European Conference on Computer Vision},
pages={474--493},
year={2024},
organization={Springer}
}