
Trajectory-aligned Space-time Tokens for Few-shot Action Recognition

Pulkit Kumar¹ · Namitha Padmanabhan¹ · Luke Luo¹ · Sai Saketh Rambhatla¹,² · Abhinav Shrivastava¹

¹University of Maryland, College Park    ²GenAI, Meta

ECCV 2024

Paper PDF · Project Page

This repository contains the code for our paper "Trajectory-aligned Space-time Tokens for Few-shot Action Recognition".

Installation

  1. Create a conda environment using the provided environment file:
conda env create -f environment.yml
  2. Activate the environment:
conda activate tats
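
To verify the setup, a quick sanity check (a minimal sketch; it only assumes PyTorch is installed by environment.yml, which the torchrun command below requires):

# Check that PyTorch imports and CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"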

Dataset Preparation

Please refer to:

  • slowfast/datasets/DATASET.md for dataset preparation instructions
  • point_tracking/README.md for point extraction details

The few-shot split information for all datasets can be downloaded from here.

Note: During code release preparation, we thoroughly tested the codebase with the SSv2 and Kinetics datasets. Other datasets should work as expected, but if you run into any issues with them, please raise an issue in the repository.

Training and Testing

Before running the training, set the following environment variables:

# Path to store PyTorch models and weights
export TORCH_HOME=/path/to/torch/home

# Path to dataset directory containing:
# - Videos
# - Dataset splits
# - Point tracking data
export DATA_DIR=/path/to/data
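
The exact sub-directory names come from the dataset preparation steps above, not from this README; the sketch below only illustrates the three components DATA_DIR is expected to contain:

# Illustrative layout only -- actual names are set by
# slowfast/datasets/DATASET.md and point_tracking/README.md
$DATA_DIR/
├── <videos>/          # raw videos for the chosen dataset
├── <splits>/          # few-shot split files
└── <point_tracking>/  # pre-extracted point tracks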

To train the model on SSv2-small, use the following command:

torchrun --nproc_per_node=$NUM_GPUS --master_port=$PORT tools/run_net.py \
    --init_method env:// \
    --new_dist_init \
    --cfg configs/TaTs/ssv2_longer_steps.yaml \
    MASTER_PORT $PORT \
    OUTPUT_DIR $OUTPUT_DIR \
    NUM_GPUS $NUM_GPUS \
    DATA_LOADER.NUM_WORKERS $NUM_WORKERS \
    DATA.PATH_TO_DATA_DIR $DATA_DIR \
    DATA.USE_RAND_AUGMENT True \
    POINT_INFO.NAME cotracker2_16_uniform_8_corrected \
    MODEL.FEAT_EXTRACTOR dino \
    MODEL.DINO_CONFIG dinov2_vitb14 \
    FEW_SHOT.TRAIN_EPISODES $TRAIN_EPISODES \
    FEW_SHOT.K_SHOT $K_SHOT \
    FEW_SHOT.TRAIN_QUERY_PER_CLASS $TRAIN_QUERY_PER_CLASS \
    FEW_SHOT.N_WAY $N_WAY \
    WANDB_STUFF.WANDB_ID $WANDB_ID \
    WANDB_STUFF.EXP_NAME $EXP_NAME \
    SSV2.SPLIT ssv2_small_molo

Key parameters:

  • NUM_GPUS: Number of GPUs to use (e.g., 4)
  • NUM_WORKERS: Number of data loader workers (e.g., 16)
  • K_SHOT: Number of support examples per class (e.g., 1)
  • N_WAY: Number of classes per episode (e.g., 5)
  • TRAIN_EPISODES: Number of training episodes (e.g., 400)
  • TRAIN_QUERY_PER_CLASS: Number of query examples per class (e.g., 6)
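
For example, a 5-way 1-shot run on SSv2-small using the values above could set the variables as follows before invoking the torchrun command (PORT, OUTPUT_DIR, WANDB_ID, and EXP_NAME are placeholders to choose yourself, not prescribed values):

export NUM_GPUS=4
export NUM_WORKERS=16
export K_SHOT=1
export N_WAY=5
export TRAIN_EPISODES=400
export TRAIN_QUERY_PER_CLASS=6
# The four values below are placeholders; pick your own
export PORT=29500
export OUTPUT_DIR=/path/to/output
export WANDB_ID=my_wandb_run
export EXP_NAME=tats_ssv2_small_5way_1shot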

Development

This codebase is under active development. If you encounter any issues or have questions, please feel free to:

  • Open an issue in this repository
  • Contact Pulkit at pulkit[at]umd[dot]edu

Acknowledgments

This codebase is built upon two excellent repositories:

  • ORViT: Object-Region Video Transformers
  • MoLo: Motion-augmented Long-short Contrastive Learning for Few-shot Action Recognition

We thank the authors for making their code publicly available.

Citation

If you find this code and our paper useful for your research, please cite our paper:

@inproceedings{kumar2024trajectory,
  title={Trajectory-aligned Space-time Tokens for Few-shot Action Recognition},
  author={Kumar, Pulkit and Padmanabhan, Namitha and Luo, Luke and Rambhatla, Sai Saketh and Shrivastava, Abhinav},
  booktitle={European Conference on Computer Vision},
  pages={474--493},
  year={2024},
  organization={Springer}
}
