
# SFHand: A Streaming Framework for Language-guided 3D Hand Forecasting and Embodied Manipulation



## 🔥 Project Highlights

Demo GIF

| Feature | Description |
| --- | --- |
| 📡 Streaming Framework | Autoregressive multi-modal hand forecasting |
| Full-State Predictions | Hand type, 2D box, 3D pose, and trajectory |
| 🧠 ROI-Enhanced Memory | Temporal hand awareness |
| 🗣️ Language-guided | Follows natural language instructions |

## 🎬 Method Overview

Method Figure


## 📝 Introduction

💡 SFHand is the first streaming architecture for language-guided 3D hand forecasting.

SFHand predicts future hand dynamics from continuous egocentric video paired with text instructions. At each step, the model autoregressively outputs the full hand state: hand type, 2D bounding box, 3D hand pose, and 3D trajectory.

Key components: Streaming autoregressive transformer and ROI-enhanced memory.
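To make the streaming design concrete, here is a minimal conceptual sketch of such a loop, assuming a per-frame decoding step and a fixed-capacity ROI memory. This is NOT the actual SFHand implementation; every class and function name here (`HandState`, `ROIMemory`, `forecast_stream`, `predict_step`) is hypothetical.

```python
# Conceptual sketch of a streaming autoregressive forecasting loop with an
# ROI-enhanced memory -- illustrative only, not the SFHand codebase.
from dataclasses import dataclass, field


@dataclass
class HandState:
    hand_type: str        # "left" or "right"
    box_2d: tuple         # (x1, y1, x2, y2) bounding box in pixels
    pose_3d: list         # 3D hand pose parameters (e.g., joint coordinates)
    trajectory_3d: list   # forecast 3D positions of the hand root


@dataclass
class ROIMemory:
    """Keeps features from recent hand regions (temporal hand awareness)."""
    capacity: int = 8
    slots: list = field(default_factory=list)

    def update(self, roi_feature):
        self.slots.append(roi_feature)
        self.slots = self.slots[-self.capacity:]  # drop the oldest entries


def forecast_stream(frames, instruction, predict_step):
    """Emit one full hand state per incoming frame, autoregressively:
    each step conditions on the current frame, the language instruction,
    the ROI memory, and the previously predicted state."""
    memory = ROIMemory()
    states = []
    prev = None
    for frame in frames:
        state = predict_step(frame, instruction, memory, prev)
        memory.update({"box": state.box_2d})  # ROI crop stands in for features
        states.append(state)
        prev = state
    return states
```

The point of the sketch is the data flow: states are produced one frame at a time, and the memory carries hand-region context forward instead of reprocessing the whole video.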


## 📦 Project Status

| Component | Status |
| --- | --- |
| EgoHaFL Dataset | ✅ Released |
| Pretraining Code | ✅ Released |
| Pretrained Weights | ✅ Released |
| Evaluation Code | ✅ Released |
| Embodied Evaluation (Franka Kitchen) | 🔜 Coming soon |
| 3D Hand Annotation Code | 🔜 Coming soon |

## 🔧 Installation

We developed and tested the project with torch 2.8.0+cu129.

```bash
git clone git@github.com:ut-vision/SFHand.git

conda env create -f environment.yml
conda activate sfhand

pip install -r requirements.txt
conda install -c conda-forge libgl
```

Download the MANO model and place `MANO_LEFT.pkl` and `MANO_RIGHT.pkl` under `data/mano/`.

Download `base_best.pt` from the EgoHOD checkpoint and place it at `./pre_ckpt/base_best.pt`.
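As a quick sanity check, a small helper like the following (illustrative only, not part of the repo) can confirm that both downloads landed where the two steps above expect them:

```python
# Illustrative helper (not part of the SFHand codebase): confirm the MANO
# models and the EgoHOD checkpoint sit at the paths the README specifies.
from pathlib import Path

REQUIRED_ASSETS = [
    "data/mano/MANO_LEFT.pkl",
    "data/mano/MANO_RIGHT.pkl",
    "pre_ckpt/base_best.pt",
]


def missing_assets(repo_root="."):
    """Return the subset of REQUIRED_ASSETS not present under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED_ASSETS if not (root / p).is_file()]
```

Run it from the repository root; an empty list means you are ready to train.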


## 📂 Dataset: EgoHaFL

EgoHaFL Dataset (annotations): 👉 https://huggingface.co/datasets/ut-vision/EgoHaFL

Videos originate from Ego4D V1: https://ego4d-data.org/. We use 224p compressed clips.

Directory structure:

```text
EgoHaFL
    ├── EgoHaFL_lmdb
    │   ├── data.mdb
    │   └── lock.mdb
    ├── EgoHaFL_train.csv
    ├── EgoHaFL_test.csv
    └── v1
        └── videos_224p
```
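Before pointing the configs at the dataset, you may want to verify the tree matches this layout. A simple check (illustrative only, not shipped with the repo) might look like:

```python
# Illustrative check (not part of the SFHand codebase): verify a dataset root
# contains every entry of the EgoHaFL layout shown above.
from pathlib import Path

EXPECTED_LAYOUT = [
    "EgoHaFL_lmdb/data.mdb",
    "EgoHaFL_lmdb/lock.mdb",
    "EgoHaFL_train.csv",
    "EgoHaFL_test.csv",
    "v1/videos_224p",
]


def check_layout(root):
    """Return the expected paths missing under `root` (empty list = OK)."""
    root = Path(root)
    return [p for p in EXPECTED_LAYOUT if not (root / p).exists()]
```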

## 🚀 Training & Evaluation

### Train + Eval

```bash
bash ./exps/pretrain.sh
```

⚠️ Before training, edit the configuration files in `./configs`.

### Eval + Visualization

```bash
python main.py --config_file configs/config/clip_base_eval.yml --eval --vis
```

Output visualizations are saved to `./render_results/`.


## 🧠 Pretrained Models

Download here:

👉 https://huggingface.co/ut-vision/SFHand


## 🤖 Embodied Evaluation (Franka Kitchen)

⏳ Coming soon — code will be added once finalized.


## ✍️ 3D Hand Annotation

⏳ Coming soon — detailed annotation tools, formats, and processing scripts will be released once finalized.


## 📚 Citation

```bibtex
@article{liu2025sfhand,
  title={SFHand: A Streaming Framework for Language-guided 3D Hand Forecasting and Embodied Manipulation},
  author={Liu, Ruicong and Huang, Yifei and Ouyang, Liangyang and Kang, Caixin and Sato, Yoichi},
  journal={arXiv preprint arXiv:2511.18127},
  year={2025}
}
```

## 🙏 Acknowledgement

SFHand builds on EgoHOD. Thanks to all contributors of the original codebase.
