[3DV 2026] ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

Ganlin Zhang1,2 · Shenhan Qian1,2 · Xi Wang1,2,3 · Daniel Cremers1,2

1 TU Munich, 2 MCML, 3 ETH Zurich

Update

  • Evaluation results on additional datasets (TUM-RGBD freiburg2 and freiburg3 partitions, Replica, and ScanNet) have been released in the supplementary material of the latest version of the paper on ArXiv. The evaluation scripts for these datasets have also been added to this repository.
  • Live camera mode is now supported!
[Demo video: live_demo.mp4]

ViSTA-SLAM is a real-time monocular dense SLAM pipeline that combines a Symmetric Two-view Association (STA) frontend with Sim(3) pose graph optimization and loop closure, enabling accurate camera trajectories and high-quality 3D scene reconstruction from RGB inputs.
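For background, a Sim(3) transform augments a rigid-body pose with a global scale, which lets a monocular pipeline absorb scale drift when closing loops. Below is a minimal sketch of the group operations such a pose graph builds on (illustrative background only, not the repository's implementation):

# Minimal sketch of Sim(3) = (s, R, t), acting on points as x -> s * R @ x + t.
# Illustrative background only; not the repository's implementation.
import numpy as np

def sim3_compose(a, b):
    # Composition: (a ∘ b)(x) = a(b(x)).
    sa, Ra, ta = a
    sb, Rb, tb = b
    return (sa * sb, Ra @ Rb, sa * (Ra @ tb) + ta)

def sim3_inverse(a):
    s, R, t = a
    return (1.0 / s, R.T, -(R.T @ t) / s)

def sim3_relative(Ti, Tj):
    # Relative pose Ti^{-1} ∘ Tj; a pose-graph edge penalizes its
    # discrepancy from the measured two-view transform.
    return sim3_compose(sim3_inverse(Ti), Tj)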

Table of Contents
  1. Installation
  2. Run
  3. Visualization
  4. Evaluation
  5. Acknowledgement
  6. Citation and Contact

Installation

  1. Clone the repo
git clone https://github.com/zhangganlin/vista-slam.git
cd vista-slam
git submodule update --init --recursive
  2. Create a new conda environment and install the Python dependencies.
conda create -n vista python=3.11 cmake=3.31.2 gcc_linux-64=11.4.0 gxx_linux-64=11.4.0 libopencv=4.12.0 -c conda-forge
conda activate vista

# install torch according to your cuda version
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers --index-url https://download.pytorch.org/whl/cu121

# install python binding of DBoW3, for loop detection 
cd DBoW3Py
pip install --no-build-isolation .
cd ..

# install other python dependencies
pip install -r requirements.txt

# optional: accelerate with CUDA-based RoPE
cd vista_slam/sta_model/pos_embed/curope
python setup.py build_ext --inplace
cd ../../../../
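
This optional step requires a CUDA-capable PyTorch build; you can confirm yours before compiling with

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"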
  3. Download the pretrained models from HuggingFace (https://huggingface.co/zhangganlin/vista_slam/tree/main) and put them inside the pretrains folder.
Directory structure of pretrains:
  .
  └── pretrains
        ├── frontend_sta_weights.pth
        ├── ORBvoc.txt
        └── README.md
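
If you prefer to script this download, here is a minimal sketch using the huggingface_hub package (an extra dependency; the repo id comes from the link above):

# Minimal sketch: fetch the pretrained files into ./pretrains.
# Assumes `pip install huggingface_hub`; repo id from the HuggingFace link above.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="zhangganlin/vista_slam", local_dir="pretrains")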

Data preparation

# TUM-RGBD
bash scripts/download_tumrgbd.sh

# 7-Scenes
bash scripts/download_7scenes.sh

The datasets will be downloaded to ./datasets by default; edit the download scripts if you prefer other paths.

Run

Quick start

# Run 7-Scenes redkitchen
python run.py --config configs/7scenes.yaml --images "datasets/7scenes/redkitchen/seq-01/*.color.png" --output output/redkitchen

# Run TUM-RGBD floor
python run.py --config configs/tumrgbd.yaml --images "datasets/rgbd_dataset_freiburg1_floor/rgb/*.png" --output output/floor

These commands run ViSTA-SLAM on the 7-Scenes redkitchen scene and the TUM-RGBD floor scene, writing the results to output/redkitchen and output/floor.

Test on other sequential data

# use the extension that matches your images, e.g. *.png or *.jpg
python run.py --config configs/default.yaml --images "PATH/TO/IMAGES/*.png" --output OUTPUT_FOLDER

All adjustable configuration parameters are defined in configs/default.yaml, where explanations are also provided. You can modify them to suit your setup.
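If you want to keep the defaults untouched, one option is to derive a custom config programmatically. A minimal sketch using PyYAML (rerun_vis is a key mentioned below in this README; all other keys depend on your local configs/default.yaml):

# Minimal sketch: derive a custom config from configs/default.yaml.
# Assumes `pip install pyyaml`; rerun_vis is a key mentioned in this README.
import yaml

with open("configs/default.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["rerun_vis"] = True  # enable online visualization (same effect as --vis)

with open("configs/custom.yaml", "w") as f:
    yaml.safe_dump(cfg, f)

Then pass --config configs/custom.yaml to run.py.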

Visualization

Online visualization

ViSTA-SLAM provides online visualization via Rerun. You can either add --vis to the command line or set rerun_vis: True in the config file.

To visualize the result online, open the Rerun client in one terminal,

rerun

and run ViSTA-SLAM in another terminal.

python run.py --config configs/default.yaml --images "PATH/TO/IMAGES/*.png" --output OUTPUT_FOLDER --vis

[Screenshot: online visualization in Rerun]

You can also run ViSTA-SLAM on a remote machine (e.g., a cluster) and visualize it on your local machine. Set rerun_url in the config file, replacing rerun+http://127.0.0.1:9876/proxy with your local machine's IP in the same format. Then open the Rerun client on the local machine and run ViSTA-SLAM on the remote machine; the visualization will appear automatically in the local Rerun client.
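For example, the relevant config entries would look like this (a sketch; 192.168.1.42 stands in for your local machine's IP):

# Sketch of the relevant config entries; the IP below is a placeholder.
rerun_vis: True
rerun_url: rerun+http://192.168.1.42:9876/proxy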

Run with Live Camera

To use a webcam as live input to ViSTA-SLAM, replace PATH_TO_CAM with the path to your webcam, e.g., /dev/video1:

python run_live.py --config configs/live.yaml --camera PATH_TO_CAM --output OUTPUT_FOLDER

The other parameters are the same as in the dataset mode above.
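
If the live demo shows no frames, it helps to first confirm that the device path delivers images at all. A minimal sketch using OpenCV, independent of ViSTA-SLAM (assumes opencv-python is installed; adjust the device path as needed):

# Minimal sketch: check that the webcam at /dev/video1 delivers frames.
# Assumes opencv-python is installed; adjust the device path as needed.
import cv2

cap = cv2.VideoCapture("/dev/video1")
ok, frame = cap.read()
print("opened:", cap.isOpened(), "got frame:", ok, frame.shape if ok else None)
cap.release()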

Visualize final results

We also provide a script to visualize the final results (trajectory, reconstruction, and pose graph) using Open3D. After running ViSTA-SLAM, run

python scripts/vis_slam_results.py OUTPUT_FOLDER

[Screenshot: Open3D visualization of the final results]

Here, the light blue frustums represent the camera poses, blue lines indicate edges between neighboring views, and orange lines correspond to loop closure edges.
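
To inspect the reconstruction programmatically instead, here is a minimal sketch with Open3D (the .ply file name is a placeholder; check OUTPUT_FOLDER for the artifacts the script above actually reads):

# Minimal sketch: load and display a saved point cloud with Open3D.
# Assumes `pip install open3d`; "points.ply" is a placeholder name,
# check OUTPUT_FOLDER for what ViSTA-SLAM actually writes.
import open3d as o3d

pcd = o3d.io.read_point_cloud("OUTPUT_FOLDER/points.ply")
o3d.visualization.draw_geometries([pcd])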

Evaluation

ViSTA-SLAM is mainly evaluated on TUM-RGBD and 7-Scenes; we also provide the evaluation scripts here.

# For 7-Scenes
python evaluation_7scenes.py --dataset_folder "datasets/7scenes" --output output/7scenes

# For TUM-RGBD
python evaluation_tumrgbd.py --dataset_folder "datasets/tumrgbd" --output output/tumrgbd

Note: There may be minor differences between the released codebase and the results reported in the paper due to code refactoring and hardware variations, but the overall results should be largely consistent.
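
For reference, trajectory accuracy on these benchmarks is typically reported as ATE RMSE after aligning the estimated trajectory to the ground truth; for monocular SLAM the alignment includes a scale. A minimal NumPy sketch of that metric (Umeyama alignment; the evaluation scripts above remain the authoritative implementation):

# Minimal sketch: ATE RMSE after scale-aware (Umeyama) alignment, as is
# common for monocular SLAM. Illustrative only; the evaluation scripts
# above are the authoritative implementation.
import numpy as np

def ate_rmse(est, gt):
    # est, gt: (N, 3) arrays of corresponding camera positions.
    mu_e, mu_g = est.mean(0), gt.mean(0)
    E, G = est - mu_e, gt - mu_g
    U, D, Vt = np.linalg.svd(G.T @ E / len(est))  # cross-covariance SVD
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # keep R a proper rotation
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / E.var(0).sum()  # optimal scale
    t = mu_g - s * (R @ mu_e)
    aligned = s * (est @ R.T) + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))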

Acknowledgement

Our codebase is partially based on Spann3r, SLAM3R, and VGGT-SLAM; we thank the authors for making these codebases publicly available. Our work would not have been possible without their great efforts!

Citation

If you find our code or paper useful, please cite

@misc{zhang2025vistaslam,
      title={{ViSTA-SLAM}: Visual {SLAM} with Symmetric Two-view Association}, 
      author={Ganlin Zhang and Shenhan Qian and Xi Wang and Daniel Cremers},
      year={2025},
      eprint={2509.01584},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.01584}, 
}

Contact

Please raise issues in this repository or contact Ganlin Zhang directly for questions, comments, or bug reports.
