Ganlin Zhang1,2 · Shenhan Qian1,2 · Xi Wang1,2,3 · Daniel Cremers1,2
1 TU Munich, 2 MCML, 3 ETH Zurich
- Evaluation results on additional datasets (TUM-RGBD freiburg2 and freiburg3 partitions, Replica, and ScanNet) have been released in the supplementary material of the latest version of the paper on arXiv. The evaluation scripts for these datasets have also been added to this repository.
- Live camera mode is now supported!
live_demo.mp4
ViSTA-SLAM is a real-time monocular dense SLAM pipeline that combines a Symmetric Two-view Association (STA) frontend with Sim(3) pose graph optimization and loop closure, enabling accurate camera trajectories and high-quality 3D scene reconstruction from RGB inputs.
- Clone the repo
git clone https://github.com/zhangganlin/vista-slam.git
cd vista-slam
git submodule update --init --recursive
- Create a new conda environment and install Python dependencies.
conda create -n vista python=3.11 cmake=3.31.2 gcc_linux-64=11.4.0 gxx_linux-64=11.4.0 libopencv=4.12.0 -c conda-forge
conda activate vista
# install torch according to your cuda version
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers --index-url https://download.pytorch.org/whl/cu121
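# If your CUDA version differs, swap the index URL accordingly; e.g., for CUDA 12.4
# (cu124 wheels exist for torch 2.5.1; check https://pytorch.org for other versions):
# pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers --index-url https://download.pytorch.org/whl/cu124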
# install python binding of DBoW3, for loop detection
cd DBoW3Py
pip install --no-build-isolation .
cd ..
# install other python dependencies
pip install -r requirements.txt
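# optional sanity check: confirm torch was built with CUDA support and sees a GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"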
# optional: accelerate with CUDA-based RoPE
cd vista_slam/sta_model/pos_embed/curope
python setup.py build_ext --inplace
cd ../../../../
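If you built the extension, you can quickly check that it imports from the repo root. The module path below is an assumption based on the directory layout, not an official entry point:
# assumed module path, mirroring vista_slam/sta_model/pos_embed/curope
python -c "import vista_slam.sta_model.pos_embed.curope"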
- Download the pretrained model.
Download the pretrained models from HuggingFace (https://huggingface.co/zhangganlin/vista_slam/tree/main) and put them inside the pretrains folder.
Directory structure of pretrains:
.
└── pretrains
├── frontend_sta_weights.pth
├── ORBvoc.txt
└── README.md
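Alternatively, the weights can be fetched from the command line with the huggingface_hub CLI; the repo id comes from the link above, and CLI flags may vary slightly across huggingface_hub versions:
# download all files of the model repo into ./pretrains
huggingface-cli download zhangganlin/vista_slam --local-dir pretrains
- Download datasets.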
# TUM-RGBD
bash scripts/download_tumrgbd.sh
# 7-Scenes
bash scripts/download_7scenes.sh
The datasets are downloaded to ./datasets by default; change the download scripts if you prefer other paths.
# Run 7-Scenes redkitchen
python run.py --config configs/7scenes.yaml --images "datasets/7scenes/redkitchen/seq-01/*.color.png" --output output/redkitchen
# Run TUM-RGBD floor
python run.py --config configs/tumrgbd.yaml --images "datasets/rgbd_dataset_freiburg1_floor/rgb/*.png" --output output/floor
These commands run ViSTA-SLAM on the 7-Scenes redkitchen scene and the TUM-RGBD floor scene, writing results to output/redkitchen and output/floor.
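To process all seven 7-Scenes scenes in one go, a simple loop works; this sketch assumes every scene follows the same seq-01 layout as redkitchen:
# run ViSTA-SLAM on each scene of the standard 7-Scenes benchmark
for scene in chess fire heads office pumpkin redkitchen stairs; do
    python run.py --config configs/7scenes.yaml --images "datasets/7scenes/$scene/seq-01/*.color.png" --output "output/$scene"
done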
To run ViSTA-SLAM on your own data:
python run.py --config configs/default.yaml --images "PATH/TO/IMAGES/*.png" --output OUTPUT_FOLDER
Use *.jpg instead of *.png if your images are JPEGs. All adjustable configuration parameters are defined in configs/default.yaml, where explanations are also provided. You can modify them to suit your setup.
ViSTA-SLAM provides online visualization via Rerun. You can either add --vis to the command line directly or set rerun_vis: True in the config file.
To visualize the result online, open the Rerun client in one terminal,
rerun
and run ViSTA-SLAM in another terminal:
python run.py --config configs/default.yaml --images "PATH/TO/IMAGES/*.png" --output OUTPUT_FOLDER --vis
You can also run ViSTA-SLAM on a remote machine (e.g., a cluster) and visualize it on your local machine: adjust rerun_url in the config file, replacing rerun+http://127.0.0.1:9876/proxy with your local machine's IP in the same format. Then open the Rerun client on the local machine and run ViSTA-SLAM on the remote machine; the visualization will show up automatically in the local Rerun client.
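For reference, the relevant config entries might look like the following; the IP below is a placeholder for your local machine's address:
rerun_vis: True
# keep the rerun+http scheme, port, and /proxy suffix; only swap the host
rerun_url: rerun+http://192.168.1.42:9876/proxy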
To use a webcam as live input to ViSTA-SLAM, replace PATH_TO_CAM with the path to your webcam, e.g., /dev/video1:
python run_live.py --config configs/live.yaml --camera PATH_TO_CAM --output OUTPUT_FOLDER
The other parameters are similar to the dataset mode above.
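If you are unsure which device path your webcam uses, list the video devices first (v4l2-ctl comes from the v4l-utils package and is optional):
# list video capture devices on Linux
ls /dev/video*
# with v4l-utils installed, also show the device names
v4l2-ctl --list-devices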
We also provide a script to visualize the final results -- trajectory, reconstruction, and pose graph -- using Open3D. After running ViSTA-SLAM, run
python scripts/vis_slam_results.py OUTPUT_FOLDER
ViSTA-SLAM is mainly evaluated on TUM-RGBD and 7-Scenes; we also provide the evaluation scripts.
# For 7-Scenes
python evaluation_7scenes.py --dataset_folder "datasets/7scenes" --output output/7scenes
# For TUM-RGBD
python evaluation_tumrgbd.py --dataset_folder "datasets/tumrgbd" --output output/tumrgbd
Note: There may be minor differences between the released codebase and the results reported in the paper due to code refactoring and hardware variations, but the overall results should be largely consistent.
Our codebase is partially based on Spann3r, SLAM3R, and VGGT-SLAM; we thank the authors for making these codebases publicly available. Our work would not have been possible without their great efforts!
If you find our code or paper useful, please cite:
@misc{zhang2025vistaslam,
title={{ViSTA-SLAM}: Visual {SLAM} with Symmetric Two-view Association},
author={Ganlin Zhang and Shenhan Qian and Xi Wang and Daniel Cremers},
year={2025},
eprint={2509.01584},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.01584},
}
Please raise issues in this repository or contact Ganlin Zhang directly for questions, comments, or bug reports.


