Ganlin Zhang1,2 · Shenhan Qian1,2 · Xi Wang1,2,3 · Daniel Cremers1,2
1 TU Munich, 2 MCML, 3 ETH Zurich
- Evaluation results on additional datasets (TUM-RGBD freiburg2 and freiburg3 partitions, Replica, and ScanNet) have been released in the supplementary material of the latest version of the paper on arXiv. The evaluation scripts for these datasets have also been added to this repository.
- Live camera mode is now supported!
live_demo.mp4
ViSTA-SLAM is a real-time monocular dense SLAM pipeline that combines a Symmetric Two-view Association (STA) frontend with Sim(3) pose graph optimization and loop closure, enabling accurate camera trajectories and high-quality 3D scene reconstruction from RGB inputs.
- Clone the repo
git clone https://github.com/zhangganlin/vista-slam.git
cd vista-slam
git submodule update --init --recursive
- Create a new conda environment and install Python dependencies.
conda create -n vista python=3.11 cmake=3.31.2 gcc_linux-64=11.4.0 gxx_linux-64=11.4.0 libopencv=4.12.0 -c conda-forge
conda activate vista
# install torch according to your cuda version
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers --index-url https://download.pytorch.org/whl/cu121
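# If your CUDA version differs, swap the index URL accordingly; e.g., for CUDA 12.4
# (cu124 wheels exist for torch 2.5.1; check https://pytorch.org for other versions):
# pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 xformers --index-url https://download.pytorch.org/whl/cu124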
# install python binding of DBoW3, for loop detection
cd DBoW3Py
pip install --no-build-isolation .
cd ..
# install other python dependencies
pip install -r requirements.txt
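# optional sanity check: confirm torch was built with CUDA support and sees a GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"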
# optional: accelerate with CUDA-based RoPE
cd vista_slam/sta_model/pos_embed/curope
python setup.py build_ext --inplace
cd ../../../../
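If you built the extension, you can quickly check that it imports from the repo root. The module path below is an assumption based on the directory layout, not an official entry point:
# assumed module path, mirroring vista_slam/sta_model/pos_embed/curope
python -c "import vista_slam.sta_model.pos_embed.curope"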
- Download the pretrained model.
Download the pretrained models from HuggingFace (https://huggingface.co/zhangganlin/vista_slam/tree/main) and put them inside the pretrains folder.
Directory structure of pretrains:
.
└── pretrains
├── frontend_sta_weights.pth
├── ORBvoc.txt
└── README.md
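Alternatively, the weights can be fetched from the command line with the huggingface_hub CLI; the repo id comes from the link above, and CLI flags may vary slightly across huggingface_hub versions:
# download all files of the model repo into ./pretrains
huggingface-cli download zhangganlin/vista_slam --local-dir pretrains
- Download datasets.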
# TUM-RGBD
bash scripts/download_tumrgbd.sh
# 7-Scenes
bash scripts/download_7scenes.sh
The datasets are downloaded to ./datasets by default; change the download scripts if you prefer other paths.
# Run 7-Scenes redkitchen
python run.py --config configs/7scenes.yaml --images "datasets/7scenes/redkitchen/seq-01/*.color.png" --output output/redkitchen
# Run TUM-RGBD floor
python run.py --config configs/tumrgbd.yaml --images "datasets/rgbd_dataset_freiburg1_floor/rgb/*.png" --output output/floor
These commands run ViSTA-SLAM on the 7-Scenes redkitchen scene and the TUM-RGBD floor scene, writing results to output/redkitchen and output/floor.
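To process all seven 7-Scenes scenes in one go, a simple loop works; this sketch assumes every scene follows the same seq-01 layout as redkitchen:
# run ViSTA-SLAM on each scene of the standard 7-Scenes benchmark
for scene in chess fire heads office pumpkin redkitchen stairs; do
    python run.py --config configs/7scenes.yaml --images "datasets/7scenes/$scene/seq-01/*.color.png" --output "output/$scene"
done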
To run ViSTA-SLAM on your own data:
python run.py --config configs/default.yaml --images "PATH/TO/IMAGES/*.png" --output OUTPUT_FOLDER
Use *.jpg instead of *.png if your images are JPEGs. All adjustable configuration parameters are defined in configs/default.yaml, where explanations are also provided. You can modify them to suit your setup.
ViSTA-SLAM provides online visualization via Rerun. You can either add --vis to the command line directly or set rerun_vis: True in the config file.
To visualize the result online, open the Rerun client in one terminal,
rerun
and run ViSTA-SLAM in another terminal:
python run.py --config configs/default.yaml --images "PATH/TO/IMAGES/*.png" --output OUTPUT_FOLDER --vis
You can also run ViSTA-SLAM on a remote machine (e.g., a cluster) and visualize it on your local machine: adjust rerun_url in the config file, replacing rerun+http://127.0.0.1:9876/proxy with your local machine's IP in the same format. Then open the Rerun client on the local machine and run ViSTA-SLAM on the remote machine; the visualization will show up automatically in the local Rerun client.
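For reference, the relevant config entries might look like the following; the IP below is a placeholder for your local machine's address:
rerun_vis: True
# keep the rerun+http scheme, port, and /proxy suffix; only swap the host
rerun_url: rerun+http://192.168.1.42:9876/proxy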
To use a webcam as live input to ViSTA-SLAM, replace PATH_TO_CAM with the path to your webcam, e.g., /dev/video1:
python run_live.py --config configs/live.yaml --camera PATH_TO_CAM --output OUTPUT_FOLDER
The other parameters are similar to the dataset mode above.
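If you are unsure which device path your webcam uses, list the video devices first (v4l2-ctl comes from the v4l-utils package and is optional):
# list video capture devices on Linux
ls /dev/video*
# with v4l-utils installed, also show the device names
v4l2-ctl --list-devices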
We also provide a script to visualize the final results -- trajectory, reconstruction, and pose graph -- using Open3D. After running ViSTA-SLAM, run
python scripts/vis_slam_results.py OUTPUT_FOLDER
ViSTA-SLAM is mainly evaluated on TUM-RGBD and 7-Scenes; we also provide the evaluation scripts.
# For 7-Scenes
python evaluation_7scenes.py --dataset_folder "datasets/7scenes" --output output/7scenes
# For TUM-RGBD
python evaluation_tumrgbd.py --dataset_folder "datasets/tumrgbd" --output output/tumrgbd
Note: There may be minor differences between the released codebase and the results reported in the paper due to code refactoring and hardware variations, but the overall results should be largely consistent.
Our codebase is partially based on Spann3r, SLAM3R, and VGGT-SLAM; we thank the authors for making these codebases publicly available. Our work would not have been possible without their great efforts!
If you find our code or paper useful, please cite:
@misc{zhang2025vistaslam,
title={{ViSTA-SLAM}: Visual {SLAM} with Symmetric Two-view Association},
author={Ganlin Zhang and Shenhan Qian and Xi Wang and Daniel Cremers},
year={2025},
eprint={2509.01584},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.01584},
}
Please raise issues in this repository or contact Ganlin Zhang directly for questions, comments, or bug reports.


