This repository contains the source code for our paper:
GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats
Abstract: Tracking and mapping in large-scale, unbounded outdoor environments using only monocular RGB input presents substantial challenges for existing SLAM systems. Traditional Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) SLAM methods are typically limited to small, bounded indoor settings. To overcome these challenges, we introduce GigaSLAM, the first RGB NeRF / 3DGS-based SLAM framework for kilometer-scale outdoor environments, as demonstrated on the KITTI, KITTI 360, 4 Seasons and A2D2 datasets. Our approach employs a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes. For front-end tracking, GigaSLAM utilizes a metric depth model combined with epipolar geometry and PnP algorithms to accurately estimate poses, while incorporating a Bag-of-Words-based loop closure mechanism to maintain robust alignment over long trajectories. Consequently, GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks, establishing a robust SLAM solution for large-scale, long-term scenarios, and significantly extending the applicability of Gaussian Splatting SLAM systems to unbounded outdoor environments.
[13 Jun 2025]
1. Fixed several bugs, especially ones related to UniDepth (see issues #6, #7, and this link); 2. Fixed some typos and grammar mistakes in the code and README.
[10 Jun 2025]
arXiv paper v2: Added more experiments, expanded visuals, included additional details, and fixed typos & grammar.
[26 May 2025]
1. Fixed several issues, including config file and submodule problems; 2. Refactored the code to reduce unnecessary memory allocations and improve thread-level parallelism; 3. Restructured parts of the underlying logic for readability; 4. Updated the paper, added experiments and visualizations, and improved the writing.
[11 Mar 2025]
arXiv paper submitted.
This project was developed, tested, and run in the following hardware/system environment:
Hardware Environment:
CPU(s): Intel Xeon(R) Gold 6128 CPU @ 3.40GHz × 12 / Intel Xeon(R) Platinum 8467C CPU @ 2.60GHz × 20
GPU(s): NVIDIA RTX 4090 (24 GiB VRAM) / NVIDIA L20 (48 GiB GDDR6)
RAM: 67.0 GiB (DDR4, 2666 MT/s) / 128.0 GiB (DDR4, 3200 MT/s)
Disk: Dell 8TB 7200RPM HDD (SATA, Seq. Read 220 MiB/s)
System Environment:
Linux System: Ubuntu 22.04.3 LTS
CUDA Version: 11.8
cuDNN Version: 9.1.0
NVIDIA Drivers: 555.42.06
Conda version: 23.9.0 (Miniconda)
Compilers & Build Tools:
NVIDIA CUDA Compiler (nvcc): V11.8.89
C++ Compiler: GCC/G++ 11.4.0
GNU Make Version: 4.3
CMake Version: 3.22.1
Since part of the project code relies on CUDA/C++, please make sure your compilation environment is set up and working correctly.
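A quick way to confirm that your toolchain matches the versions listed above:
# Print the versions of the CUDA compiler and host build tools
nvcc --version
g++ --version
cmake --version
make --version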
Create a virtual environment using conda (or Miniconda):
conda create -n gigaslam python=3.10
conda activate gigaslam
# pip version created by conda: 25.1
Next, install PyTorch:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
# Verified to work with CUDA 11.8 and torch 2.2.0
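Before building any CUDA extensions, it is worth confirming that this PyTorch build can see your GPU:
# Should print "2.2.0+cu118 True" on a working setup
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"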
Then install the other dependencies:
pip install -r requirements.txt
Note: If you encounter installation issues with `torch_scatter` (which may happen in certain network environments), comment out the corresponding line in `requirements.txt` and manually download and install the wheel from https://pytorch-geometric.com/whl/:
# Download the torch_scatter wheel package for torch-2.2.0 + CUDA 11.8
wget https://data.pyg.org/whl/torch-2.2.0%2Bcu118/torch_scatter-2.1.2%2Bpt22cu118-cp310-cp310-linux_x86_64.whl
# Alternative: Manually download via browser from the same URL
# Install the torch_scatter from the local wheel file
pip install ./torch_scatter-2.1.2+pt22cu118-cp310-cp310-linux_x86_64.whl
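A quick import check confirms that the wheel installed correctly:
# Should print the installed torch_scatter version (2.1.2)
python -c "import torch_scatter; print(torch_scatter.__version__)"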
Special Note Regarding Version Compatibility: Particular attention must be paid to the `xformers-0.0.24` dependency specified in `requirements.txt`. If you attempt to install alternative versions of `torch` (other than the explicitly recommended `2.2.0`), subsequent installations of `xformers` via `pip` will force-uninstall your current PyTorch version and replace it with the specific version deemed compatible by `xformers`. This forced version downgrade/upgrade poses critical risks to CUDA/C++ compilation workflows, as PyPI-distributed `xformers` packages are pre-compiled binaries tightly coupled to specific CUDA toolkit and PyTorch versions.
Extended Compatibility Guidance: For developers requiring alternative `xformers` or PyTorch versions, consult the xFormers GitHub Repository for version relationships. While the repository lacks an official compatibility reference, we have curated version compatibility references by cross-referencing release notes. The following table may assist both GigaSLAM users and other developers encountering similar dependency conflicts:
| xFormers | PyTorch | xFormers | PyTorch |
|----------|---------|----------|---------|
| 0.0.21   | 2.0.1   | 0.0.26   | 2.2.0   |
| 0.0.22   | 2.0.1   | 0.0.27   | 2.3.0   |
| 0.0.23   | 2.1.1   | 0.0.28   | 2.4.1   |
| 0.0.24   | 2.2.0   | 0.0.29   | 2.5.1   |
| 0.0.25   | 2.2.0   | 0.0.30   | 2.7.0   |
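After installing or changing versions, it is worth confirming that pip did not silently swap out your PyTorch build:
# Both versions should correspond to one row of the table above
python -c "import torch, xformers; print(torch.__version__, xformers.__version__)"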
Proceed to install the required CUDA/C++ components. Thoroughly verify your build environment: incompatible compiler versions, conflicting dependencies, or mismatched CUDA toolkits may cause compilation failures or unexpected binary behavior.
pip install submodules/simple-knn
pip install submodules/diff-gaussian-rasterization
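Once both builds succeed, a quick import check helps catch ABI mismatches early (the module names below are our assumption based on common 3DGS builds; adjust if your build differs):
# Both CUDA extensions should import without errors
python -c "import diff_gaussian_rasterization, simple_knn._C; print('CUDA extensions OK')"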
Install the OpenCV C++ API:
sudo apt-get install -y libopencv-dev
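To confirm that the development headers are visible to the build system (the pkg-config module is named opencv4 on Ubuntu 22.04; it may differ on other distributions):
# Should print the installed OpenCV version, e.g. 4.5.4
pkg-config --modversion opencv4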
Install DBoW2:
cd DBoW2
mkdir -p build && cd build
cmake ..
make
sudo make install
cd ../..
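If the DBoW2 shared library cannot be found at runtime later on, refreshing the dynamic linker cache is a common (not project-specific) fix:
# Rebuild the shared-library cache so the freshly installed libDBoW2 is found
sudo ldconfig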
Install the image retrieval module:
pip install ./DPRetrieval
# or, equivalently, from within the DPRetrieval directory:
python setup.py install
Download the pre-trained Bag-of-Words vocabulary for DBoW2:
# Download the vocabulary file (about 150 MiB)
wget https://github.com/UZ-SLAMLab/ORB_SLAM3/raw/master/Vocabulary/ORBvoc.txt.tar.gz
# Alternatively, download the file manually via your browser from the link above
# Extract the vocabulary file
tar -xzvf ORBvoc.txt.tar.gz
# Verify extraction
ls -l ORBvoc.txt
Before running the project, you need to modify the `.yaml` files under the `./configs` directory. Specifically, replace the dataset path with the path to your downloaded dataset (in PNG or JPG format) and set the camera intrinsics `fx`, `fy`, `cx`, `cy`. For example:
...
Dataset:
color_path: "/media/deng/Data/4SeasonsDataset/BusinessCampus_recording_2020-10-08_09-30-57/undistorted_images/cam0"
# replace with your local path
Calibration:
fx: 501.4757919305817
...
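For reference, here is a fuller sketch of the fields to edit; the `fy`, `cx`, and `cy` values below are hypothetical placeholders, so substitute your camera's actual calibration:
Dataset:
  color_path: "/path/to/your/dataset/images"  # hypothetical local path
Calibration:
  fx: 501.4757919305817
  fy: 501.4757919305817  # hypothetical placeholder
  cx: 421.0              # hypothetical placeholder
  cy: 167.0              # hypothetical placeholder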
Then, run the following command to start the SLAM process. Pretrained weights for DISK, LightGlue, and UniDepth will be automatically downloaded on the first execution:
python slam.py --config <path_to_your_config>.yaml
For example:
python slam.py --config ./configs/kitti_06.yaml
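To process several sequences back to back, a simple shell loop works; the glob below assumes config files named like the example above:
# Run every matching config in sequence (hypothetical file layout)
for cfg in ./configs/kitti_*.yaml; do
    python slam.py --config "$cfg"
done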
Note on loading UniDepth from Hugging Face: UniDepth models are loaded from the Hugging Face Hub by default. However, due to network restrictions in certain countries/regions, the model may fail to load properly even with a VPN. If this happens, try running the following command before starting SLAM:
export HF_ENDPOINT=https://hf-mirror.com
If the problem persists, set `['DepthModel']['from_huggingface']` to `False`, manually download the UniDepth model weights (e.g., via a web browser), and set the local path to the downloaded weights in the `['DepthModel']['local_snapshot_path']` field. Download links for the UniDepth model weights are available here: UniDepth V2 HuggingFace ViT-L [12 Jun 2024] & UniDepth V2 HuggingFace ViT-S [12 Jun 2024]. Pay close attention to the version of the pre-trained UniDepthV2 weights: this repository specifically requires the release dated [12 Jun 2024]. Please be advised that newer releases of UniDepthV2 are currently incompatible with GigaSLAM for reasons that remain unclear.
The snapshot directory may contain the following files:
snapshot_dir
├── config.json
├── model.safetensors
└── pytorch_model.bin
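Putting the offline options together, here is a minimal config sketch (the snapshot path is a hypothetical placeholder for wherever you stored the files above):
DepthModel:
  from_huggingface: False
  local_snapshot_path: "/path/to/snapshot_dir"  # hypothetical local path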
If you set `['SLAM']['viz']` to `True` in the `.yaml` file, you will be able to see output like the following in the `results/your_exps/` directory during execution:
Important Notes: This project is implemented on top of the MonoGS framework, whose native architecture primarily targets small-scale scenes. In terms of hardware requirements, processing ultra-long sequences significantly increases CPU RAM load. Through targeted optimizations, we have stabilized the CPU memory consumption of 4000-frame KITTI sequences at approximately 10 GiB; however, longer sequences may still require additional memory. We strongly recommend running this system on server environments equipped with 32+ GiB of CPU RAM. For personal computers (particularly common 16 GiB setups), please continuously monitor memory usage via System Monitor or a similar tool, such as the terminal command below, to prevent sudden memory spikes from affecting other system processes.
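For terminal users, a simple way to keep an eye on system memory while SLAM runs:
# Refresh the overall memory usage readout every 5 seconds
watch -n 5 free -h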
We are currently exploring methods to further optimize memory consumption to achieve better operational efficiency.
Our project builds on Scaffold-GS, UniDepth, MonoGS, DF-VO, and DPVO (DPV-SLAM). Our work would not have been possible without these excellent repositories.
If you find our work helpful, please consider citing:
@article{deng2025gigaslam,
  title={GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats},
  author={Deng, Kai and Yang, Jian and Wang, Shenlong and Xie, Jin},
  journal={arXiv preprint arXiv:2503.08071},
  year={2025}
}