A unified agent system for general-purpose robots.
FSR-VLN is a vision-language navigation system for humanoid robots. It integrates a Hierarchical Multi-modal Scene Graph (HMSG), which gives a coarse-to-fine representation of the environment, with Fast-to-Slow Navigation Reasoning (FSR), and uses VLM-driven refinement to enable efficient, real-time, long-range spatial reasoning.
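To make the coarse-to-fine and fast-to-slow idea concrete, here is a minimal sketch in Python. It is not the FSR-VLN implementation, which this repository has not yet released. The `Node` class, the cosine-similarity matcher, the top-`k` shortlist, and the `slow_refine` callback (a stand-in for a VLM call) are all hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, List, Optional, Sequence


@dataclass
class Node:
    """One node of a toy hierarchical scene graph (hypothetical structure)."""
    name: str
    embedding: Sequence[float]
    children: List["Node"] = field(default_factory=list)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fast_match(query: Sequence[float], nodes: List[Node], k: int) -> List[Node]:
    """Fast stage: keep the k nodes whose embeddings best match the query."""
    ranked = sorted(nodes, key=lambda n: cosine(query, n.embedding), reverse=True)
    return ranked[:k]


def coarse_to_fine(query: Sequence[float], root: Node, k: int = 2) -> List[Node]:
    """Descend level by level, shortlisting k nodes at each level."""
    frontier = root.children
    while True:
        shortlist = fast_match(query, frontier, k)
        next_level = [c for n in shortlist for c in n.children]
        if not next_level:
            return shortlist  # finest level reached (or empty graph)
        frontier = next_level


def select_goal(
    query: Sequence[float],
    root: Node,
    slow_refine: Callable[[Sequence[float], List[Node]], Node],
    k: int = 2,
) -> Optional[Node]:
    """Fast matching first; call the slow refiner only if candidates are ambiguous."""
    candidates = coarse_to_fine(query, root, k)
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    return slow_refine(query, candidates)  # assumption: stands in for a VLM


# Toy graph: floor -> rooms -> objects, with made-up 2-D embeddings.
demo_graph = Node("floor", [0.0, 0.0], [
    Node("kitchen", [1.0, 0.0], [
        Node("mug", [0.9, 0.1]),
        Node("sink", [0.8, 0.6]),
    ]),
    Node("office", [0.0, 1.0], [
        Node("chair", [0.1, 0.9]),
    ]),
])
```

The design choice illustrated here is that the cheap similarity search prunes the graph at every level, so the expensive refiner only ever sees a handful of candidates. In a real system the embeddings would come from a vision-language encoder and `slow_refine` would query a VLM with the candidate views.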
- Release the code of FSR-VLN.
If you find our project useful, please consider citing it:
@misc{zhou2025fsrvlnfastslowreasoning,
title={FSR-VLN: Fast and Slow Reasoning for Vision-Language Navigation with Hierarchical Multi-modal Scene Graph},
author={Xiaolin Zhou and Tingyang Xiao and Liu Liu and Yucheng Wang and Maiyue Chen and Xinrui Meng and Xinjie Wang and Wei Feng and Wei Sui and Zhizhong Su},
year={2025},
eprint={2509.13733},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2509.13733},
}

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
