📌 Important Note: This repository contains the RL training framework for the SRU navigation project, providing neural network architectures and on-policy training algorithms (PPO/MDPO). This repository does not include the simulation environments or task definitions. See the project website for the complete navigation system.
End-to-end RL training framework for visual navigation with SRU (Spatially-Enhanced Recurrent Units) architecture. This repository extends the original rsl_rl framework with:
- SRU Architecture: Advanced recurrent networks with spatial transformation operations for implicit spatial memory
- Attention Mechanisms: Self-attention and cross-attention modules for multimodal fusion
- Training Algorithms: PPO, SPO, and MDPO (with Deep Mutual Learning) implementations
- Multi-Camera Support: Native handling of 1-2 camera inputs with proper masking
This framework is designed to train navigation policies that achieve a 23.5% improvement over standard RNNs and enable zero-shot sim-to-real transfer.
✅ Neural Network Architectures
- ActorCriticSRU: SRU-based policy network with attention mechanisms
- LSTM_SRU: Core recurrent unit with spatial transformation gates
- Cross-attention fusion module for vision + proprioception
- 3D positional encoding for volumetric features
✅ RL Training Algorithms
- PPO (Proximal Policy Optimization)
- SPO (Symmetric Policy Optimization)
- MDPO (Multi-Distillation Policy Optimization with Deep Mutual Learning)
- MUON optimizer support for stable training with reduced memory usage (newly added, not in original paper)
✅ Training Infrastructure
- GPU-accelerated training pipeline
- Rollout storage with hidden state tracking
- Temporally consistent dropout across trajectories
- Multi-sensor observation handling
✅ Model Export
- JIT export for PyTorch deployment
- ONNX export for cross-platform inference (C++, ROS, TensorRT)
- Explicit hidden state I/O for recurrent models in ONNX format
✅ Logging & Monitoring
- Tensorboard integration
- Weights & Biases support
- Neptune logging
❌ Simulation environments (Isaac Lab tasks, maze generation, terrain)
❌ Task definitions (observation spaces, reward functions, action interfaces)
❌ Robot models and locomotion policies
❌ Depth encoder pretraining infrastructure
Note: The simulation environment must be installed separately. See sru-navigation-sim for the IsaacLab extension.
The repository is organized as follows:
sru-navigation-learning/
├── config/ # Training configuration files
│ └── dummy_config.yaml # Example PPO configuration
├── rsl_rl/ # Main Python package
│ ├── algorithms/ # RL algorithms
│ │ ├── ppo.py # Proximal Policy Optimization
│ │ ├── spo.py # Symmetric Policy Optimization
│ │ └── mdpo.py # Multi-Distillation Policy Optimization
│ ├── modules/ # Neural network architectures
│ │ ├── actor_critic.py # Basic MLP actor-critic
│ │ ├── actor_critic_recurrent.py # RNN-based actor-critic
│ │ ├── actor_critic_sru.py # SRU architecture (primary)
│ │ └── normalizer.py # Observation normalization
│ ├── networks/ # Network components
│ │ └── sru_memory/ # SRU memory modules
│ │   ├── lstm_sru.py # LSTM with SRU gating
│ │   └── attention.py # Cross-attention fusion with 3D positional encoding
│ ├── runners/ # Training orchestration
│ │ └── on_policy_runner.py # Main training loop
│ ├── storage/ # On-policy rollout storage
│ │ └── rollout_storage.py # Trajectory buffer
│ ├── env/ # Environment interface
│ │ └── vec_env.py # Vectorized environment wrapper
│ └── utils/ # Utilities
│ ├── trajectory_handler.py # Padding/unpadding helpers
│ └── logging.py # Logging utilities
├── licenses/ # Dependency licenses
├── setup.py # Package installation
├── pyproject.toml # Project configuration
└── README.md # This file
ActorCriticSRU (rsl_rl/modules/actor_critic_sru.py)
- Dual-input architecture: depth images + proprioceptive information
- Processing pipeline: Self-attention → Cross-attention → SRU → MLP
- Separate actor and critic networks with shared depth encoder
- Time embedding for critic value estimation
- Supports 1-2 camera inputs with automatic padding/masking
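For orientation, here is a minimal PyTorch sketch of how such a dual-input pipeline can be wired together. All module names, sizes, and the placeholder LSTM core are illustrative assumptions; the actual ActorCriticSRU (including the SRU memory, masking, and critic head) lives in rsl_rl/modules/actor_critic_sru.py.

```python
import torch
import torch.nn as nn

class DualInputPolicySketch(nn.Module):
    """Illustrative only: depth features + proprioception -> self-attention -> cross-attention -> recurrent core -> MLP."""

    def __init__(self, depth_feat_dim=128, prop_dim=48, hidden_dim=256, action_dim=3):
        super().__init__()
        # Self-attention over depth feature tokens (e.g., one token per camera or spatial patch)
        self.self_attn = nn.MultiheadAttention(depth_feat_dim, num_heads=4, batch_first=True)
        # Cross-attention: the proprioceptive state queries the visual tokens
        self.cross_attn = nn.MultiheadAttention(depth_feat_dim, num_heads=4, batch_first=True)
        self.prop_proj = nn.Linear(prop_dim, depth_feat_dim)
        # Recurrent core standing in for the SRU memory (placeholder: plain LSTM)
        self.memory = nn.LSTM(depth_feat_dim, hidden_dim, batch_first=True)
        # Action head
        self.actor_mlp = nn.Sequential(nn.Linear(hidden_dim, 128), nn.ELU(), nn.Linear(128, action_dim))

    def forward(self, depth_tokens, prop, hidden=None):
        # depth_tokens: (B, num_tokens, depth_feat_dim), prop: (B, prop_dim)
        x, _ = self.self_attn(depth_tokens, depth_tokens, depth_tokens)
        q = self.prop_proj(prop).unsqueeze(1)       # (B, 1, depth_feat_dim)
        fused, _ = self.cross_attn(q, x, x)         # (B, 1, depth_feat_dim)
        out, hidden = self.memory(fused, hidden)    # single-step recurrence
        return self.actor_mlp(out.squeeze(1)), hidden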
LSTM_SRU (rsl_rl/networks/sru_memory/lstm_sru.py)
- Multi-layer LSTM with SRU-style spatial transformation gates
- Polynomial refinement for forget gate (from research paper)
- Element-wise transformation operations for spatial memory
- Orthogonal weight initialization
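As a rough illustration of this style of gating (not the repository's implementation), the sketch below shows an LSTM-like cell with one extra element-wise transformation gate applied to the carried memory and a smoothed forget gate; the exact gate layout and polynomial used in lstm_sru.py may differ.

```python
import torch
import torch.nn as nn

class SRUGatedCellSketch(nn.Module):
    """Illustrative LSTM-style cell with an extra element-wise transformation gate."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # Four standard LSTM gates plus one extra transformation gate, from a single projection
        self.proj = nn.Linear(input_dim + hidden_dim, 5 * hidden_dim)
        for w in self.proj.parameters():
            if w.dim() >= 2:
                nn.init.orthogonal_(w)  # orthogonal weight initialization, as described above

    def forward(self, x, state):
        h, c = state
        i, f, g, o, s = self.proj(torch.cat([x, h], dim=-1)).chunk(5, dim=-1)
        i, o, s = torch.sigmoid(i), torch.sigmoid(o), torch.sigmoid(s)
        f = torch.sigmoid(f)
        # Polynomial refinement of the forget gate (assumed smoothstep form: f <- f^2 * (3 - 2f))
        f = f * f * (3.0 - 2.0 * f)
        # Element-wise spatial transformation applied to the carried memory before the usual update
        c = s * c
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, (h, c)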
CrossAttentionFuseModule (rsl_rl/networks/sru_memory/attention.py)
- Fuses volumetric (image) features with proprioceptive state
- Self-attention → Feed-forward → Cross-attention architecture
- 3D positional encoding for spatial awareness
- Efficient batched attention outside RNN loop
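The following sketch shows one common way to build a sinusoidal 3D positional encoding by splitting the feature dimension across the three spatial axes; it is an assumption for illustration, and the encoding in attention.py may use a different split or learned embeddings.

```python
import torch

def positional_encoding_3d(d_model, depth, height, width):
    """Illustrative 3D sinusoidal positional encoding: d_model split evenly across (z, y, x)."""
    assert d_model % 6 == 0, "d_model must be divisible by 6 (sin/cos pair per axis)"
    d_axis = d_model // 3
    pe = torch.zeros(depth, height, width, d_model)
    div = torch.exp(torch.arange(0, d_axis, 2).float() * (-torch.log(torch.tensor(10000.0)) / d_axis))

    def axis_encoding(length):
        pos = torch.arange(length).float().unsqueeze(1)  # (length, 1)
        enc = torch.zeros(length, d_axis)
        enc[:, 0::2] = torch.sin(pos * div)
        enc[:, 1::2] = torch.cos(pos * div)
        return enc

    # Broadcast each axis encoding over the other two spatial dimensions
    pe[..., 0 * d_axis:1 * d_axis] = axis_encoding(depth)[:, None, None, :]
    pe[..., 1 * d_axis:2 * d_axis] = axis_encoding(height)[None, :, None, :]
    pe[..., 2 * d_axis:3 * d_axis] = axis_encoding(width)[None, None, :, :]
    return pe  # (depth, height, width, d_model)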
Training Algorithms (rsl_rl/algorithms/)
- PPO: Standard clipped policy optimization with value loss
- MDPO: Deep Mutual Learning with KL-divergence distillation between two networks
- Adaptive and fixed learning rate schedules
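For reference, here is a minimal sketch of the two loss terms described above: the standard PPO clipped surrogate and a symmetric KL distillation term in the spirit of Deep Mutual Learning. The weighting and exact formulation in ppo.py / mdpo.py may differ.

```python
import torch
from torch.distributions import kl_divergence

def ppo_clipped_loss(log_prob, old_log_prob, advantages, clip_ratio=0.2):
    """Standard PPO clipped surrogate loss (to be minimized)."""
    ratio = torch.exp(log_prob - old_log_prob)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    return -torch.min(unclipped, clipped).mean()

def mutual_learning_kl(dist_a, dist_b):
    """Symmetric KL distillation between two peer policies (sketch of Deep Mutual Learning)."""
    return 0.5 * (kl_divergence(dist_a, dist_b).mean() + kl_divergence(dist_b, dist_a).mean())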
git clone https://github.com/leggedrobotics/sru-navigation-learning.git
cd sru-navigation-learning
pip install -e .

If you're using this package with Isaac Lab, you need to replace the pre-installed rsl_rl package.
You can place this repository anywhere, but the recommended structure is:
IsaacLab/
├── source/
│ ├── isaaclab/ # Core Isaac Lab
│ ├── isaaclab_assets/ # Asset library
│ ├── isaaclab_rl/ # RL framework wrappers
│ ├── isaaclab_tasks/ # Task definitions
│ └── isaaclab_nav_task/ # Your navigation task extension
├── rsl_rl/ # This SRU-enhanced RL framework (recommended location)
├── _isaac_sim/ # Isaac Sim installation
└── isaaclab.sh # Isaac Lab launcher script
Alternative locations:
- Inside the `source/` directory (e.g., `source/rsl_rl/`)
- Standalone location outside IsaacLab
- Any custom location (adjust paths accordingly)
# 1. Navigate to your IsaacLab installation
cd /path/to/IsaacLab
# 2. Uninstall the pre-installed rsl_rl from Isaac Lab
./isaaclab.sh -p -m pip uninstall rsl-rl-lib -y
# 3. Remove any cached rsl_rl directory (if it exists)
rm -rf _isaac_sim/kit/python/lib/python3.10/site-packages/rsl_rl
# 4. Clone or place this repository
# Option A: Clone at IsaacLab root level (recommended)
git clone https://github.com/leggedrobotics/sru-navigation-learning.git rsl_rl
# Option B: Clone in source/ directory
cd source
git clone https://github.com/leggedrobotics/sru-navigation-learning.git rsl_rl
cd ..
# 5. Install this SRU-enhanced version in editable mode
cd rsl_rl # Adjust path if you placed it elsewhere
../isaaclab.sh -p -m pip install -e .
# 6. Verify installation
../isaaclab.sh -p -c "from rsl_rl.modules import ActorCriticSRU; print('✓ SRU modules loaded')"

Important Notes:
- The package installs as `rsl_rl` (not `rsl_rl_lib`) to maintain compatibility with Isaac Lab imports
- This repository directory can have any name (e.g., `sru-navigation-learning`, `rsl_rl`, etc.), but the installed package name will always be `rsl_rl`
- The editable install (`-e`) allows you to modify the code without reinstalling
Dependencies:
- PyTorch (GPU-accelerated training recommended)
- NumPy
- Optional: tensorboard, wandb, neptune (for logging)
Note: This package provides only the RL training framework. To train navigation policies, you also need:
- A compatible simulation environment (e.g., sru-navigation-sim)
- A pretrained depth encoder (see project website)
Training hyperparameters are specified in YAML configuration files. See config/dummy_config.yaml for an example PPO configuration.
Key parameters:
- `policy`: Network architecture configuration (SRU layers, attention heads, hidden dimensions)
- `algorithm`: PPO/MDPO hyperparameters (learning rate, entropy coefficient, clip range)
- `runner`: Training settings (number of steps per rollout, max iterations)
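A minimal illustration of how such a configuration might look when passed to the runner as a Python dictionary. The key names below are indicative only; consult config/dummy_config.yaml for the actual schema.

```python
# Hypothetical values for illustration; see config/dummy_config.yaml for the real defaults.
config = {
    "policy": {
        "class_name": "ActorCriticSRU",   # assumed key name
        "rnn_hidden_size": 256,
        "num_attention_heads": 4,
    },
    "algorithm": {
        "class_name": "PPO",
        "learning_rate": 1.0e-3,
        "clip_param": 0.2,
        "entropy_coef": 0.005,
    },
    "runner": {
        "num_steps_per_env": 24,
        "max_iterations": 10000,
    },
    "logger": "tensorboard",              # or "wandb" / "neptune"
}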
The typical training workflow with a compatible environment:
from rsl_rl.runners import OnPolicyRunner
from your_env import YourNavigationEnv # From simulation package
# Initialize environment
env = YourNavigationEnv(num_envs=4096)
# Create runner with configuration
runner = OnPolicyRunner(env, config, device='cuda:0')
# Train
runner.learn(num_learning_iterations=10000)

The framework supports multiple logging backends, configured through the `logger` parameter:
- Tensorboard: https://www.tensorflow.org/tensorboard/
- Weights & Biases: https://wandb.ai/site
- Neptune: https://docs.neptune.ai/
Export trained policies for deployment:
# Load trained model
policy = ActorCriticSRU(...)
policy.load_state_dict(torch.load("checkpoint.pt"))
# Export to JIT (PyTorch deployment)
policy.export_jit(path="./exported", filename="policy.pt", normalizer=obs_normalizer)
# Export to ONNX (C++, ROS, TensorRT deployment)
policy.export_onnx(path="./exported", filename="policy.onnx", normalizer=obs_normalizer)

ONNX Export Notes for Recurrent Models:
- Hidden states are exposed as explicit inputs/outputs (`h_in`, `c_in` → `h_out`, `c_out`)
- Initialize hidden states to zeros at episode start
- Pass updated hidden states back as inputs for the next timestep
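A hedged deployment sketch with onnxruntime is shown below. The observation input name (`obs`), output ordering, and tensor shapes are assumptions and should be checked against the exported model.

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("exported/policy.onnx")

# Assumed shapes: (num_layers, batch, hidden_size) for hidden states; obs_dim is illustrative.
h = np.zeros((1, 1, 256), dtype=np.float32)
c = np.zeros((1, 1, 256), dtype=np.float32)

for step in range(100):
    obs = np.zeros((1, 123), dtype=np.float32)   # replace with the real observation vector
    outputs = session.run(None, {"obs": obs, "h_in": h, "c_in": c})  # "obs" input name is an assumption
    action, h, c = outputs[0], outputs[1], outputs[2]                # output order assumed: actions, h_out, c_out
    # ... send `action` to the robot, then loop with the updated hidden states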
- Spatial Transformation Gates: Element-wise multiplication operations enabling implicit spatial memory from egocentric observations
- Multi-Camera Fusion: Native support for multiple depth cameras with proper padding, masking, and attention mechanisms
- Temporally Consistent Dropout: Dropout masks maintained across trajectory timesteps for stable training
- Efficient Attention: Batched self-attention and cross-attention computed outside the RNN loop for computational efficiency
- Deep Mutual Learning: MDPO algorithm with KL-divergence distillation between dual networks
- MUON Optimizer Integration: Momentum Orthogonalized by Newton-Schulz optimizer for hidden weight layers
  - Uses Newton-Schulz iteration for efficient orthogonalization in bfloat16
  - Reduces memory usage compared to the Adam optimizer
  - Provides more stable training dynamics for deep networks
  - Automatically separates hidden weights (optimized with MUON) from biases/gains (optimized with AdamW)
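For context, the sketch below shows the widely used quintic Newton-Schulz iteration for approximate orthogonalization in bfloat16. The coefficients follow the publicly available Muon reference implementation; the integration in this repository may differ.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2D weight update via a quintic Newton-Schulz iteration (sketch)."""
    assert G.ndim == 2
    a, b, c = 3.4445, -4.7750, 2.0315      # coefficients from the public Muon reference
    X = G.bfloat16()
    transposed = G.size(0) > G.size(1)
    if transposed:                          # work with the wide orientation
        X = X.T
    X = X / (X.norm() + 1e-7)               # normalize so the spectral norm is at most ~1
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)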
- 23.5% improvement over standard RNNs (LSTM/GRU)
- 29.6% advantage vs. explicit mapping approaches
- 105% better than stacked-frame baselines
- 2.5x improvement in challenging stair environments
- Zero-shot sim-to-real transfer with no fine-tuning
| Repository | Description |
|---|---|
| sru-pytorch-spatial-learning | Core SRU PyTorch module (standalone) |
| sru-navigation-sim | IsaacLab simulation environments |
| sru-depth-pretraining | Self-supervised depth encoder pretraining |
| sru-robot-deployment | Real robot deployment (ROS2, Gazebo) |
If you use this code in your research, please cite:
@article{yang2025sru,
author = {Yang, Fan and Frivik, Per and Hoeller, David and Wang, Chen and Cadena, Cesar and Hutter, Marco},
title = {Spatially-enhanced recurrent memory for long-range mapless navigation via end-to-end reinforcement learning},
journal = {The International Journal of Robotics Research},
year = {2025},
doi = {10.1177/02783649251401926},
url = {https://doi.org/10.1177/02783649251401926}
}

For documentation, we adopt the Google Style Guide for docstrings.
We use the following tools for maintaining code quality:
- pre-commit: Runs formatters and linters over the codebase
- black: Code formatter
- flake8: Style checker
To set up pre-commit hooks:
# Installation (one time)
pre-commit install
# Run on all files
pre-commit run --all-files

This repository is built upon rsl_rl from ETH Zurich Robotic Systems Lab and NVIDIA.
- Original rsl_rl Maintainers: David Hoeller and Nikita Rudin
- SRU Extension: Fan Yang
- Affiliation: Robotic Systems Lab, ETH Zurich
- Contact: [email protected]
This project is licensed under the BSD-3-Clause License - see the LICENSE file for details.
See licenses/ directory for dependency licenses.