A reinforcement learning project demonstrating how simulated tactile feedback enhances robotic manipulation, with an eye toward humanoid robotics applications. The project augments the RoboMimic dataset with simulated tactile sensors to improve sample efficiency and manipulation performance.
Using a 2-finger gripper simulation (with potential extension to more complex hands), we:
- Augment existing demonstrations with simulated tactile feedback
- Train both tactile-enhanced and standard models
- Compare performance across various manipulation challenges
- Visualize the impact of tactile sensing on decision-making
Key features:
- Tactile Sensing Simulation: 3×4 taxel grid on each gripper finger with simulated capacitive response
- Multi-Modal Learning: Integration of visual, proprioceptive, and tactile information
- Sample Efficient Learning: Offline reinforcement learning using demonstration data
- Comparative Analysis: Performance metrics with and without tactile feedback
- Object Manipulation: Support for cube stacking tasks with visualization of object interactions
Requirements:
- macOS with M1/M2 chip
- Python 3.8+
- 32GB RAM recommended
Create a new conda environment:
conda create -n tactile-rl python=3.8
conda activate tactile-rl
Then install the dependencies:
pip install mujoco
pip install torch torchvision torchaudio
pip install numpy h5py matplotlib imageio pandas
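Optionally run a quick sanity check that the core packages import correctly (a minimal sketch; the MPS check is only meaningful on Apple Silicon):

```python
# Verify the core dependencies installed above.
import mujoco
import torch

print("MuJoCo:", mujoco.__version__)
print("PyTorch:", torch.__version__, "| MPS available:", torch.backends.mps.is_available())
```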
There are known issues with MuJoCo's interactive rendering on some platforms. If you encounter rendering problems:
- Try setting the MUJOCO_GL environment variable: `export MUJOCO_GL=egl` (try `egl`, `glfw`, or `osmesa`)
- If interactive viewing doesn't work, use headless video generation: `mjpython -m scripts.replay_full_robot --dataset ../datasets/core/stack_d0.hdf5 --demo 3 --no-render --save-video --playback-speed 0.1`
- Videos are saved to the `../exps` directory by default.
If robot movement appears too fast or too slow compared to object movement:
- The script will automatically analyze action ranges and suggest an appropriate scaling factor (see the sketch below)
- Try using `--control-mode position` with a lower `--action-scale` value (typically 1.0-5.0)
- For accurate timing, use `--playback-speed 0.002` to match the simulation time step
- Note that object positions are set directly from the dataset while robot movement is physics-based
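The sketch below is a hypothetical, offline version of that action-range check; it assumes the standard RoboMimic HDF5 layout (`data/demo_{N}/actions`) and is not the replay script's exact logic:

```python
# Inspect per-dimension action magnitudes to help choose an --action-scale.
import h5py
import numpy as np

with h5py.File("../datasets/core/stack_d0.hdf5", "r") as f:
    actions = f["data/demo_3/actions"][:]              # shape (T, action_dim)

abs_max = np.abs(actions).max(axis=0)
print("per-dimension |action| max:", np.round(abs_max, 3))
# Pick a scale so that (action * scale) stays within the model's actuator
# control ranges; small maxima here usually call for a larger scale.
```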
Download the RoboMimic demonstration datasets:
python -m robomimic.scripts.download_datasets --tasks can_picking
python -m robomimic.scripts.download_datasets --tasks lift
python -m robomimic.scripts.download_datasets --tasks square_insertion
python -m robomimic.scripts.download_datasets --tasks stack
Project structure:
tactile-rl/
├── data/ # Datasets and processed data
├── environments/ # Custom MuJoCo environments
│ ├── tactile_gripper.py # Gripper with tactile sensors
│ └── tactile_sensor.py # Tactile sensor implementation
├── franka_emika_panda/ # Robot and environment models
│ ├── panda.xml # Original robot model
│ ├── mjx_panda.xml # MuJoCo X compatible model
│ ├── mjx_two_cubes.xml # Model with two interactive cubes
│ ├── stack_d0_compatible.xml # Enhanced model with tactile sensors
│ └── original_stack_environment.xml # Original environment from dataset
├── models/ # Neural network architectures
│ ├── policy_network.py # Policy networks with tactile processing
│ └── tactile_encoder.py # Encoder for tactile information
├── scripts/ # Utility scripts
│ ├── augment_dataset.py # Add tactile data to demonstrations
│ ├── replay_full_robot.py # Replay demonstrations with the full robot
│ ├── visualize_tactile.py # Visualize tactile readings
│ └── extend_shadow_hand.py # Optional extension to 5-finger hand
├── training/ # Training algorithms
│ ├── bc_training.py # Behavior Cloning implementation
│ └── cql_training.py # Conservative Q-Learning implementation
├── visualization/ # Visualization tools
├── main.py # Main training script
└── evaluate.py # Evaluation script
Example replay commands (different playback speeds, control modes, and environment models):
mjpython -m scripts.replay_full_robot --dataset ../datasets/core/stack_d0.hdf5 --demo 3 --save-video --playback-speed 0.002 --camera 1
mjpython -m scripts.replay_full_robot --dataset ../datasets/core/stack_d0.hdf5 --demo 3 --save-video --control-mode position --action-scale 3 --playback-speed 0.002
mjpython -m scripts.replay_full_robot --dataset ../datasets/core/stack_d0.hdf5 --demo 3 --save-video --control-mode velocity --action-scale 10 --playback-speed 0.002
mjpython -m scripts.replay_full_robot --dataset ../datasets/core/stack_d0.hdf5 --demo 3 --save-video --model franka_emika_panda/original_stack_environment.xml
mjpython -m scripts.replay_full_robot --dataset ../datasets/core/stack_d0.hdf5 --demo 3 --save-video --model franka_emika_panda/stack_d0_compatible.xml
mjpython -m scripts.replay_full_robot --dataset ../datasets/core/stack_d0.hdf5 --demo 3 --save-video --model franka_emika_panda/mjx_two_cubes.xml
mjpython -m scripts.replay_full_robot --dataset ../datasets/core/stack_d0.hdf5 --demo 3 --save-video --playback-speed 0.3 --camera 1
The script will:
- Generate a video file in `../exps/replay_demo{N}.mp4`
- Save tactile readings to `../exps/tactile_readings_demo{N}.npy` (see the loading sketch below)
- Create visualization of tactile readings at key frames
- Show interactive object manipulation if using the cube models
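A minimal sketch for inspecting the saved readings, assuming one tactile frame per timestep (the exact array layout comes from scripts/replay_full_robot.py):

```python
# Load the tactile log for demo 3 and plot total taxel response over time.
import numpy as np
import matplotlib.pyplot as plt

readings = np.load("../exps/tactile_readings_demo3.npy")
print("tactile readings shape:", readings.shape)

plt.plot(readings.reshape(readings.shape[0], -1).sum(axis=1))
plt.xlabel("timestep")
plt.ylabel("summed taxel response")
plt.show()
```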
This project includes several environment options:
- original_stack_environment.xml: The original environment from the dataset - provides the most accurate replay of demonstrations
- stack_d0_compatible.xml: Enhanced environment with added tactile sensors and proper cube setup
- mjx_two_cubes.xml: Simplified environment focused on cube interaction
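Any of these models can also be loaded directly with the MuJoCo Python bindings for quick inspection (a standalone sketch, independent of the replay script):

```python
# Load an environment model and step it once to confirm it is well-formed.
import mujoco

model = mujoco.MjModel.from_xml_path("franka_emika_panda/stack_d0_compatible.xml")
data = mujoco.MjData(model)
mujoco.mj_step(model, data)
print("bodies:", model.nbody, "| actuators:", model.nu, "| sensors:", model.nsensor)
```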
The tactile sensors are implemented as a 3×4 grid on each finger of the gripper. Each taxel reads:
- Normal force (pressure)
- Shear forces (x, y directions)
- Contact binary state
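As an illustration (an assumed layout, not necessarily the project's canonical format), a single finger's reading can be stored as a 4-channel 3×4 array:

```python
# One tactile frame per finger: channels = [normal, shear_x, shear_y, contact].
import numpy as np

frame = np.zeros((4, 3, 4), dtype=np.float32)
frame[0, 1, 2] = 0.8    # normal force at taxel (row 1, col 2), arbitrary units
frame[1, 1, 2] = 0.05   # shear along x
frame[2, 1, 2] = -0.02  # shear along y
frame[3, 1, 2] = 1.0    # binary contact flag
print("per-finger tactile frame shape:", frame.shape)
```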
The sensor simulation converts MuJoCo contact data into realistic tactile readings by:
- Finding contact points within each sensor pad area
- Applying a Gaussian spatial model to distribute forces across taxels
- Adding realistic noise based on capacitive sensor characteristics
- Processing readings into tactile "images" for the learning algorithm
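A minimal sketch of the Gaussian spatial model with illustrative parameters (the actual implementation lives in environments/tactile_sensor.py and may differ):

```python
# Spread one contact's normal force over the 3x4 taxel grid with a Gaussian
# weighting around the contact point, then add capacitive-style noise.
import numpy as np

def contact_to_taxels(contact_xy, normal_force, pad_size=(0.015, 0.02),
                      grid=(3, 4), sigma=0.004, noise_std=0.01):
    rows, cols = grid
    ys = (np.arange(rows) + 0.5) / rows * pad_size[0]   # taxel centers (y)
    xs = (np.arange(cols) + 0.5) / cols * pad_size[1]   # taxel centers (x)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    d2 = (xx - contact_xy[0]) ** 2 + (yy - contact_xy[1]) ** 2
    weights = np.exp(-d2 / (2 * sigma ** 2))
    weights /= weights.sum() + 1e-9                     # conserve total force
    taxels = normal_force * weights
    taxels += np.random.normal(0.0, noise_std, taxels.shape)
    return np.clip(taxels, 0.0, None)

print(contact_to_taxels(contact_xy=(0.01, 0.008), normal_force=2.0).round(3))
```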
The model uses a multi-modal architecture:
- Visual Encoder: ResNet-based feature extractor for camera input
- Tactile Encoder: CNN-based encoder for tactile "images"
- Proprioceptive Encoder: MLP for joint positions/velocities
- Fusion Module: Cross-attention mechanism for feature integration
- Policy Network: Actor-critic architecture for control
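A compressed PyTorch sketch of this architecture with hypothetical dimensions (the real networks are in models/policy_network.py and models/tactile_encoder.py; the ResNet visual encoder is reduced to a linear stand-in here):

```python
import torch
import torch.nn as nn

class TactileEncoder(nn.Module):
    """CNN over the tactile 'image': 2 fingers x 4 channels = 8 maps of 3x4 taxels."""
    def __init__(self, out_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(8, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64 * 3 * 4, out_dim)

    def forward(self, x):                       # x: (B, 8, 3, 4)
        return self.fc(self.conv(x).flatten(1))

class MultiModalPolicy(nn.Module):
    def __init__(self, vis_dim=512, prop_dim=16, act_dim=7, d_model=64):
        super().__init__()
        self.tactile = TactileEncoder(d_model)
        self.visual = nn.Linear(vis_dim, d_model)         # stand-in for ResNet features
        self.proprio = nn.Sequential(nn.Linear(prop_dim, d_model), nn.ReLU(),
                                     nn.Linear(d_model, d_model))
        self.fusion = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.actor = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(),
                                   nn.Linear(128, act_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(d_model + act_dim, 128), nn.ReLU(),
                                    nn.Linear(128, 1))

    def forward(self, vis_feat, proprio, tactile, action=None):
        # One token per modality, fused with attention across modality tokens.
        tokens = torch.stack([self.visual(vis_feat),
                              self.proprio(proprio),
                              self.tactile(tactile)], dim=1)   # (B, 3, d_model)
        fused, _ = self.fusion(tokens, tokens, tokens)
        state = fused.mean(dim=1)
        act = self.actor(state)
        q = self.critic(torch.cat([state, action], dim=-1)) if action is not None else None
        return act, q

# Example forward pass with a batch of 2.
policy = MultiModalPolicy()
act, _ = policy(torch.randn(2, 512), torch.randn(2, 16), torch.randn(2, 8, 3, 4))
print(act.shape)   # torch.Size([2, 7])
```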
Four RoboMimic tasks are particularly well-suited for demonstrating tactile benefits:
- Can Picking: Grasping cylindrical objects with curved surfaces
  - Tactile benefit: Surface curvature detection, slip prevention
- Lift: Simple grasping and lifting of objects
  - Tactile benefit: Grasp stability, force regulation
- Square Insertion: Placing square pegs into matching holes
  - Tactile benefit: Contact detection for alignment, force feedback for insertion
- Stack: Stacking cubes on top of each other
  - Tactile benefit: Precision grip control, contact detection for placement
Key performance metrics include:
- Success rate comparison across tasks
- Sample efficiency (learning curves)
- Robustness to variations in object properties
- Visualization of attention on tactile features during critical manipulation phases
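For example, success-rate learning curves for the two variants can be plotted side by side (a sketch with hypothetical result-file names; adapt to however evaluate.py stores its metrics):

```python
# Compare per-epoch success rates of the tactile-enhanced and baseline policies.
import numpy as np
import matplotlib.pyplot as plt

runs = {"tactile + vision": "../exps/success_tactile.npy",
        "vision only": "../exps/success_baseline.npy"}
for label, path in runs.items():
    plt.plot(np.load(path), label=label)    # each file: (num_epochs,) success rates

plt.xlabel("training epoch")
plt.ylabel("success rate")
plt.title("Sample efficiency: tactile vs. baseline")
plt.legend()
plt.show()
```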
This project is directly relevant to the Tesla Optimus program in several ways:
- Manipulation Enhancement: Addresses the challenge of precise object manipulation
- Sensor Fusion: Demonstrates effective integration of multiple sensor modalities
- Sample Efficiency: Shows improved learning with fewer demonstrations
- Safety: Tactile feedback prevents excessive force application
- Adaptability: Improves handling of diverse objects and conditions
Potential extensions to this project:
- Extension to 5-finger Shadow Hand: Implementing the same approach on a more dexterous hand model
- Implementation on a real robot with tactile sensors
- Integration with vision-language models for instruction following
- Exploration of active tactile exploration strategies
- Curriculum learning for complex manipulation sequences
This project builds on:
- RoboMimic framework and datasets
- MuJoCo simulation environment
