We provide examples to fine-tune Octo on top of HIL-SERL, which provides the base environment for performing robotic manipulation tasks with human interventions. The following sections describe how to use our code.
Installation
- Setup Conda Environment: create an environment with

  ```bash
  conda create -n conrft python=3.10
  ```
- Install Jax as follows:
  - For CPU (not recommended):

    ```bash
    pip install --upgrade "jax[cpu]"
    ```

  - For GPU:

    ```bash
    pip install --upgrade "jax[cuda11_pip]==0.4.20" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
    ```

  - See the Jax Github page for more details on installing Jax. A quick sanity check for the GPU install is sketched after these installation steps.
- Install Octo:

  ```bash
  git clone git@github.com:cccedric/octo.git
  cd octo
  pip install -e .
  pip install -r requirements.txt
  ```

  Note: This is a personalized fork of Octo, adding custom functions while preserving its core capabilities for general-purpose robotic manipulation.
- Install serl_launcher:

  ```bash
  cd serl_launcher
  pip install -e .
  pip install -r requirements.txt
  ```
- Install serl_robot_infra:

  Please refer to the README in the serl_robot_infra directory for installation instructions and details on operating the Franka robot arm, including guidance on setting up the impedance-based serl_franka_controllers. After completing the installation, you should be able to start the robot server and interact with the franka_env gym environment for hardware control; a minimal interaction sketch follows these installation steps.
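If you installed the GPU build of Jax (see the Jax step above), a quick sanity check, independent of this repository, is to confirm that Jax can actually see the GPU:

```python
import jax

# Lists the accelerators visible to JAX; with the CUDA build installed correctly
# this should report one or more GPU devices rather than only the CPU.
print(jax.devices())
print(jax.default_backend())  # expected to print "gpu" for the CUDA install
```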
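Once the robot server is running, the franka_env environment is used through the standard gym interface. The snippet below is only an illustrative sketch: the environment id `FrankaEnv-Vision-v0` is a placeholder, and the exact registration and reset/step signatures follow whatever serl_robot_infra defines for your task.

```python
import gymnasium as gym  # or `import gym`, depending on what serl_robot_infra targets
import numpy as np

import franka_env  # assumed to register the Franka environments; requires the robot server to be up

# Placeholder id; use the environment id listed in the serl_robot_infra README for your task.
env = gym.make("FrankaEnv-Vision-v0")

obs, info = env.reset()
for _ in range(10):
    action = np.zeros(env.action_space.shape)  # no-op action, purely for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```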
We offer a set of code for fine-tuning Octo on robotic manipulation tasks. The pipeline consists of an actor thread and a learner thread, both of which interact with the robot gym environment. The two threads operate asynchronously, with data transmitted from the actor to the learner node over the network using agentlace. The learner thread periodically updates the policy and syncs it with the actor.
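Conceptually, the two threads look like the sketch below. This is a simplified illustration of the data flow rather than the actual agentlace API or the training scripts under examples; names such as `data_store`, `replay_buffer`, `sample_actions`, and `publish_policy` are placeholders.

```python
def actor_loop(env, policy, data_store):
    """Roll out the current policy on the robot and push transitions to the learner."""
    obs, info = env.reset()
    while True:
        action = policy.sample_actions(obs)  # latest policy snapshot received from the learner
        next_obs, reward, terminated, truncated, info = env.step(action)
        data_store.insert(dict(obs=obs, action=action, reward=reward,
                               next_obs=next_obs, done=terminated))  # shipped to the learner over the network
        obs = env.reset()[0] if (terminated or truncated) else next_obs


def learner_loop(agent, replay_buffer, publish_policy, update_every=100):
    """Update the policy from replayed transitions and periodically sync it to the actor."""
    step = 0
    while True:
        batch = replay_buffer.sample(256)
        agent, update_info = agent.update(batch)  # functional (JAX-style) update returning a new agent
        step += 1
        if step % update_every == 0:
            publish_policy(agent)  # the actor swaps in the refreshed parameters

# In the actual code these loops run as separate actor/learner nodes connected by agentlace
# and are launched from the scripts under examples/.
```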
Table for code structure
| Code Directory | Description |
| --- | --- |
| examples | Scripts for policy training, demonstration data collection, reward classifier training |
| serl_launcher | Main code for agent training |
| serl_launcher.agents | Agent policies (e.g. SAC, BC) |
| serl_launcher.wrappers | Gym env wrappers |
| serl_launcher.data | Replay buffer and data store |
| serl_launcher.vision | Vision-related models and utils |
| serl_robot_infra | Robot infra for running with real robots |
| serl_robot_infra.robot_servers | Flask server for sending commands to the robot via ROS |
| serl_robot_infra.franka_env | Gym env for the Franka robot |
We provide a step-by-step guide in franka_walkthrough to fine-tune a VLA with ConRFT on a Franka robot.
For any questions, please feel free to email [email protected].
Our code is built upon CPQL, Octo, and HIL-SERL. We thank all these authors for their nicely open-sourced code and their great contributions to the community.
If you find our research helpful and would like to reference it in your work, please consider the following citations:
```bibtex
@article{chen2025conrft,
  title={ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy},
  author={Chen, Yuhui and Tian, Shuai and Liu, Shugao and Zhou, Yingting and Li, Haoran and Zhao, Dongbin},
  journal={arXiv preprint arXiv:2502.05450},
  year={2025}
}
```