ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy

License: Apache 2.0

This is the official implementation of the paper "ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy". We provide examples for fine-tuning Octo on top of HIL-SERL, which supplies the base environment for performing robotic manipulation tasks with human interventions. The following sections describe how to use our code.

Table of Contents

    • 🛠️ Installation Instructions
    • 💻 Overview and Code Structure
    • ✉️ Contact
    • 🙏 Acknowledgement
    • 📝 Citation

🛠️ Installation Instructions

  1. Set up the conda environment: create and activate an environment with

    conda create -n conrft python=3.10
    conda activate conrft
  2. Install JAX as follows:

    • For CPU (not recommended):

      pip install --upgrade "jax[cpu]"
    • For GPU:

      pip install --upgrade "jax[cuda11_pip]==0.4.20" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
    • See the JAX GitHub page for more details on installing JAX. A quick device check follows below.
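    To verify the install (a minimal check; jax.devices() lists the backends JAX detected):

      import jax
      # Expect GPU devices here; a CPU-only list means the CUDA wheels were not picked up.
      print(jax.devices())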

  3. Install Octo

    git clone git@github.com:cccedric/octo.git
    cd octo
    pip install -e .
    pip install -r requirements.txt

    Note: This is a personalized fork of Octo, adding custom functions while preserving its core capabilities for general-purpose robotic manipulation.
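    As a quick sanity check, you can try loading a pretrained checkpoint (a minimal sketch, assuming the fork preserves the upstream OctoModel loading API; the checkpoint name refers to the upstream pretrained weights, not something in this repo):

      from octo.model.octo_model import OctoModel

      # Download and load the pretrained Octo-Base checkpoint from HuggingFace.
      model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base-1.5")
      print(model.get_pretty_spec())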

  4. Install serl_launcher

    cd serl_launcher
    pip install -e .
    pip install -r requirements.txt
  5. Install serl_robot_infra

    Please refer to the README in the serl_robot_infra directory for installation instructions and details on operating the Franka robot arm. This document includes guidance on setting up the impedance-based serl_franka_controllers. After completing the installation, you should be able to start the robot server and interact with the franka_env gym for hardware control, e.g. as sketched below.
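    A minimal smoke test for the environment might look like the following. This is an illustrative sketch only: the environment ID and the franka_env import path are assumptions, so check the registrations in serl_robot_infra/franka_env for the actual names, and adjust if your setup uses a different gym API version.

      import gymnasium as gym
      import franka_env  # assumed import that registers the Franka environments

      # Hypothetical environment ID; substitute the one registered in your setup.
      env = gym.make("FrankaEnv-Vision-v0")

      obs, info = env.reset()
      for _ in range(10):
          action = env.action_space.sample()  # random actions, just to exercise the loop
          obs, reward, terminated, truncated, info = env.step(action)
          if terminated or truncated:
              obs, info = env.reset()
      env.close()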

💻 Overview and Code Structure

We provide code for fine-tuning Octo on robotic manipulation tasks. The pipeline consists of an actor thread and a learner thread, both of which interact with the robot gym environment. The two threads run asynchronously, with data transmitted from the actor to the learner node over the network using agentlace. The learner thread periodically updates the policy and syncs it with the actor.
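The sketch below illustrates this asynchronous actor-learner data flow, with a local queue standing in for the agentlace network transport. It is a toy illustration of the pattern, not the repository's actual training code.

    import queue
    import threading
    import time

    transitions = queue.Queue()   # stands in for the agentlace data channel
    policy = {"version": 0}       # stands in for the policy parameters
    lock = threading.Lock()
    stop = threading.Event()

    def actor():
        # Roll out the current policy and stream transitions to the learner.
        while not stop.is_set():
            with lock:
                version = policy["version"]
            transitions.put({"policy_version": version})
            time.sleep(0.01)      # stands in for one environment step

    def learner():
        # Consume transitions; periodically update the policy and sync it back.
        seen = 0
        while not stop.is_set():
            try:
                transitions.get(timeout=0.1)
            except queue.Empty:
                continue
            seen += 1
            if seen % 50 == 0:    # periodic gradient update + policy sync
                with lock:
                    policy["version"] += 1

    threads = [threading.Thread(target=actor), threading.Thread(target=learner)]
    for t in threads:
        t.start()
    time.sleep(2)
    stop.set()
    for t in threads:
        t.join()
    print(f"policy synced {policy['version']} times")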

Code structure:

| Code Directory | Description |
| --- | --- |
| examples | Scripts for policy training, demonstration data collection, and reward classifier training |
| serl_launcher | Main code for agent training |
| serl_launcher.agents | Agent policies (e.g. SAC, BC) |
| serl_launcher.wrappers | Gym env wrappers |
| serl_launcher.data | Replay buffer and data store |
| serl_launcher.vision | Vision-related models and utils |
| serl_robot_infra | Robot infra for running on real robots |
| serl_robot_infra.robot_servers | Flask server for sending commands to the robot via ROS |
| serl_robot_infra.franka_env | Gym env for the Franka robot |

We provide a step-by-step guide in franka_walkthrough for fine-tuning a VLA model with ConRFT on a Franka robot.

✉️ Contact

For any questions, please feel free to email [email protected].

🙏 Acknowledgement

Our code is built upon CPQL, Octo, and HIL-SERL. We thank these authors for their nicely open-sourced code and their great contributions to the community.

📝 Citation

If you find our research helpful and would like to reference it in your work, please use the following citation:

@article{chen2025conrft,
  title={ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy},
  author={Chen, Yuhui and Tian, Shuai and Liu, Shugao and Zhou, Yingting and Li, Haoran and Zhao, Dongbin},
  journal={arXiv preprint arXiv:2502.05450},
  year={2025}
}
