This repo is an implementation of the self-supervised policy learning methods Play-GCBC and Play-LMP (as described in the paper by Lynch et al. [1]), including a script to train policies on play datasets from the CALVIN benchmark [2].
The GIF below shows a trained Play-LMP policy executing a series of tasks in CALVIN environment D. By passing different goal states as inputs, a single policy is able to solve multiple tasks: grasping and displacing blocks, opening and closing the drawer and cabinet, pushing the button, and turning on the light by flipping the switch. We also observe emergent "retrying" behavior, where the policy re-attempts a task if it fails on its first try (in the GIF, for example, when grasping blocks and when trying to open the drawer).
The implementation uses state-based observations and goals (as opposed to images in the original paper), so there is no CNN backbone.
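For context, Play-LMP trains three components jointly: a plan-recognition posterior that encodes a full play window into a latent plan, a plan prior conditioned only on the current state and goal, and a goal- and plan-conditioned action decoder, with a KL term pulling posterior and prior together. The sketch below shows this structure in PyTorch; the module sizes, layer choices and dimensions are illustrative assumptions, not this repo's exact code (the real hyperparameters live in `conf/config.yaml`).

```python
import torch
import torch.nn as nn

STATE, GOAL, LATENT, ACT = 39, 39, 32, 7  # illustrative dimensions only

class PlanRecognition(nn.Module):
    """Posterior q(z | s_{1:T}): sequence encoder over a play window."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE, 128, batch_first=True)
        self.head = nn.Linear(128, 2 * LATENT)  # mean and log-variance of z

    def forward(self, states):                  # states: [B, T, STATE]
        _, h = self.rnn(states)
        return self.head(h[-1]).chunk(2, dim=-1)

class PlanPrior(nn.Module):
    """Prior p(z | s, g): sees only the current state and the goal."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE + GOAL, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * LATENT))

    def forward(self, s, g):
        return self.net(torch.cat([s, g], -1)).chunk(2, dim=-1)

class PolicyDecoder(nn.Module):
    """Decoder pi(a | s, g, z). Here it emits raw actions; the actual
    implementation outputs parameters of a discretized logistic mixture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE + GOAL + LATENT, 128), nn.ReLU(),
                                 nn.Linear(128, ACT))

    def forward(self, s, g, z):
        return self.net(torch.cat([s, g, z], -1))
```

Play-GCBC is the variant that drops the latent plan entirely and conditions the policy on (state, goal) alone; at test time, Play-LMP samples z from the prior and decodes actions from it.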
First, clone the repo:

```sh
git clone https://github.com/jongoiko/play-lmp.git
cd play-lmp
```

Then install calvin_env and its dependencies:
```sh
git clone --recursive https://github.com/mees/calvin_env.git
cd calvin_env
touch __init__.py
uv pip install -e .
cd ..
```

We now need to download the training dataset, e.g. for CALVIN environment D (note that this split is 166 GB, so it will take some time to download):
```sh
git clone https://github.com/mees/calvin.git
cd calvin/dataset
sh download_data.sh D
cd ../..
```

We can now start training a policy.
All hyperparameters are stored in `conf/config.yaml` and read using Hydra.

```sh
uv run train.py
```

Metrics will be logged to TensorBoard.
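Because Hydra drives the configuration, any value in `conf/config.yaml` can also be overridden from the command line (e.g. `uv run train.py key=value`, where the available keys depend on the config file). The entry point follows the standard Hydra pattern, roughly like this sketch (not the repo's exact `train.py`):

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra composes conf/config.yaml with any CLI overrides into `cfg`.
    print(OmegaConf.to_yaml(cfg))
    # ... build the dataset, model and optimizer from cfg, then train.

if __name__ == "__main__":
    main()
```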
Once a policy has been trained, we can execute it on the CALVIN environment to solve a variety of tasks: see the run_policy.ipynb notebook, which can be run in JupyterLab with

```sh
uv run --with jupyter --with matplotlib --with mediapy jupyter lab
```

Since the policy uses state-based observations in this implementation, we would not expect it to generalize to the other environments (e.g. A, B or C).
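Conceptually, the notebook's evaluation boils down to the goal-conditioned rollout loop below. This is a sketch under a gym-style interface: the `env` object, the `policy.sample_plan`/`policy.act` methods and the observation handling are assumptions for illustration, and the actual calvin_env API used in the notebook differs in its details.

```python
import torch

def rollout(env, policy, goal, max_steps=360, replan_every=30):
    """Drive the environment toward `goal` with a trained goal-conditioned policy.

    Assumes a gym-like env (reset/step) and a policy exposing
    `sample_plan(state, goal)` and `act(state, goal, z)`; sketch only.
    """
    obs = env.reset()
    g = torch.as_tensor(goal, dtype=torch.float32)
    z = None
    for t in range(max_steps):
        state = torch.as_tensor(obs, dtype=torch.float32)
        if t % replan_every == 0:
            z = policy.sample_plan(state, g)   # resample the latent plan periodically
        with torch.no_grad():
            action = policy.act(state, g, z)   # decode the next action
        obs, reward, done, info = env.step(action.numpy())
        if done:
            break
```

Periodically resampling the plan mirrors the test-time replanning described in the paper.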
[1] Lynch, Corey, et al. "Learning latent plans from play." Conference on Robot Learning. PMLR, 2020.
[2] Mees, Oier, et al. "CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks." IEEE Robotics and Automation Letters 7.3 (2022): 7327-7334.
Giulio Starace's post explaining the Discretized Logistic Mixture Likelihood, which is used to parameterize the policy's action distribution.
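As a rough illustration of what that likelihood computes, here is a simplified sketch of the discretized logistic mixture log-probability (the PixelCNN++ formulation): the mass a logistic mixture assigns to a value's bin is computed from differences of sigmoids. Edge-bin handling and numerical-stability tricks are omitted, and the tensor shapes, bin count and [-1, 1] action range are assumptions for illustration, not this repo's exact settings.

```python
import torch
import torch.nn.functional as F

def dlml_log_prob(actions, logit_pi, mu, log_s, num_bins=256, a_min=-1.0, a_max=1.0):
    """Log-likelihood of `actions` under a discretized logistic mixture.

    actions: [B, D]; mixture params logit_pi, mu, log_s: [B, D, K].
    Simplified sketch: edge bins and stability tricks are omitted.
    """
    a = actions.unsqueeze(-1)                        # [B, D, 1], broadcasts over K
    half_bin = (a_max - a_min) / (num_bins - 1) / 2  # half-width of one bin
    inv_s = torch.exp(-log_s)
    cdf_plus = torch.sigmoid((a + half_bin - mu) * inv_s)
    cdf_minus = torch.sigmoid((a - half_bin - mu) * inv_s)
    log_bin_mass = torch.log((cdf_plus - cdf_minus).clamp(min=1e-12))
    log_pi = F.log_softmax(logit_pi, dim=-1)         # mixture weights
    # Mix components per action dimension, then sum log-probs over dimensions.
    return torch.logsumexp(log_pi + log_bin_mass, dim=-1).sum(-1)
```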
