This repo is an implementation of the self-supervised policy learning methods Play-GCBC and Play-LMP (as described in the paper by Lynch et al. [1]), including a script to train policies on play datasets from the CALVIN benchmark [2].
The GIF below shows a trained Play-LMP policy executing a series of tasks in CALVIN environment D. By passing different goal states as inputs, a single policy is able to solve multiple tasks: grasping and displacing blocks, opening and closing the drawer and cabinet, pushing the button, and turning on the light by flipping the switch. We also observe emergent "retrying" behavior, where the policy re-attempts a task if it fails on its first try (in the GIF, for example, when grasping blocks and when trying to open the drawer).
The implementation uses state-based observations and goals (as opposed to images in the original paper), so there is no CNN backbone.
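For context, Play-LMP trains three components jointly: a plan-recognition posterior that encodes a full play window into a latent plan, a plan prior conditioned only on the current state and goal, and a goal- and plan-conditioned action decoder, with a KL term pulling posterior and prior together. The sketch below shows this structure in PyTorch; the module sizes, layer choices and dimensions are illustrative assumptions, not this repo's exact code (the real hyperparameters live in `conf/config.yaml`).

```python
import torch
import torch.nn as nn

STATE, GOAL, LATENT, ACT = 39, 39, 32, 7  # illustrative dimensions only

class PlanRecognition(nn.Module):
    """Posterior q(z | s_{1:T}): sequence encoder over a play window."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE, 128, batch_first=True)
        self.head = nn.Linear(128, 2 * LATENT)  # mean and log-variance of z

    def forward(self, states):                  # states: [B, T, STATE]
        _, h = self.rnn(states)
        return self.head(h[-1]).chunk(2, dim=-1)

class PlanPrior(nn.Module):
    """Prior p(z | s, g): sees only the current state and the goal."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE + GOAL, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * LATENT))

    def forward(self, s, g):
        return self.net(torch.cat([s, g], -1)).chunk(2, dim=-1)

class PolicyDecoder(nn.Module):
    """Decoder pi(a | s, g, z). Here it emits raw actions; the actual
    implementation outputs parameters of a discretized logistic mixture."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE + GOAL + LATENT, 128), nn.ReLU(),
                                 nn.Linear(128, ACT))

    def forward(self, s, g, z):
        return self.net(torch.cat([s, g, z], -1))
```

Play-GCBC is the variant that drops the latent plan entirely and conditions the policy on (state, goal) alone; at test time, Play-LMP samples z from the prior and decodes actions from it.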
First, clone the repo:

```sh
git clone https://github.com/jongoiko/play-lmp.git
cd play-lmp
```

Then install calvin_env and its dependencies:
```sh
git clone --recursive https://github.com/mees/calvin_env.git
cd calvin_env
touch __init__.py
uv pip install -e .
cd ..
```

We now need to download the training dataset, e.g. for CALVIN environment D (note that this split is 166 GB, so it will take some time to download):
```sh
git clone https://github.com/mees/calvin.git
cd calvin/dataset
sh download_data.sh D
cd ../..
```

We can now start training a policy.
All hyperparameters are stored in `conf/config.yaml` and read using Hydra.

```sh
uv run train.py
```

Metrics will be logged to TensorBoard.
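Because Hydra drives the configuration, any value in `conf/config.yaml` can also be overridden from the command line (e.g. `uv run train.py key=value`, where the available keys depend on the config file). The entry point follows the standard Hydra pattern, roughly like this sketch (not the repo's exact `train.py`):

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Hydra composes conf/config.yaml with any CLI overrides into `cfg`.
    print(OmegaConf.to_yaml(cfg))
    # ... build the dataset, model and optimizer from cfg, then train.

if __name__ == "__main__":
    main()
```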
Once a policy has been trained, we can execute it on the CALVIN environment to solve a variety of tasks: see the run_policy.ipynb notebook, which can be run in JupyterLab with

```sh
uv run --with jupyter --with matplotlib --with mediapy jupyter lab
```

Since the policy uses state-based observations in this implementation, we would not expect it to generalize to the other environments (e.g. A, B or C).
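Conceptually, the notebook's evaluation boils down to the goal-conditioned rollout loop below. This is a sketch under a gym-style interface: the `env` object, the `policy.sample_plan`/`policy.act` methods and the observation handling are assumptions for illustration, and the actual calvin_env API used in the notebook differs in its details.

```python
import torch

def rollout(env, policy, goal, max_steps=360, replan_every=30):
    """Drive the environment toward `goal` with a trained goal-conditioned policy.

    Assumes a gym-like env (reset/step) and a policy exposing
    `sample_plan(state, goal)` and `act(state, goal, z)`; sketch only.
    """
    obs = env.reset()
    g = torch.as_tensor(goal, dtype=torch.float32)
    z = None
    for t in range(max_steps):
        state = torch.as_tensor(obs, dtype=torch.float32)
        if t % replan_every == 0:
            z = policy.sample_plan(state, g)   # resample the latent plan periodically
        with torch.no_grad():
            action = policy.act(state, g, z)   # decode the next action
        obs, reward, done, info = env.step(action.numpy())
        if done:
            break
```

Periodically resampling the plan mirrors the test-time replanning described in the paper.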
[1] Lynch, Corey, et al. "Learning latent plans from play." Conference on Robot Learning. PMLR, 2020.
[2] Mees, Oier, et al. "CALVIN: A benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks." IEEE Robotics and Automation Letters 7.3 (2022): 7327-7334.
Giulio Starace's post explaining the Discretized Logistic Mixture Likelihood, which is used to parameterize the policy's action distribution.
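As a rough illustration of what that likelihood computes, here is a simplified sketch of the discretized logistic mixture log-probability (the PixelCNN++ formulation): the mass a logistic mixture assigns to a value's bin is computed from differences of sigmoids. Edge-bin handling and numerical-stability tricks are omitted, and the tensor shapes, bin count and [-1, 1] action range are assumptions for illustration, not this repo's exact settings.

```python
import torch
import torch.nn.functional as F

def dlml_log_prob(actions, logit_pi, mu, log_s, num_bins=256, a_min=-1.0, a_max=1.0):
    """Log-likelihood of `actions` under a discretized logistic mixture.

    actions: [B, D]; mixture params logit_pi, mu, log_s: [B, D, K].
    Simplified sketch: edge bins and stability tricks are omitted.
    """
    a = actions.unsqueeze(-1)                        # [B, D, 1], broadcasts over K
    half_bin = (a_max - a_min) / (num_bins - 1) / 2  # half-width of one bin
    inv_s = torch.exp(-log_s)
    cdf_plus = torch.sigmoid((a + half_bin - mu) * inv_s)
    cdf_minus = torch.sigmoid((a - half_bin - mu) * inv_s)
    log_bin_mass = torch.log((cdf_plus - cdf_minus).clamp(min=1e-12))
    log_pi = F.log_softmax(logit_pi, dim=-1)         # mixture weights
    # Mix components per action dimension, then sum log-probs over dimensions.
    return torch.logsumexp(log_pi + log_bin_mass, dim=-1).sum(-1)
```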
