[RLlib] Add example: Pre-train an RLModule single-agent, then bring checkpoint into multi-agent setup and continue training. #44674
Merged
sven1977 merged 41 commits into ray-project:master on Apr 16, 2024
…define the model config for 'RLmodule' in a unified way without interfering with the old stack. Reconfigured DQN Rainbow with it. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…d examples accordingly. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ordingly. In addition, fixed some typos. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…g_dict' in 'AlgorithmConfig.rl_module' as they were failing. Something is still wrong with the VisionNet in 'connector_v2_frame_stacking' example. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…emains b/c low priority. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…rl_module_api' needed a 'False' for error - so only warning. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…nstead of model_config. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…odule()'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ot using the corresponding default model configuration of the training algorithm. Also added a pre-training example for MARL. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…in single module and load its checkpoint into a MARL setting for one policy. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
RLModule pre-training example for multi-agent setup.
… external module did not use the default model config of the algorithm. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…l-config-for-new-api-stack
sven1977
reviewed
Apr 15, 2024
sven1977
reviewed
Apr 15, 2024
sven1977
reviewed
Apr 15, 2024
sven1977
reviewed
Apr 15, 2024
config = (
    PPOConfig()
    # Enable the new API stack (RLModule and Learner APIs).
    .experimental(_enable_new_api_stack=True)
Contributor
This is done automatically by the run_rllib_example_script_experiment util.
sven1977
reviewed
Apr 15, 2024
sven1977
reviewed
Apr 15, 2024
marl_module_spec = MultiAgentRLModuleSpec(module_specs=module_specs)

# Register our environment with tune if we use multiple agents.
if args.num_agents > 0:
Contributor
Is this if-block needed? We already assert above that this command-line arg is > 0.
Contributor
Author
Yeah, I guess we can remove this here. Good catch @sven1977!
Contributor
Author
Great catch! I removed this in the follow-up commit.
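The point of this review thread can be illustrated with a small, framework-free sketch: once a command-line argument has been validated with an assertion, a later guard on the same condition is dead code. Everything below is an assumption for illustration (the `--num-agents` flag spelling, its default, and the `register_env` stand-in, which is a plain dict and not Tune's registry); only the `args.num_agents > 0` condition comes from the diff.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Flag spelling and default are assumptions; the diff only shows `args.num_agents`.
    parser.add_argument("--num-agents", type=int, default=2)
    args = parser.parse_args(argv)
    # Validate once, up front.
    assert args.num_agents > 0, "--num-agents must be > 0 for this script"
    return args


def register_env(registry, name, num_agents):
    # Hypothetical stand-in for registering an env creator; not Tune's API.
    registry[name] = {"num_agents": num_agents}


registry = {}
args = parse_args(["--num-agents", "2"])
# No `if args.num_agents > 0:` guard is needed here:
# the assertion in `parse_args` already guarantees the condition.
register_env(registry, "env", args.num_agents)
```

One caveat worth keeping in mind: `assert` statements are skipped when Python runs with `-O`, so a script that must reject bad input in that mode would raise an explicit error instead.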
sven1977
reviewed
Apr 15, 2024
Signed-off-by: Sven Mika <sven@anyscale.io>
sven1977
approved these changes
Apr 15, 2024
Contributor
sven1977
left a comment
Super nice example and PR! Thanks @simonsays1980!
Just a few nits and waiting for:
- We must add this great example to the BUILD!
- Can we rename the script to something more descriptive? Like pretraining_single_agent_training_multi_agent <- something like this that better describes the exact sequence of things we do here.
…agents. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Contributor
Ok, cool! Can we also add this example script to BUILD?
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
sven1977
reviewed
Apr 16, 2024
    srcs = ["examples/rl_modules/classes/mobilenet_rlm.py"],
)

py_test(
sven1977
reviewed
Apr 16, 2024
Signed-off-by: Sven Mika <sven@anyscale.io>
Why are these changes needed?
So far, we have no example that shows users how to pre-train individual policies and load their checkpoints.
This PR shows users how to pre-train a module in single-agent mode and then, in a separate training run, load its checkpoint into a MARL setup and continue training.
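The sequence described above (pre-train one module, checkpoint it, then swap it in for one policy of a multi-agent setup and keep training) can be sketched without any RLlib dependency. This is a toy illustration only, not the example script added by this PR: `ToyModule`, its methods, the JSON checkpoint format, and the `policy_{i}` naming are all hypothetical stand-ins.

```python
import json
import tempfile
from pathlib import Path


class ToyModule:
    """Hypothetical stand-in for an RLModule: it only holds a weight vector."""

    def __init__(self, hidden_size):
        self.weights = [0.0] * hidden_size

    def train_step(self, lr=0.1):
        # Placeholder update so that trained and untrained modules differ.
        self.weights = [w + lr for w in self.weights]

    def save(self, path):
        Path(path).write_text(json.dumps(self.weights))

    @classmethod
    def from_checkpoint(cls, path):
        weights = json.loads(Path(path).read_text())
        module = cls(len(weights))
        module.weights = weights
        return module


# Phase 1: pre-train a single module and write its checkpoint.
checkpoint_path = Path(tempfile.mkdtemp()) / "policy_0.json"
pretrained = ToyModule(hidden_size=4)
for _ in range(5):
    pretrained.train_step()
pretrained.save(checkpoint_path)

# Phase 2: build one fresh module per agent, then replace one of them
# with the module restored from the pre-training checkpoint.
num_agents = 2
modules = {f"policy_{i}": ToyModule(hidden_size=4) for i in range(num_agents)}
modules["policy_0"] = ToyModule.from_checkpoint(checkpoint_path)

# Continue training all policies together; policy_0 starts from its
# pre-trained weights, the others start from scratch.
for module in modules.values():
    module.train_step()
```

The design point is that the checkpoint is the only thing shared between the two runs, so the pre-training and the multi-agent training can live in separate scripts or processes.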
Related issue number
Related to #44263
Checks
- … git commit -s) in this PR.
- … scripts/format.sh to lint the changes in this PR.
- … method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.