[RLlib] PPO runs with EnvRunner w/o old Policy API (also solves KL issues with PPORLModules). #39732
Merged
sven1977 merged 57 commits into ray-project:master on Oct 25, 2023
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…_weights, get_weights from EnvRunner such that EnvRunner can be used equivalently to RolloutWorker in Algorithm. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…w. _Episode needs some fixing for some algorithms as they need extra keys in the sample batch. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
simonsays1980 commented on Sep 18, 2023
sven1977 reviewed on Sep 20, 2023
…get infos as lists. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…essing for episodes instead of SampleBatches. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… Training works now. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
… Training works now. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…o 'MultiAgentRLModule'. Created cases for 'num_remote_workers() <= 0', kept the 'ModelV2' logic. Removed global vars for 'Learner API' in 'PPO' training step. Test 'ppo_with_rl_module()' runs. For this, 'check_compute_single_action_from_input_dict()' had to be removed, as 'EnvRunner' has no policy. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…o PPO from SingleAgentEnvRunner. Implemented logic for torch and tf2. Test runs. Tuned example not yet. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…ey error in postprocessing. Furthermore, modified weight syncing so that weights are not synced too often. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…om/simonsays1980/ray into solve-kl-issues-with-ppo-rl-module
…e-kl-issues-with-ppo-rl-module
sven1977 approved these changes on Oct 24, 2023
sven1977 (Contributor) left a comment:
Awesome work @simonsays1980!
Thanks for this great PR. Should be all easily rolling downhill from here on :)
Contributor
Author
Happy to contribute. I am excited about the new sampling API and how it will improve learning performance and user experience. Thanks for the great input, @sven1977, @ArturNiederfahrenhorst, and @kouroshHakha.
…e-kl-issues-with-ppo-rl-module
Why are these changes needed?
Over time, sampling should be performed by the env.EnvRunner class, individually for different algorithms. At the same time, the policy should become obsolete. Following the example of DreamerV3, this draft PR should develop a way to implement these changes in PPO.
Related issue number
Closes #39174 #39813
Checks
… git commit -s) in this PR.
… scripts/format.sh to lint the changes in this PR.
… method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
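Illustrative sketch
The idea described under "Why are these changes needed?" (an EnvRunner that samples on its own, exposes get_weights/set_weights like a RolloutWorker, and needs no Policy object) can be pictured roughly as follows. This is a toy in plain Python, not RLlib code: ToyEnv, ToyEnvRunner, the "p_right" weight, and the episode dict layout are all hypothetical names made up for illustration, and the real SingleAgentEnvRunner may differ in every detail.

```python
import random
from typing import Dict, List


class ToyEnv:
    """Hypothetical 1-D environment: start at 0, episode ends at position 3."""

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int):
        # Action 1 moves one step to the right, action 0 stays in place.
        self.pos += action
        done = self.pos >= 3
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


class ToyEnvRunner:
    """Illustrative stand-in for an EnvRunner (assumption, not the RLlib class).

    It owns the environment and the module weights directly, so no Policy
    object sits between the algorithm and the sampling loop.
    """

    def __init__(self, seed: int = 0):
        self.env = ToyEnv()
        self.weights: Dict[str, float] = {"p_right": 0.5}
        self.rng = random.Random(seed)

    def get_weights(self) -> Dict[str, float]:
        # Return a copy so callers cannot mutate the runner's state.
        return dict(self.weights)

    def set_weights(self, weights: Dict[str, float]) -> None:
        self.weights = dict(weights)

    def sample(self, num_episodes: int = 1) -> List[dict]:
        # Collect whole episodes instead of flat sample batches.
        episodes = []
        for _ in range(num_episodes):
            obs = self.env.reset()
            episode = {"obs": [obs], "actions": [], "rewards": []}
            done = False
            while not done:
                go_right = self.rng.random() < self.weights["p_right"]
                action = 1 if go_right else 0
                obs, reward, done = self.env.step(action)
                episode["obs"].append(obs)
                episode["actions"].append(action)
                episode["rewards"].append(reward)
            episodes.append(episode)
        return episodes
```

In this sketch the algorithm side would only call set_weights() after a learner update and sample() to collect episodes, which is the sense in which an EnvRunner could be used "equivalently to RolloutWorker" as one of the commit messages puts it.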