[RLlib] Hot fix for PPOTorchRLModule._compute_values with non-shared stateful encoder and batch slicing with non-empty infos. #44082
Merged
sven1977 merged 1 commit into ray-project:master on Mar 18, 2024
…ompute_values' when using a non-shared stateful encoder. In addition fixed an error that occurs while slicing batches with non-empty infos. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
sven1977
reviewed
Mar 18, 2024
# Exclude INFOs from regular array slicing as the data under this column might
# be a list (not good for `tree.map_structure` call).
infos = self.get(SampleBatch.INFOS)
# Furthermore, slicing does not work when the data in the column is
Contributor
You mean a SampleBatch with B=0, correct?
Contributor
Author
B>0. But in this case the infos are a list of dicts. When the dicts are empty, tree.map_structure(infos) works, but when they are filled, tree.map_structure fails because it tries to apply the slicing to the singular values inside them.
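The failure described above can be reproduced with a small, dependency-free sketch. Everything in it is an illustrative assumption rather than RLlib code: `map_structure` is a simplified stand-in for `tree.map_structure`, `array.array` stands in for NumPy columns, and the helper names `slice_naive` and `slice_excluding_infos` are hypothetical.

```python
from array import array

INFOS = "infos"  # assumed column name, mirroring SampleBatch.INFOS


def map_structure(fn, structure):
    """Simplified, hypothetical stand-in for `tree.map_structure`.

    Dicts, lists, and tuples are traversed; everything else is a leaf.
    """
    if isinstance(structure, dict):
        return {k: map_structure(fn, v) for k, v in structure.items()}
    if isinstance(structure, (list, tuple)):
        return type(structure)(map_structure(fn, v) for v in structure)
    return fn(structure)


def slice_naive(batch, start, stop):
    # Applies the slice to every leaf, including leaves nested inside infos.
    return map_structure(lambda leaf: leaf[start:stop], batch)


def slice_excluding_infos(batch, start, stop):
    # Slice the regular columns leaf-wise, but slice infos as a plain list.
    infos = batch.get(INFOS)
    rest = {k: v for k, v in batch.items() if k != INFOS}
    out = map_structure(lambda leaf: leaf[start:stop], rest)
    if infos is not None:
        out[INFOS] = infos[start:stop]
    return out


obs = array("i", [10, 11, 12, 13])
empty = {"obs": obs, INFOS: [{}, {}, {}, {}]}
filled = {"obs": obs, INFOS: [{"step": i} for i in range(4)]}

# Empty info dicts contain no leaves, so nothing inside them gets sliced.
slice_naive(empty, 0, 2)

# Filled info dicts expose scalar leaves, and slicing a scalar raises.
try:
    slice_naive(filled, 0, 2)
    naive_failed = False
except TypeError:
    naive_failed = True

# Handling infos separately slices them per timestep, like the other columns.
fixed = slice_excluding_infos(filled, 1, 3)
```

Note that in this sketch the naive variant does not raise for empty infos, but it also leaves the infos list unsliced, which is another reason to handle that column separately.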
sven1977
approved these changes
Mar 18, 2024
Contributor
sven1977
left a comment
Looks good, thanks for the fix @simonsays1980!
sven1977
reviewed
Mar 18, 2024
# Separate vf-encoder.
if hasattr(self.encoder, "critic_encoder"):
    if self.is_stateful():
Contributor
Awesome! Ran into this issue yesterday as well (and continued testing then with a shared value function :) ).
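For readers who have not hit this error, the idea behind extracting the critic's state_in might be sketched as follows. This is a toy model under stated assumptions, not the code merged in this PR: the keys `"state_in"`, `"actor"`, `"critic"`, and `"h"`, the stand-in `critic_encoder`, and the helper `select_critic_state` are all hypothetical and only mimic an encoder that expects its own state directly under state_in.

```python
STATE_IN = "state_in"  # assumed key for the recurrent input state
ACTOR = "actor"        # assumed sub-key holding the actor encoder's state
CRITIC = "critic"      # assumed sub-key holding the critic encoder's state


def critic_encoder(batch):
    """Hypothetical stateful critic encoder.

    It expects its own state ({"h": ...}) directly under `state_in`,
    loosely mimicking a spec check on the input.
    """
    state = batch[STATE_IN]
    if "h" not in state:
        raise KeyError("expected key 'h' directly under state_in")
    return {"encoder_out": [o + state["h"] for o in batch["obs"]]}


def select_critic_state(batch, is_stateful):
    """Return a batch whose state_in holds only the critic's state."""
    if not is_stateful:
        return batch
    batch_ = dict(batch)  # shallow copy, so the caller's batch is untouched
    batch_[STATE_IN] = batch[STATE_IN][CRITIC]
    return batch_


batch = {
    "obs": [1, 2, 3],
    STATE_IN: {ACTOR: {"h": 10}, CRITIC: {"h": 100}},
}

# Passing the full batch fails: state_in is still keyed by actor/critic.
try:
    critic_encoder(batch)
    direct_call_failed = False
except KeyError:
    direct_call_failed = True

# Selecting the critic's state first gives the encoder the layout it expects.
outs = critic_encoder(select_critic_state(batch, is_stateful=True))
```

Copying the batch before overwriting state_in keeps the actor's state available to whatever consumes the original batch afterwards.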
Why are these changes needed?
Running PPO with use_lstm=True and vf_share_layers=False results in an error in the PPOTorchRLModule._compute_values method, as the specs checker expects a different spec for the state_in. Extracting the state_in for the critic solves this problem.

Another problem is solved, related to non-empty infos in batch slicing (mainly occurring in MinibatchIterators). The reason is that slicing via tree.map_structure also tries to slice the entries of the infos, which are usually singular values.

Related issue number
Checks
… git commit -s) in this PR.
… scripts/format.sh to lint the changes in this PR.
… method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.