[RLlib] Added functionality to add infos and extra_model_outputs to the sample output of PrioritizedEpisodeReplayBuffer.#43496
…atch when sampling from 'PrioritizedEpisodeReplayBuffer'. Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…e-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…e-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…e-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…e-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…e-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
rllib/utils/replay_buffers/tests/test_prioritized_episode_replay_buffer.py
    if include_extra_model_outputs:
        ret.update(
            {
                "extra_model_outputs": np.array(extra_model_outputs),
Not sure this is a good idea just np'ing stuff like this. This often leads to these unwieldy object arrays that have unpredictable behavior (the same is true for np'ing the infos above, we should just keep them as a list of infos-dicts in the returned batch).
We usually separate these sub-columns in extra_model_outputs in our batches. Can we do that here, too?
    ret.update(
        {
            k: batch(v)
            for k, v in extra_model_outputs.items()
        }
    )
The final batch (returned from sample) should have columns at the top level, e.g. OBS or ACTION_DIST_INPUTS.
Under each of these columns should be a (possibly nested) struct of numpy array leafs (or simply a numpy array if no complex space/struct). All leafs should have the shape (B, T?, ...), where T might be 0 or 1.
Let me know if I'm making a thinking mistake here. :)
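To illustrate the concern raised in this comment, here is a minimal NumPy-only sketch. The key names, the values, and the `stack_columns` helper are hypothetical and are not RLlib's API; the helper only stands in for whatever batching utility the buffer or connector would actually use. It contrasts wrapping a list of dicts with `np.array` against building one stacked array per key.

```python
import numpy as np

# Hypothetical per-row extra model outputs, as they might be collected while
# sampling. Key names and values are made up for illustration.
rows = [
    {"action_logp": -0.1, "action_dist_inputs": np.array([0.2, 0.8])},
    {"action_logp": -1.3, "action_dist_inputs": np.array([0.7, 0.3])},
    {"action_logp": -0.5, "action_dist_inputs": np.array([0.5, 0.5])},
]

# Wrapping the list of dicts directly gives a 1-D array of Python objects,
# which cannot be sliced or reshaped like a numeric batch.
as_object_array = np.array(rows)
print(as_object_array.dtype, as_object_array.shape)


def stack_columns(rows):
    """Turn a list of flat dicts into a dict with one stacked array per key.

    Hypothetical helper; assumes all rows share the same keys and shapes.
    """
    return {key: np.stack([row[key] for row in rows]) for key in rows[0]}


# Each key becomes its own top-level column with a leading batch axis B.
columns = stack_columns(rows)
print({key: value.shape for key, value in columns.items()})
```

With three rows, each column has a leading batch dimension of 3, so every leaf follows the (B, ...) layout asked for above, while the object array only holds references to the original dicts.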
Hey @sven1977 thanks for the review! Yes this was somehow still ambiguous how to deal with the extra model outputs. I can batch the items from this field such that each of the keys in extra_model_outputs defines a new column in the batch.
@sven1977 following your logic above it might also make sense to keep the other "batch" columns here as lists such that they can be batched in a standard way in the connectors?
Signed-off-by: Sven Mika <sven@anyscale.io>
… {(eps_id,): [1.3, 4.23 ...], ...}, ...}. Furthermore, implemented a tracker for the maximum tree index to sum weights during sampling faster. Implemented testing for 'sample_with_keys'. Naming was chosen such that we can deprecate the old 'sample' as soon as initial review is done.
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
…er' of github.com:simonsays1980/ray into extra-model-outputs-for-prioritized-episode-replay-buffer Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
sven1977 left a comment
LGTM! Let's merge this, then time it to check whether the time saved by not creating the batch in the buffer is eaten up by the additional batching step required in the Learner Connector (I don't think that would be the case).
Awesome PR @simonsays1980 ! :)
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Signed-off-by: Simon Zehnder <simon.zehnder@gmail.com>
Why are these changes needed?
So far, PrioritizedEpisodeReplayBuffer could add infos to the samples drawn from the buffer, but not extra_model_outputs. This PR adds that functionality together with a corresponding test case.

Note, the extra_model_outputs are extracted as a dict and added to the batch in this form per row (similar to infos). Later, in post-processing, the variables from this dictionary can be extracted in a corresponding learner connector. Furthermore, while infos are extracted at the end of the n_step, the extra_model_outputs usually refer to a corresponding action, which comes from the first timestep in the n_step tuple. Hence, we take the extra_model_outputs from that same first timestep.
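The timestep alignment described above can be sketched in plain Python. This is only an illustration under assumed conventions: the lists, the indexing scheme (infos[t] belongs to the observation at step t, extra_model_outputs[t] to the action taken at step t), and the `n_step_row` helper are hypothetical and are not taken from the buffer's implementation.

```python
# Hypothetical per-timestep episode data (assumed indexing, see lead-in).
n_step = 3
infos = [{"t": t} for t in range(6)]
extra_model_outputs = [{"action_logp": -0.1 * t} for t in range(5)]


def n_step_row(start, n_step):
    """Build one sampled row for an n-step transition starting at `start`.

    Hypothetical helper: infos come from the last timestep of the n-step
    tuple, extra model outputs from the first one (where the action was taken).
    """
    return {
        "infos": infos[start + n_step],
        "extra_model_outputs": extra_model_outputs[start],
    }


row = n_step_row(start=1, n_step=n_step)
print(row)
```

Here the row starting at timestep 1 carries the infos recorded at timestep 4 and the extra model outputs recorded at timestep 1.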