[RLlib] Cleanup examples folder #01. (#44067)
Conversation
…nup_examples_folder
…_on_new_api_stack_w_env_runner_and_connectorv2
Signed-off-by: sven1977 <svenmika1977@gmail.com>
# Conflicts:
#   rllib/algorithms/algorithm.py
#   rllib/utils/actor_manager.py
…f some bug in the rlmodule specs. Signed-off-by: sven1977 <svenmika1977@gmail.com>
simonsays1980
left a comment
LGTM. Some more info here and there would be helpful to give a user/developer the big picture and the why. Awesome example updates!
    --test-env RAY_USE_MULTIPROCESSING_CPU_COUNT=1
    depends_on: rllibbuild

    - label: ":brain: rllib: data tests"
What's actually the meaning of :brain: here? Learning tests?
    module_specs=(
        self.rl_module_spec.module_specs
        if isinstance(self.rl_module_spec, MultiAgentRLModuleSpec)
        else set(self.policies)
I hope we can soon get rid of the policy/ies naming. This is still confusing in the module setups.
Great point. We need to unify this soon and fully adapt to the new stack terminology. Some ideas:
- Have user explicitly enable multi-agent (otherwise, will error if multi-agent components are used).
- `config.multi_agent(policies)` should no longer be necessary (already kind of replaced by `config.rl_module`).
- `policy_mapping_fn` -> `agent_to_module_mapping_fn`
- etc.
        ),
        data,
    )
    for column, column_data in data.copy().items():
Can we add, for each connector, a docstring that tells us the input and output shapes?
Alternatively, we could create something like the RLModules have: `get_input_specs` and `get_output_specs`, and then check the modules (also user modules) in the pipeline to see whether they "fit".
Also let's add a comment about the bigger picture here: what do the recurrent modules expect and what do we feed them.
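As an editorial aside, here is one way the spec idea above could look: a minimal, self-contained sketch. All names in it (`SpecCheckedConnector`, `get_input_specs`, `get_output_specs`, `validate_pipeline`) are hypothetical, merely mirroring the RLModule methods mentioned in the comment:

    from typing import List

    class SpecCheckedConnector:
        """Hypothetical base: each piece declares the batch columns it
        consumes and produces, mirroring the RLModule spec methods."""

        def get_input_specs(self) -> List[str]:
            return []

        def get_output_specs(self) -> List[str]:
            return []

    def validate_pipeline(pieces: List[SpecCheckedConnector]) -> None:
        """Checks that each piece's outputs cover the next piece's inputs."""
        for prev, nxt in zip(pieces, pieces[1:]):
            missing = set(nxt.get_input_specs()) - set(prev.get_output_specs())
            if missing:
                raise ValueError(
                    f"{type(nxt).__name__} expects column(s) {missing}, which "
                    f"{type(prev).__name__} does not produce."
                )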
    # Create our connector piece.
    connector = AgentToModuleMapping(
        modules=["module0", "module1"],
        module_specs={"module0", "module1"},
Shouldn't this be: `{"module0": SingleAgentRLModuleSpec(...), "module1": ...}`?
Both are possible. Sometimes, users don't specify the individual `SingleAgentRLModuleSpec`s (RLlib then uses the algo's default ones), so they also do NOT provide space/class/config information for individual modules. The connector needs to be ok with that and fall back to only having the IDs of the modules, without any further information.
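To make the two accepted forms concrete, a sketch (the import path and `MyRLModule` are assumptions for illustration; only the shape of `module_specs` matters here):

    from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec

    # Form 1: only the ModuleIDs are known; RLlib falls back to the algo's
    # default specs (no per-module space/class/config info).
    connector = AgentToModuleMapping(
        modules=["module0", "module1"],
        module_specs={"module0", "module1"},  # plain set of IDs
    )

    # Form 2: full per-module specs are provided (MyRLModule is a
    # placeholder for your own RLModule subclass).
    connector = AgentToModuleMapping(
        modules=["module0", "module1"],
        module_specs={
            "module0": SingleAgentRLModuleSpec(module_class=MyRLModule),
            "module1": SingleAgentRLModuleSpec(module_class=MyRLModule),
        },
    )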
    @@ -615,6 +615,7 @@ def foreach_batch_item_change_in_place(
        func: Callable[[Any, int, AgentID, ModuleID], Any],
    ) -> None:
Can we add a docstring that explains when to use it and how?
Good catch. You are right, this one is completely missing a docstring :|
Added a docstring and a thorough `.. testcode::`.
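In the meantime, a rough usage sketch of this helper. The semantics are inferred purely from the signature quoted above; `connector`, the toy batch layout, and the `"obs"` column choice are assumptions, not the actual docs:

    # `connector` is assumed to be the connector piece that owns this helper.
    # Toy batch: one column mapping to a list of items.
    batch = {"obs": [1.0, 2.0, 3.0]}

    connector.foreach_batch_item_change_in_place(
        batch=batch,
        column="obs",
        # Per the quoted signature, func receives the item plus its batch
        # index, AgentID, and ModuleID, and returns the replacement value.
        func=lambda item, idx, agent_id, module_id: item * 2.0,
    )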
    We define a custom evaluation method that does the following:
    - It changes the corridor length of all environments used on the evaluation EnvRunners.
    - It runs a defined number of episodes for evaluation purposes.
    - It collects the metrics from those runs, summarizes these metrics and return them.
Tiny typo :) "return" -> "returns"
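For orientation, a compact sketch of what such a custom evaluation function can look like. The hook signature follows RLlib's `custom_evaluation_function` config option; `set_corridor_length` and the `worker.foreach_env` usage are assumptions for illustration:

    def custom_eval_function(algorithm, eval_workers):
        # 1) Change the corridor length on all evaluation envs
        #    (`set_corridor_length` is a placeholder for the env's API).
        eval_workers.foreach_worker(
            func=lambda worker: worker.foreach_env(
                lambda env: env.set_corridor_length(10)
            ),
            local_worker=False,
        )
        # 2) Run a defined number of episodes and collect the metrics
        #    (same call pattern as in the diff quoted below).
        metrics_all_workers = eval_workers.foreach_worker(
            func=lambda worker: (worker.sample(), worker.get_metrics())[1],
            local_worker=False,
        )
        # 3) Summarize the per-worker metrics and return them.
        episodes = [e for per_worker in metrics_all_workers for e in per_worker]
        return {"num_eval_episodes": len(episodes)}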
    func=lambda worker: (worker.sample(), worker.get_metrics())[1],
    local_worker=False,
)
for metrics_per_worker in metrics_all_workers:
Can we maybe show here how to sort the metrics from different corridor lengths into the results dict (e.g., such that they show up as different diagrams in WandB/TensorBoard)?
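One possible answer, sketched: key the summarized metrics by corridor length; loggers like WandB/TensorBoard flatten nested dicts with `/`, so each length gets its own diagram. (`collect_eval_returns` is a hypothetical helper standing in for the sample/metrics loop above.)

    results = {}
    for corridor_length in [10, 20, 40]:
        episode_returns = collect_eval_returns(eval_workers, corridor_length)
        results[f"corridor_{corridor_length}"] = {
            "episode_return_mean": sum(episode_returns) / len(episode_returns),
        }
    # `results` now flattens to keys like "corridor_10/episode_return_mean",
    # which show up as separate curves in WandB/TensorBoard.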
    See: https://pettingzoo.farama.org/environments/sisl/waterworld/
    for more details on the environment.

    Note that this example is different from the old API stack scripts:
Ah here they are. Awesome!
    `examples/centralized_critic.py` and `examples/centralized_critic_2.py` in the
    sense that here, a true shared value function is used via the new
    `MultiAgentRLModule` class as opposed to both of the old API stack scripts, which
    do NOT use a single central value function, but 2: One for each policy learnt.
Typo: "learnt" -> "learned"
    .multi_agent(
        policies=policies,
        # Exact 1:1 mapping from AgentID to ModuleID.
        policy_mapping_fn=(lambda aid, *args, **kwargs: aid),
Doesn't this mean that they additionally all share the policy?
Ah, sorry, this example is NOT done yet. I'll finish it as discussed above.
Good call! :)
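For readers following along, the distinction in config terms; a fragment-level sketch using the same `.multi_agent()` call pattern as the quoted diff (the policy IDs are illustrative):

    # Variant A (as quoted above): exact 1:1 mapping; each agent trains its
    # own module, so the policies are NOT shared.
    config.multi_agent(
        policies={"agent_0", "agent_1"},
        policy_mapping_fn=lambda aid, *args, **kwargs: aid,
    )

    # Variant B: all agents map to one module and thus DO share the policy.
    config.multi_agent(
        policies={"shared_policy"},
        policy_mapping_fn=lambda aid, *args, **kwargs: "shared_policy",
    )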
angelinalg
left a comment
Releasing first batch of comments to review a higher-priority PR first.
rllib/examples/_old_api_stack/remote_envs_with_inference_done_on_main_node.py (two resolved, outdated review threads)
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>
angelinalg
left a comment
I think I reviewed more than you intended. Sorry for the delay. Hope this helps.
    ray.init(local_mode=args.local_mode)

    # Simple environment with 4 independent cartpole entities
Suggested change:
-    # Simple environment with 4 independent cartpole entities
+    # Simple environment with 4 independent cartpole entities.
| """Example of customizing the evaluation procedure for an RLlib algorithm. | ||
|
|
||
| Note, that you should only choose to provide a custom eval function, in case the already | ||
| built-in eval options are not sufficient. Normally, though, RLlib's eval utilities |
Suggested change:
-    built-in eval options are not sufficient. Normally, though, RLlib's eval utilities
+    built-in eval options aren't sufficient. Normally, though, RLlib's eval utilities
    This script uses the SimpleCorridor environment, a simple 1D gridworld, in which
    the agent can only walk left (action=0) or right (action=1). The goal is at the end of
    the (1D) corridor. The env exposes an API to change the length of the corridor
    on-the-fly. We use this API here to extend the size of the corridor for the evaluation
Suggested change:
-    on-the-fly. We use this API here to extend the size of the corridor for the evaluation
+    on-the-fly. This API extends the size of the corridor for the evaluation
    on-the-fly. We use this API here to extend the size of the corridor for the evaluation
    runs.

    We define a custom evaluation method that does the following:
Suggested change:
-    We define a custom evaluation method that does the following:
+    A custom evaluation method does the following:
    runs.

    We define a custom evaluation method that does the following:
    - It changes the corridor length of all environments used on the evaluation EnvRunners.
Suggested change:
-    - It changes the corridor length of all environments used on the evaluation EnvRunners.
+    - It changes the corridor length of all environments RLlib uses on the evaluation EnvRunners.
rllib/examples/multi_agent_and_self_play/two_step_game_with_grouped_agents.py (two resolved, outdated review threads)
    For debugging, use the following additional command line options
    `--no-tune --num-env-runners=0`
    Which should allow you to set breakpoints anywhere in the RLlib code and
Suggested change:
-    Which should allow you to set breakpoints anywhere in the RLlib code and
+    which should allow you to set breakpoints anywhere in the RLlib code and
rllib/examples/rl_module/classes/rock_paper_scissors_heuristic_rlm.py (two resolved, outdated review threads)
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com> Signed-off-by: Sven Mika <sven@anyscale.io>

Cleanup examples folder no. 01:
- `evaluation`
- `multi_agent_and_self_play`
- `gpu_training`

TODO (in follow-up PRs):
- Move example classes into `classes` sub-directories of `examples`, e.g. `examples/rl_module/classes/` or `examples/env/classes`, and leave the space in the direct sub-directories for entire scripts only (i.e. inside `examples/env`, there should be all the example scripts demo'ing env/env-runner/custom env stuff; only within the sub-dir `classes` should all the example envs go; same with models, rl_module, learner, etc.).

Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.