
[RLlib] Evaluation do-over: Make parallel evaluation to training the default behavior and deprecate async eval option. #43787

Merged
sven1977 merged 57 commits into ray-project:master from sven1977:evaluation_do_over_make_parallel_default_deprecate_async
Mar 13, 2024
Conversation

@sven1977 (Contributor) commented Mar 7, 2024

Evaluation do-over:
This PR aims at simplifying our evaluation code a little bit.

  • It deprecates the enable_async_evaluation option entirely. It is replaced by a combination of existing options: evaluation_parallel_to_training=True AND evaluation_duration="auto" AND evaluation_interval=1 (AND evaluation_force_reset_envs_before_iteration=True, which is the default anyway). Combined, these settings fully replace the old async behavior, in the sense:
    • that there is always an eval job running under the hood (parallel to training).
    • the eval track is fault tolerant, restarting or ignoring worker failures as configured in config.fault_tolerance().
    • the eval track does NOT report episode metrics from episodes that were run with an outdated weight set.
    • no eval worker is ever idle, even if it runs through a very short episode (it starts the next one right away and includes all finished episodes - as always - in the reported metrics).
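As a hedged sketch (the option names are taken from the PR description above; the exact AlgorithmConfig API surface may differ across RLlib versions), the replacement combination looks like:

```python
# Sketch: the option combination that replaces the deprecated
# `enable_async_evaluation` flag, per this PR's description.
# These keys would map onto `AlgorithmConfig.evaluation(**async_eval_replacement)`.
async_eval_replacement = {
    "evaluation_parallel_to_training": True,
    "evaluation_duration": "auto",
    "evaluation_interval": 1,
    # Default anyway, listed for completeness:
    "evaluation_force_reset_envs_before_iteration": True,
}
```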

The PR in particular:

  • Splits up the self.evaluation method into various sub-methods, depending on the configured logic: _evaluate_with_auto_duration, _evaluate_with_fixed_duration, _evaluate_on_local_worker, _evaluate_with_custom_function.
    • Some of these methods require distinguishing between the old stack (RolloutWorkers return batches) and the new API stack (EnvRunners return lists of episodes); these additional if-else blocks will go away once we have moved entirely to the new stack.
  • Speeds up the parallel evaluation step with auto duration by computing a better rollout_fragment_length dynamically, based on the estimated time the parallel training step takes to finish. This used to be fixed at 10 timesteps, which caused lots of (expensive) remote calls on the eval workers.
  • Deprecates the async eval option (see above for an analysis on why this should NOT lead to diminished functionality).
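The dynamic fragment-length idea can be sketched as follows (a hypothetical helper, not the PR's actual code; function and parameter names are made up for illustration): size the eval rollout fragments so they roughly fill the expected training-step time, clipped to sane bounds, instead of a fixed 10 timesteps per remote call.

```python
def dynamic_rollout_fragment_length(
    estimated_train_step_sec: float,
    estimated_sec_per_env_step: float,
    min_len: int = 10,
    max_len: int = 200,
) -> int:
    """Pick a fragment length that roughly fills the training-step time.

    A longer fragment means fewer (expensive) remote calls per eval worker,
    while the upper clip keeps a single eval rollout from overshooting the
    parallel training step by too much.
    """
    # Env steps one eval worker could take while training is still busy.
    steps = int(estimated_train_step_sec / max(estimated_sec_per_env_step, 1e-9))
    return max(min_len, min(max_len, steps))
```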

TODOs for future PRs:

  • Unify reported metrics (this is currently quite a mess). This PR does NOT change the current behavior.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

sven1977 added 5 commits March 7, 2024 14:43
Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 added the do-not-merge Do not merge this PR! label Mar 7, 2024
@sven1977 sven1977 changed the title [RLlib] Evaluation do-over: Make parallel evaluation to training the default behavior and deprecate async eval option. [WIP; RLlib] Evaluation do-over: Make parallel evaluation to training the default behavior and deprecate async eval option. Mar 7, 2024
sven1977 added 4 commits March 8, 2024 12:41
**kwargs,
)

# Check, whether `training_iteration` is still a tune.Trainable property
@sven1977 (Contributor, Author):

No longer needed, imo.

Contributor (reviewer):

Ultimately, it is the user's responsibility whether or not to override a Trainable method. Erroring out is too harsh a consequence, agreed. We could leave it as a warning, but better, imo, is to refer more explicitly in the documentation to tune.trainable.Trainable as the base class - so if users want to use Tune, they should not override such methods.
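The warn-instead-of-raise idea could look roughly like this (a hypothetical sketch, not code from the PR; the `Trainable` stand-in and helper name are made up for illustration):

```python
import warnings


class Trainable:
    """Stand-in for tune.trainable.Trainable in this sketch."""

    def training_iteration(self):
        ...


def warn_on_trainable_overrides(cls, base=Trainable, methods=("training_iteration",)):
    """Warn (rather than raise) when a subclass overrides a base Trainable method."""
    for name in methods:
        # An overriding subclass binds a different function object than the base.
        if getattr(cls, name, None) is not getattr(base, name, None):
            warnings.warn(
                f"{cls.__name__} overrides {base.__name__}.{name}; overriding "
                "Trainable methods can break Ray Tune integration.",
                stacklevel=2,
            )
```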

# self.iteration will be 0.
evaluate_this_iter = (
-    self.config.evaluation_interval is not None
+    self.config.evaluation_interval
@sven1977 (Contributor, Author):

This was a bug, both in code and docs: evaluation_interval can also be set to 0 to cause NO evaluation, which the `is not None` check did not handle.
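The corrected logic can be sketched as a small predicate (simplified, hypothetical helper; the real check in the algorithm's training loop involves more conditions): relying on truthiness makes both None and 0 disable evaluation.

```python
def should_evaluate(evaluation_interval, iteration) -> bool:
    # Truthiness handles both `None` and `0` (the bug: `is not None`
    # treated evaluation_interval=0 as "do evaluate").
    return bool(evaluation_interval) and iteration % evaluation_interval == 0
```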

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 requested a review from a team as a code owner March 8, 2024 14:47
@sven1977 sven1977 changed the title [WIP; RLlib] Evaluation do-over: Make parallel evaluation to training the default behavior and deprecate async eval option. [RLlib] Evaluation do-over: Make parallel evaluation to training the default behavior and deprecate async eval option. Mar 8, 2024
@sven1977 sven1977 added rllib RLlib related issues rllib-evaluation Bug affecting policy evaluation with RLlib. rllib-newstack and removed do-not-merge Do not merge this PR! labels Mar 8, 2024
@@ -1,168 +1,6 @@
import argparse
import os
msg = """
@sven1977 (Contributor, Author):

Slight examples folder cleanup. Will have to do more of these :)

@@ -0,0 +1,157 @@
from ray.rllib.algorithms.callbacks import DefaultCallbacks
@sven1977 (Contributor, Author):

Same script as before, just moved here and cleaned up a little.

Signed-off-by: sven1977 <svenmika1977@gmail.com>
sven1977 added 27 commits March 11, 2024 16:15
@sven1977 sven1977 merged commit 4b47995 into ray-project:master Mar 13, 2024
@sven1977 sven1977 deleted the evaluation_do_over_make_parallel_default_deprecate_async branch March 13, 2024 12:56