[RLlib] Evaluation do-over: Make parallel evaluation to training the default behavior and deprecate async eval option. #43787
Conversation
# Check, whether `training_iteration` is still a tune.Trainable property
No longer needed, imo.
In the end, it is the user's responsibility whether or not to override a Trainable method. Erroring out is too much of a consequence, agreed. We could leave it as a warning, but better, imo, is to refer more explicitly in the documentation to `tune.trainable.Trainable` as the base class - so, if a user wants to use Tune, she should not override such methods.
  # self.iteration will be 0.
  evaluate_this_iter = (
-     self.config.evaluation_interval is not None
+     self.config.evaluation_interval
This was a bug, both in code and docs. Can also be 0 to cause NO evaluation.
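To illustrate the bug with a simplified, self-contained sketch (the helper names here are hypothetical, not RLlib code): the old `is not None` check lets `evaluation_interval=0` slip through and trigger evaluation, while the truthiness check correctly treats both `None` and `0` as "no evaluation".

```python
# `evaluation_interval` semantics: None disables evaluation, and 0 should too.
# The real check also involves the current iteration; that part is omitted here.

def evaluate_this_iter_buggy(evaluation_interval, iteration):
    # Old (buggy) check: 0 passes `is not None`, so eval would wrongly run.
    return evaluation_interval is not None

def evaluate_this_iter_fixed(evaluation_interval, iteration):
    # New check: truthiness treats both None and 0 as "do not evaluate".
    return bool(evaluation_interval)

print(evaluate_this_iter_buggy(0, 1))  # True  (bug: eval would run)
print(evaluate_this_iter_fixed(0, 1))  # False (correct: no eval)
```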
@@ -1,168 +1,6 @@
import argparse
import os
msg = """
Slight examples folder cleanup. Will have to do more of these :)
@@ -0,0 +1,157 @@
from ray.rllib.algorithms.callbacks import DefaultCallbacks
Same script as before, just moved here and cleaned up a little.
Signed-off-by: sven1977 <svenmika1977@gmail.com>
Evaluation do-over:
This PR aims at simplifying our evaluation code a little bit.
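For context, a sketch of what the recommended parallel-evaluation setup looks like on an `AlgorithmConfig` (argument names assumed from RLlib's `AlgorithmConfig.evaluation()` API around the time of this PR; check your installed version for the exact signature):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .evaluation(
        # Evaluate every iteration (None or 0 means: never evaluate).
        evaluation_interval=1,
        # Run evaluation for roughly as long as the training step takes.
        evaluation_duration="auto",
        # Run evaluation in parallel with the training step.
        evaluation_parallel_to_training=True,
        # Default anyway, per the PR description.
        evaluation_force_reset_envs_before_iteration=True,
    )
)
```

This is a config fragment only; building and training the algorithm from it works as usual via `config.build()`.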
Users can use the config settings `evaluation_parallel_to_training=True` AND `evaluation_duration="auto"` AND `evaluation_interval=1` (AND `evaluation_force_reset_envs_before_iteration=True`, which is the default anyway). These settings combined should be enough to 100% replace the old async behavior, so the async eval option (under `config.fault_tolerance()`) can be deprecated.

The PR in particular:

- Splits the `self.evaluate()` method into various sub-methods, depending on the logic configured: `_evaluate_with_auto_duration`, `_evaluate_with_fixed_duration`, `_evaluate_on_local_worker`, `_evaluate_with_custom_function`.
- Determines the eval workers' `rollout_fragment_length` dynamically, depending on the estimated time it takes for the parallel training step to finish. This used to be a fixed 10 timesteps, which was causing lots of (expensive) remote calls on the eval workers.

TODOs for future PRs:
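Regarding the dynamic `rollout_fragment_length` point above, here is a minimal sketch of the idea only (all names hypothetical; this is not RLlib's actual implementation): size eval rollout requests so that roughly one round of remote calls fills the expected training time, instead of issuing many tiny 10-timestep requests.

```python
def dynamic_rollout_fragment_length(
    estimated_train_time_s: float,
    estimated_env_steps_per_s: float,
    num_eval_workers: int,
    min_len: int = 10,
) -> int:
    """Estimate how many env steps each eval worker should collect per call."""
    # Total env steps all eval workers can plausibly collect while training runs.
    total_steps = estimated_train_time_s * estimated_env_steps_per_s
    # Split across workers; never go below the old fixed fragment length.
    return max(min_len, int(total_steps / max(1, num_eval_workers)))

# e.g. a 2s train step, 500 env steps/s throughput, 4 eval workers:
print(dynamic_rollout_fragment_length(2.0, 500.0, 4))  # -> 250
```

One request of ~250 steps per worker replaces ~25 separate 10-step remote calls in this example, which is where the savings come from.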
Why are these changes needed?
Related issue number
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.