[RLlib] Cleanup examples folder 04: Curriculum and checkpoint-by-custom-criteria examples moved to new API stack. #44706
Conversation
simonsays1980 left a comment:

LGTM. Very happy about the curriculum example.
On this snippet from the example's docstring:

> For debugging, use the following additional command line options
> `--no-tune --num-env-runners=0`
> which should allow you to set breakpoints anywhere in the RLlib code and …
Comment: This also works with Tune, but with `--local-mode` :)

Reply: Absolutely! I'm always afraid we're going to get rid of Ray local mode at some point. Also, for any number of Learner workers > 0, local mode doesn't work (not sure why, actually).
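For readers of this thread, a minimal sketch of the two debugging routes mentioned; the CLI flags come from the example script's docstring, and `ray.init(local_mode=True)` is the standard Ray call behind `--local-mode`:

```python
import ray

# Route 1 (example script): run with `--no-tune --num-env-runners=0`, so
# sampling and training happen in the driver process and breakpoints set
# anywhere in RLlib code are hit directly.

# Route 2 (suggested above): keep Tune, but start Ray in local mode, which
# executes all tasks and actors sequentially in one process. Note the
# caveat from this thread: this breaks with > 0 Learner workers.
ray.init(local_mode=True)
```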
On this hunk, where the checkpoint retrieval is split up:

```diff
- ckpt = results.get_best_result(metric=policy_loss_key, mode="min").checkpoint
- print("Lowest pol-loss: {}".format(ckpt))
+ best_result = results.get_best_result(metric=policy_loss_key, mode="min")
+ ckpt = best_result.checkpoint
```
Comment: We could also ask here for the best checkpoint along the training path: `best_result.get_best_checkpoint(metric=policy_loss_key, mode="min")`
Reply: Ah, cool, so `ckpt = best_result.checkpoint` returns only the very last checkpoint? And if the last one is not the best, it's better to do `best_result.get_best_checkpoint(metric=policy_loss_key, mode="min")`?
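For context, a sketch of the two retrieval paths being compared, assuming a finished run whose `ResultGrid` is `results` and using a top-level metric name (the nested-key case is exactly where this breaks, as the follow-up shows):

```python
# `results` is the ResultGrid returned by tune.Tuner(...).fit().
best_result = results.get_best_result(metric="episode_reward_mean", mode="max")

# `.checkpoint` is simply the final checkpoint of the best trial,
# regardless of whether an earlier checkpoint scored better ...
last_ckpt = best_result.checkpoint

# ... while `get_best_checkpoint` searches all checkpoints saved along the
# trial's training path for the best metric value.
best_ckpt = best_result.get_best_checkpoint(
    metric="episode_reward_mean", mode="max"
)
```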
Follow-up: This actually doesn't seem to work well with nested keys. If I do `best_result.get_best_checkpoint(policy_loss_key, mode="min")`, I get:

```
RuntimeError: Invalid metric name ('info', 'learner', 'default_policy', 'learner_stats', 'policy_loss')! You may choose from the following metrics: dict_keys(['custom_metrics', 'episode_media', 'info', 'sampler_results', 'episode_reward_max', 'episode_reward_min', 'episode_reward_mean', 'episode_len_mean', 'episodes_this_iter', 'episodes_timesteps_total', 'policy_reward_min', 'policy_reward_max', 'policy_reward_mean', 'hist_stats', 'sampler_perf', 'num_faulty_episodes', 'connector_metrics', 'num_healthy_workers', 'num_in_flight_async_reqs', 'num_remote_worker_restarts', 'num_agent_steps_sampled', 'num_agent_steps_trained', 'num_env_steps_sampled', 'num_env_steps_trained', 'num_env_steps_sampled_this_iter', 'num_env_steps_trained_this_iter', 'num_env_steps_sampled_throughput_per_sec', 'num_env_steps_trained_throughput_per_sec', 'timesteps_total', 'num_steps_trained_this_iter', 'agent_timesteps_total', 'timers', 'counters', 'done', 'episodes_total', 'training_iteration', 'trial_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'config', 'time_since_restore', 'iterations_since_restore', 'perf', 'experiment_tag'])
```
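A possible workaround, sketched here as an assumption rather than as the fix that landed: resolve the nested key manually against `best_result.best_checkpoints`, the list of (checkpoint, metrics) pairs Tune retains per its CheckpointConfig.

```python
from functools import reduce

# The nested metric key from the example.
policy_loss_key = (
    "info", "learner", "default_policy", "learner_stats", "policy_loss"
)

def get_nested(metrics, key_path):
    # Walk the tuple of keys down into the nested results dict.
    return reduce(lambda d, k: d[k], key_path, metrics)

# Pick the checkpoint with the lowest policy loss by hand, since
# `get_best_checkpoint` only understands top-level metric names.
ckpt, _ = min(
    best_result.best_checkpoints,
    key=lambda ckpt_and_metrics: get_nested(ckpt_and_metrics[1], policy_loss_key),
)
```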
Another hunk, selecting a checkpoint by a different criterion (highest value-function loss):

```python
ray.shutdown()
best_result = results.get_best_result(metric=vf_loss_key, mode="max")
ckpt = best_result.checkpoint
```
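Whichever criterion selects `ckpt`, restoring from it is the same; a brief sketch using RLlib's standard `Algorithm.from_checkpoint` (the follow-on training step is illustrative):

```python
from ray.rllib.algorithms.algorithm import Algorithm

# Rebuild the full algorithm state from the checkpoint chosen above by
# the custom criterion.
algo = Algorithm.from_checkpoint(ckpt)

# Continue training (or run evaluation) from the restored state.
result = algo.train()
```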
And the experiment launch, moved from a hand-rolled Tuner call to the new example-script utility:

```diff
-     param_space=config.to_dict(),
-     run_config=air.RunConfig(stop=stop, verbose=2),
+ run_rllib_example_script_experiment(
+     base_config, args, stop=stop, success_metric={"task_solved": 1.0}
```
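A self-contained sketch of how the new utility drives such an example. The two helper names are real RLlib test utilities, but the environment choice and the `task_solved` stop criterion are assumptions made for illustration:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.utils.test_utils import (
    add_rllib_example_script_args,
    run_rllib_example_script_experiment,
)

parser = add_rllib_example_script_args()
args = parser.parse_args()

# Build the AlgorithmConfig as usual; the utility takes care of Tune setup,
# stopping conditions, and checking `success_metric` at the end.
base_config = PPOConfig().environment("FrozenLake-v1")  # hypothetical env

stop = {"task_solved": 1.0}  # hypothetical stop dict matching the hunk above

run_rllib_example_script_experiment(
    base_config, args, stop=stop, success_metric={"task_solved": 1.0}
)
```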
Why are these changes needed?

Cleanup examples folder 04: Moves the curriculum and checkpoint-by-custom-criteria examples to the new API stack.

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.