
[RLlib] Cleanup examples folder 04: Curriculum and checkpoint-by-custom-criteria examples moved to new API stack.#44706

Merged
sven1977 merged 8 commits into ray-project:master from sven1977:cleanup_examples_folder_04
Apr 14, 2024

Conversation

Contributor

@sven1977 sven1977 commented Apr 12, 2024

Cleanup examples folder 04:

  • Curriculum example moved to new API stack.
  • checkpoint-by-custom-criteria example moved to new API stack.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@sven1977 sven1977 added rllib RLlib related issues rllib-docs-or-examples Issues related to RLlib documentation or rllib/examples rllib-newstack labels Apr 12, 2024
Contributor

@simonsays1980 simonsays1980 left a comment


LGTM. Very happy about the curriculum example.


For debugging, use the following additional command line options
`--no-tune --num-env-runners=0`
which should allow you to set breakpoints anywhere in the RLlib code and
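The debug tip quoted above can be sketched in plain Python. This is a hypothetical mock-up (not RLlib's actual argument parser) showing why --no-tune together with --num-env-runners=0 makes breakpoints work: with no Tune trial actors and no remote EnvRunner workers, all sampling and training code runs in the main process.

```python
import argparse

# Hypothetical sketch of how debug flags like --no-tune and
# --num-env-runners=0 typically change an example script's execution path.
parser = argparse.ArgumentParser()
parser.add_argument("--no-tune", action="store_true")
parser.add_argument("--num-env-runners", type=int, default=2)

def run_mode(argv):
    args = parser.parse_args(argv)
    if args.no_tune and args.num_env_runners == 0:
        # Everything (sampling + training) happens in this one process,
        # so an IDE breakpoint anywhere in the code path is hit directly.
        return "single-process: breakpoints work everywhere"
    # Otherwise, parts of the code run in remote actor processes.
    return "distributed: breakpoints only hit in the driver"

print(run_mode(["--no-tune", "--num-env-runners=0"]))
```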
Contributor


This also works with Tune, but with --local-mode :)

Contributor Author


Absolutely! I'm always afraid we are going to get rid of Ray local mode at some point. Also, local mode doesn't work with any number of Learner workers > 0 (not sure why, actually).

ckpt = results.get_best_result(metric=policy_loss_key, mode="min").checkpoint
print("Lowest pol-loss: {}".format(ckpt))
best_result = results.get_best_result(metric=policy_loss_key, mode="min")
ckpt = best_result.checkpoint
Contributor


We could also ask here for the best checkpoint along the training path best_result.get_best_checkpoint(metric=policy_loss_key, mode="min")

Contributor Author


Ah, cool, so ckpt = best_result.checkpoint returns the very last checkpoint only?

And if the last one is not the best, is it better to do best_result.get_best_checkpoint(metric=policy_loss_key, mode="min")?
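The distinction the two reviewers are circling can be shown without Ray at all. This is an illustrative sketch (plain Python, hypothetical data): best_result.checkpoint holds the checkpoint from the final training iteration, while get_best_checkpoint(metric, mode) scans every checkpoint saved along the training path.

```python
# Hypothetical per-iteration history of one trial: the lowest policy loss
# occurs at iteration 2, but the last checkpoint is from iteration 3.
history = [
    {"iter": 1, "policy_loss": 0.9, "ckpt": "ckpt_000001"},
    {"iter": 2, "policy_loss": 0.4, "ckpt": "ckpt_000002"},  # best loss
    {"iter": 3, "policy_loss": 0.7, "ckpt": "ckpt_000003"},  # last iter
]

# Analogous to `best_result.checkpoint`: simply the final checkpoint.
last_ckpt = history[-1]["ckpt"]
# Analogous to `get_best_checkpoint(metric=..., mode="min")`: the
# checkpoint from the iteration with the minimal metric value.
best_ckpt = min(history, key=lambda r: r["policy_loss"])["ckpt"]

print(last_ckpt)  # ckpt_000003
print(best_ckpt)  # ckpt_000002
```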

Contributor Author


This actually doesn't seem to work well with nested keys.
If I do best_result.get_best_checkpoint(policy_loss_key, mode="min"), I get:

RuntimeError: Invalid metric name ('info', 'learner', 'default_policy', 'learner_stats', 'policy_loss')! You may choose from the following metrics: dict_keys(['custom_metrics', 'episode_media', 'info', 'sampler_results', 'episode_reward_max', 'episode_reward_min', 'episode_reward_mean', 'episode_len_mean', 'episodes_this_iter', 'episodes_timesteps_total', 'policy_reward_min', 'policy_reward_max', 'policy_reward_mean', 'hist_stats', 'sampler_perf', 'num_faulty_episodes', 'connector_metrics', 'num_healthy_workers', 'num_in_flight_async_reqs', 'num_remote_worker_restarts', 'num_agent_steps_sampled', 'num_agent_steps_trained', 'num_env_steps_sampled', 'num_env_steps_trained', 'num_env_steps_sampled_this_iter', 'num_env_steps_trained_this_iter', 'num_env_steps_sampled_throughput_per_sec', 'num_env_steps_trained_throughput_per_sec', 'timesteps_total', 'num_steps_trained_this_iter', 'agent_timesteps_total', 'timers', 'counters', 'done', 'episodes_total', 'training_iteration', 'trial_id', 'date', 'timestamp', 'time_this_iter_s', 'time_total_s', 'pid', 'hostname', 'node_ip', 'config', 'time_since_restore', 'iterations_since_restore', 'perf', 'experiment_tag']).
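The error above suggests the lookup expects a single flat key rather than a tuple path. Tune generally flattens nested result dicts using a "/" delimiter, so one possible workaround (an assumption, not something done in this PR) is to flatten the nested metric path into one string before passing it as metric=. A minimal sketch of that flattening:

```python
# Sketch of "/"-delimited flattening of a nested result dict, mirroring
# how Tune commonly represents nested metrics as a single flat key.
def flatten(d, prefix=""):
    out = {}
    for k, v in d.items():
        key = f"{prefix}/{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, key))  # recurse into nested dicts
        else:
            out[key] = v
    return out

# Hypothetical result dict with the nested policy-loss path from the error.
result = {
    "info": {"learner": {"default_policy": {"learner_stats": {"policy_loss": 0.42}}}}
}
flat = flatten(result)
print(flat["info/learner/default_policy/learner_stats/policy_loss"])  # 0.42
```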


ray.shutdown()
best_result = results.get_best_result(metric=vf_loss_key, mode="max")
ckpt = best_result.checkpoint
Contributor


Here as well

param_space=config.to_dict(),
run_config=air.RunConfig(stop=stop, verbose=2),
run_rllib_example_script_experiment(
base_config, args, stop=stop, success_metric={"task_solved": 1.0}
Contributor


Very nice example 👍
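The curriculum idea behind the diff above (and the success_metric={"task_solved": 1.0} stopper) can be sketched framework-free. This is a minimal, hypothetical mock-up, not the actual RLlib example: whenever the recent mean return on the current task crosses a threshold, advance to the next, harder task; once the final task is mastered, report task_solved=1.0 so a success-metric stopper can end the experiment.

```python
# Minimal, framework-free sketch of threshold-based curriculum advancement.
class Curriculum:
    def __init__(self, num_tasks=3, promote_at=0.8):
        self.task = 0               # current (easiest) task index
        self.num_tasks = num_tasks
        self.promote_at = promote_at

    def update(self, mean_return):
        """Process one training iteration's mean return; return metrics."""
        if mean_return >= self.promote_at:
            if self.task < self.num_tasks - 1:
                self.task += 1      # promote to a harder task
            else:
                # Final task solved -> signal the stopper.
                return {"task": self.task, "task_solved": 1.0}
        return {"task": self.task, "task_solved": 0.0}

# Hypothetical sequence of per-iteration mean returns; returns dip after
# each promotion because the new task is harder.
cur = Curriculum()
for r in [0.5, 0.9, 0.6, 0.85, 0.7, 0.95]:
    metrics = cur.update(r)
print(metrics)  # {'task': 2, 'task_solved': 1.0}
```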

@sven1977 sven1977 merged commit f1f0ced into ray-project:master Apr 14, 2024
@sven1977 sven1977 deleted the cleanup_examples_folder_04 branch April 14, 2024 10:21
