
[train][tune] Fix LightGBM v2 callbacks for Tune only usage #57042

Merged
justinvyu merged 7 commits into ray-project:master from liulehui:fix-tune-callbacks on Oct 6, 2025


@liulehui (Contributor) commented Sep 30, 2025

Why are these changes needed?

  1. In the Ray Train revamp REP, we decouple the dependency between Ray Train and Ray Tune.
  2. As a result, when RayTrainReportCallback reports metrics or a checkpoint in Tune-only usage, the v2 context API throws a RuntimeError stating that TrainFnUtils is not found.
  3. This PR refactors the callbacks to inherit from the same base class, using ray.tune.report for the Tune-only callback and ray.train.report for RayTrainReportCallback, following the migration example in the REP, so that the two callbacks are clearly differentiated.
  4. Tested with this script: https://gist.github.com/liulehui/58001000fa195c8f000a2992ba3c77e1 (the output is in the gist comment).
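To make points 2 and 3 concrete, here is a minimal sketch that runs without Ray or LightGBM. Everything in it is a hypothetical stand-in: `Env` mimics only the `iteration` and `evaluation_result_list` fields of LightGBM's `CallbackEnv`, `fake_train_report` imitates a report path that fails when no Train context exists, `fake_tune_report` imitates a Tune-only report, and the `data-metric` key format is an assumption. The class names are invented so they are not mistaken for Ray's actual implementation.

```python
from abc import ABC, abstractmethod
from collections import namedtuple
from typing import Dict, List, Optional

# Stand-in for lightgbm.callback.CallbackEnv with only the fields read here.
Env = namedtuple("Env", ["model", "iteration", "evaluation_result_list"])

# Hypothetical stand-in for the Ray Train v2 worker context. A Tune-only run
# never creates one, so it stays None.
TRAIN_CONTEXT: Optional[List[Dict[str, float]]] = None
TUNE_RESULTS: List[Dict[str, float]] = []


def fake_train_report(metrics: Dict[str, float]) -> None:
    """Imitates the Train report path: it requires a Train context."""
    if TRAIN_CONTEXT is None:
        raise RuntimeError("TrainFnUtils not found")
    TRAIN_CONTEXT.append(metrics)


def fake_tune_report(metrics: Dict[str, float]) -> None:
    """Imitates a Tune-only report path: no Train context needed."""
    TUNE_RESULTS.append(metrics)


class ReportCallbackSketch(ABC):
    """Shared logic: flatten LightGBM eval results, then delegate reporting."""

    def __call__(self, env: Env) -> None:
        # Entries look like (data_name, eval_name, result, is_higher_better[, stdv]).
        metrics = {
            f"{data_name}-{eval_name}": result
            for data_name, eval_name, result, *_ in env.evaluation_result_list
        }
        self._report_metrics(metrics)

    @abstractmethod
    def _report_metrics(self, metrics: Dict[str, float]) -> None:
        ...


class TrainReportSketch(ReportCallbackSketch):
    def _report_metrics(self, metrics: Dict[str, float]) -> None:
        fake_train_report(metrics)  # the real Train callback calls ray.train.report


class TuneReportSketch(ReportCallbackSketch):
    def _report_metrics(self, metrics: Dict[str, float]) -> None:
        fake_tune_report(metrics)  # the real Tune callback calls ray.tune.report


env = Env(model=None, iteration=0,
          evaluation_result_list=[("valid", "binary_logloss", 0.42, False)])
TuneReportSketch()(env)  # succeeds outside any Train context
try:
    TrainReportSketch()(env)  # reproduces the Tune-only failure from point 2
except RuntimeError as err:
    print(f"Train-flavored callback failed: {err}")
```

The design point is that only the abstract `_report_metrics` hook differs between the two subclasses, so the metric-collection logic lives in one place while each callback talks to the API that actually exists in its runtime.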

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run pre-commit jobs to lint the changes in this PR. (pre-commit setup)
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Note

Refactors LightGBM callback into a shared base and implements separate Train and Tune callbacks with correct checkpointing/reporting for each API.

  • LightGBM callbacks:
    • Refactor: Extract common logic into RayReportCallback with abstract methods for checkpointing and reporting.
    • Train: RayTrainReportCallback now subclasses RayReportCallback, using ray.train.report and rank-aware checkpointing.
    • Tune: Add TuneReportCheckpointCallback subclass using ray.tune.report and tune.Checkpoint (no rank check), replacing prior alias.
    • Minor doc updates: examples and return docs adjusted; references clarified to LightGBM.
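The split between rank-aware checkpointing (Train) and no rank check (Tune) can be sketched as follows. This is an illustration under stated assumptions, not Ray's code: reports are recorded in a list instead of calling `ray.train.report` / `ray.tune.report`, the checkpoint is a hypothetical dict instead of a saved booster, `world_rank` is a hypothetical attribute standing in for a query to the Train context, and the "checkpoint every `frequency` iterations" rule is assumed.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Tuple

# Each report is recorded as (metrics, checkpoint or None) so the sketch runs without Ray.
Report = Tuple[Dict[str, float], Optional[Dict[str, Any]]]


class ReportCallbackSketch(ABC):
    """Shared base: decides when to checkpoint; subclasses decide how to report."""

    def __init__(self, frequency: int = 1):
        self.frequency = frequency
        self.reports: List[Report] = []

    def _get_checkpoint(self, model: Any) -> Dict[str, Any]:
        # Hypothetical stand-in for saving the booster into a checkpoint.
        return {"model": model}

    def on_iteration(self, model: Any, iteration: int, metrics: Dict[str, float]) -> None:
        # Assumed rule: checkpoint every `frequency` iterations (1-based count).
        if (iteration + 1) % self.frequency == 0:
            self._save_and_report_checkpoint(metrics, model)
        else:
            self._report_metrics(metrics)

    @abstractmethod
    def _save_and_report_checkpoint(self, metrics: Dict[str, float], model: Any) -> None: ...

    @abstractmethod
    def _report_metrics(self, metrics: Dict[str, float]) -> None: ...


class TrainReportSketch(ReportCallbackSketch):
    """Train flavor: every worker reports; only rank 0 attaches a checkpoint."""

    world_rank: int = 0  # hypothetical; a real callback would ask the Train context

    def _save_and_report_checkpoint(self, metrics, model):
        checkpoint = self._get_checkpoint(model) if self.world_rank == 0 else None
        self.reports.append((metrics, checkpoint))  # stands in for ray.train.report

    def _report_metrics(self, metrics):
        self.reports.append((metrics, None))  # stands in for ray.train.report


class TuneReportSketch(ReportCallbackSketch):
    """Tune-only flavor: a single training process per trial, so no rank check."""

    def _save_and_report_checkpoint(self, metrics, model):
        self.reports.append((metrics, self._get_checkpoint(model)))  # stands in for ray.tune.report

    def _report_metrics(self, metrics):
        self.reports.append((metrics, None))  # stands in for ray.tune.report


worker1 = TrainReportSketch(frequency=2)
worker1.world_rank = 1
tune_cb = TuneReportSketch(frequency=2)
for it in range(2):
    worker1.on_iteration("booster", it, {"loss": 1.0 / (it + 1)})
    tune_cb.on_iteration("booster", it, {"loss": 1.0 / (it + 1)})
print(worker1.reports)  # [({'loss': 1.0}, None), ({'loss': 0.5}, None)]
print(tune_cb.reports)  # [({'loss': 1.0}, None), ({'loss': 0.5}, {'model': 'booster'})]
```

The non-zero-rank worker still reports on the checkpoint iteration, just without a checkpoint, while the Tune-only callback always attaches one because there is no worker group to deduplicate.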

Written by Cursor Bugbot for commit 8e6abfb.

Signed-off-by: Lehui Liu <lehui@anyscale.com>
Signed-off-by: Lehui Liu <lehui@anyscale.com>
@liulehui requested review from a team as code owners September 30, 2025 17:41
@gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request refactors the LightGBM callbacks to decouple Ray Tune and Ray Train usage, which is a great improvement. A base RayReportCallback is introduced with specific implementations for Tune and Train. My review focuses on improving the new abstractions and reducing code redundancy. I've pointed out an incorrect method signature in the new abstract base class and suggested removing redundant __init__ methods in the subclasses to make the code cleaner and more maintainable.
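The review's two suggestions, keeping the abstract method signature consistent with its overrides and dropping pass-through `__init__` methods in subclasses, can be illustrated with Python's standard `abc` module. This is a generic sketch with hypothetical class and method names; the review comment does not quote the actual signatures, so it only shows the pattern being recommended.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Dict


class ReportCallbackSketch(ABC):
    def __init__(self, frequency: int = 1):
        self.frequency = frequency

    @abstractmethod
    def _report_metrics(self, metrics: Dict[str, float]) -> None:
        """Every subclass overrides this with the same parameters."""


class TuneReportSketch(ReportCallbackSketch):
    # No __init__ here: the base constructor is inherited, so a pass-through
    # `def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs)`
    # would add nothing.
    def _report_metrics(self, metrics: Dict[str, float]) -> None:
        self.last_metrics = metrics


class IncompleteSketch(ReportCallbackSketch):
    pass  # forgets to implement _report_metrics


cb = TuneReportSketch(frequency=5)
print(cb.frequency)  # 5

# Signature drift between the abstract declaration and an override shows up here.
same = inspect.signature(TuneReportSketch._report_metrics) == inspect.signature(
    ReportCallbackSketch._report_metrics
)
print(same)  # True

try:
    IncompleteSketch()
except TypeError as err:  # ABCMeta refuses to instantiate classes with unimplemented abstract methods
    print(f"TypeError: {err}")
```

Note that `abc` only enforces that an override exists, not that its parameters match, which is why a mismatched abstract signature can slip through review unless it is checked explicitly.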

cursor[bot]

This comment was marked as outdated.

Signed-off-by: Lehui Liu <lehui@anyscale.com>
cursor[bot]

This comment was marked as outdated.

@ray-gardener (bot) added the tune, docs, and train labels Sep 30, 2025
@justinvyu (Contributor) left a comment


Thanks!

Signed-off-by: Lehui Liu <lehui@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
@justinvyu enabled auto-merge (squash) October 3, 2025 00:17
@github-actions (bot) added the go label (add ONLY when ready to merge, run all tests) Oct 3, 2025
cursor[bot]

This comment was marked as outdated.

@github-actions (bot) disabled auto-merge October 3, 2025 17:04
Signed-off-by: Lehui Liu <lehui@anyscale.com>
@justinvyu merged commit bc9723a into ray-project:master Oct 6, 2025 (6 checks passed)
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
1. In the Ray Train [revamp
REP](https://github.com/ray-project/enhancements/blob/main/reps/2024-10-18-train-tune-api-revamp/2024-10-18-train-tune-api-revamp.md#tune-only-usage),
we decouple the dependency between Ray Train and Ray Tune.
2. As a result, when RayTrainReportCallback reports metrics or a
checkpoint in Tune-only usage, the v2 context API throws a RuntimeError
stating that TrainFnUtils is not found.
3. This PR refactors the callbacks to inherit from the same base class,
using `ray.tune.report` for the Tune-only callback and `ray.train.report` for
`RayTrainReportCallback`, following the migration example
[here](https://github.com/ray-project/enhancements/blob/main/reps/2024-10-18-train-tune-api-revamp/2024-10-18-train-tune-api-revamp.md#tune-only-usage),
so that the two callbacks are clearly differentiated.

---------

Signed-off-by: Lehui Liu <lehui@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
eicherseiji pushed a commit to eicherseiji/ray that referenced this pull request Oct 6, 2025
liulehui added a commit to liulehui/ray that referenced this pull request Oct 9, 2025
joshkodi pushed a commit to joshkodi/ray that referenced this pull request Oct 13, 2025
justinvyu added a commit that referenced this pull request Oct 16, 2025
Ports over the remaining unit tests that were marked as TODOs from this
series of PRs: #57534, #57256, #56868, #56820, #56816.

Notably:
* `test_new_dataset_config -> test_data_integration`
* `test_backend -> test_torch_trainer, test_worker_group`
* `test_gpu -> test_torch_gpu`

This PR also finishes migrating the Tune LightGBM/Keras examples which
were unblocked by #57042 and
#57121.

---------

Signed-off-by: Justin Yu <justinvyu@anyscale.com>
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
justinyeh1995 pushed a commit to justinyeh1995/ray that referenced this pull request Oct 20, 2025
xinyuangui2 pushed a commit to xinyuangui2/ray that referenced this pull request Oct 22, 2025
elliot-barn pushed a commit that referenced this pull request Oct 23, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025

Labels

docs (An issue or change related to documentation), go (add ONLY when ready to merge, run all tests), train (Ray Train Related Issue), tune (Tune-related issues)


2 participants