[train] Hard-deprecate `MosaicTrainer` and remove `SklearnTrainer` by justinvyu · Pull Request #42814 · ray-project/ray

justinvyu · 2024-01-30T01:25:07Z

Why are these changes needed?

MosaicTrainer is not needed, since it can be folded under TorchTrainer, similar to LightningTrainer and TransformersTrainer.

SklearnTrainer is also not needed, since it provides no extra utility over calling Tuner(sklearn_train_fn) directly. The Ray Train version only added an unneeded layer of abstraction over the underlying sklearn methods, and no multi-worker training was happening.

TODO

Finish writing the migration issue.

Notes

Mosaic Composer does not actually work with Ray Train as of 2.9.1, because it tries to register a signal handler, and that registration must happen on the main thread. Ray Train currently creates a worker thread to execute the training logic. This is because Ray Train implements pausable training execution for each training worker by communicating between a main thread and a worker thread through ray.train.report.
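The main-thread restriction described above can be reproduced with the Python standard library alone. The following is a minimal sketch, not Composer or Ray Train code: the helper name `register_handler` is hypothetical, and the only behavior relied on is that CPython's `signal.signal` raises `ValueError` when called outside the main thread.

```python
import signal
import threading


def register_handler(errors):
    """Try to register a SIGTERM handler from whichever thread calls this."""
    try:
        # CPython only allows signal.signal() on the main thread.
        signal.signal(signal.SIGTERM, lambda signum, frame: None)
    except ValueError as exc:
        errors.append(exc)


errors = []
# Run the registration in a worker thread, analogous to training code
# that is executed off the main thread.
worker = threading.Thread(target=register_handler, args=(errors,))
worker.start()
worker.join()

# The registration attempt failed inside the worker thread.
print(f"worker thread raised: {errors[0]!r}")
```

Any library that installs signal handlers during setup would hit the same error when its setup code runs in a non-main thread.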

Related issue number

Closes #32732

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

matthewdeng · 2024-01-30T01:35:41Z

python/ray/train/mosaic/mosaic_trainer.py

+    "`ray.train.mosaic.MosaicTrainer` is deprecated. "
+    "Use `ray.train.torch.TorchTrainer` instead. "
+    "See this issue for a migration example: "
+    "https://github.com/ray-project/ray/issues/42257"


We won't be able to have a migration example because of the threading issue though, right?

Yeah, I was planning to just keep it as "not possible for now", then update the issue later. Or should I just remove the migration link for Mosaic trainer?

Oh yeah we can have a GH Issue that says it's not supported right now (the current one points to sklearn)
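For readers unfamiliar with the term, a "hard" deprecation usually means the API raises as soon as it is used rather than only logging a warning. The sketch below shows one way this could look, reusing the message from the diff above. The constant name `MOSAIC_DEPRECATION_MESSAGE` and the stub class are assumptions for illustration, not the actual contents of mosaic_trainer.py.

```python
# Hypothetical constant name; the message text is taken from the diff above.
MOSAIC_DEPRECATION_MESSAGE = (
    "`ray.train.mosaic.MosaicTrainer` is deprecated. "
    "Use `ray.train.torch.TorchTrainer` instead. "
    "See this issue for a migration example: "
    "https://github.com/ray-project/ray/issues/42257"
)


class MosaicTrainer:
    """Stub that stands in for the deprecated trainer (illustrative only)."""

    def __init__(self, *args, **kwargs):
        # Hard deprecation: fail immediately instead of emitting a warning.
        raise DeprecationWarning(MOSAIC_DEPRECATION_MESSAGE)


message = None
try:
    MosaicTrainer()
except DeprecationWarning as exc:
    message = str(exc)

print(message)
```

Raising on construction keeps the import path working, so existing code fails with an actionable message instead of an ImportError.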

woshiyyya · 2024-01-30T18:07:22Z

python/ray/train/sklearn/sklearn_predictor.py

+
+
 @PublicAPI(stability="alpha")
 class SklearnPredictor(Predictor):


Do we still need SklearnPredictor?

We still have all the predictors around but not shown in docs. Maybe we can remove them all at some point at once?

woshiyyya · 2024-01-30T18:26:50Z

Does the main thread refer to the driver, with all Ray actor methods running in child threads? It seems that the signal handling issue is unresolvable in the current Ray Train design.

justinvyu · 2024-01-30T18:31:02Z

@woshiyyya It refers to the main thread of a single worker actor that is processing the result queue from the training thread:

> Seems that the signal handling issue is unresolvable in current Ray Train design.

Yep, this is unresolvable unless we get rid of this threading logic, which is an implementation detail for the "yielding" behavior and could be done with Ray Generators in the future.
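To illustrate the threading logic being discussed, here is a simplified sketch of pausable execution built from a worker thread and a result queue, followed by a plain Python generator that yields the same results without a second thread. `ThreadedRunner`, `train_fn`, and `train_gen` are hypothetical names; this is not Ray Train's implementation, and the generator is only an analogy for what Ray Generators might enable.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of training


class ThreadedRunner:
    """Runs train_fn in a worker thread; the main thread pulls results."""

    def __init__(self, train_fn):
        self._results = queue.Queue()
        self._resume = threading.Semaphore(0)
        self._thread = threading.Thread(
            target=self._run, args=(train_fn,), daemon=True
        )

    def _run(self, train_fn):
        train_fn(self.report)
        self._results.put(_DONE)

    def report(self, metrics):
        # Called from the training thread: hand over a result, then pause
        # until the main thread allows training to continue.
        self._results.put(metrics)
        self._resume.acquire()

    def start(self):
        self._thread.start()

    def next_result(self):
        # Called from the main thread: block until the next result arrives.
        result = self._results.get()
        if result is _DONE:
            self._thread.join()
            return None
        return result

    def resume(self):
        self._resume.release()


def train_fn(report):
    for epoch in range(3):
        report({"epoch": epoch})


runner = ThreadedRunner(train_fn)
runner.start()
collected = []
while (result := runner.next_result()) is not None:
    collected.append(result)
    runner.resume()


# Generator analogy: the same "yielding" behavior on the calling thread,
# so the training code is not pushed onto a non-main thread.
def train_gen():
    for epoch in range(3):
        yield {"epoch": epoch}


generator_results = list(train_gen())
print(collected == generator_results)
```

In the threaded version the user's training code never runs on the main thread, which is why main-thread-only operations fail there. The generator version avoids that, at the cost of requiring the training function to be written as a generator.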

woshiyyya · 2024-01-30T19:21:33Z

Got it. Signal handling is a missing feature that we need to support in the future. Users can save a checkpoint on exception before the actors get killed.

- Lightning's on_exception callback
- An OSS issue

…ay-project#42814) This PR removes some already deprecated APIs to reduce the library surface area and remove unused/unnecessary components. (`MosaicTrainer` can be folded into `TorchTrainer`, and `SklearnTrainer` doesn't provide any value over using Tune with your own training loop.) --------- Signed-off-by: Justin Yu <justinvyu@anyscale.com>

justinvyu added 10 commits January 8, 2024 16:57

remove sklearn trainer

124d099

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

remove unused utils

9920d4d

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

add message

5c3a86c

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

Remove sklearn trainer test

e35a6aa

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

fix lint

38aec83

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

remove mosaic trainer

16d6c99

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

add todo

3b850ea

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

add gh issue

fd823a3

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

remove mosaic trainer test

77ba966

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

Merge branch 'master' of https://github.com/ray-project/ray into depr…

e96f703

…ecate_sklearn_trainer

justinvyu requested review from matthewdeng and woshiyyya January 30, 2024 01:25

justinvyu assigned matthewdeng and woshiyyya Jan 30, 2024

matthewdeng reviewed Jan 30, 2024

woshiyyya reviewed Jan 30, 2024

justinvyu added 2 commits January 30, 2024 10:42

Merge branch 'master' of https://github.com/ray-project/ray into depr…

4e9cdf0

…ecate_sklearn_trainer

remove test

b72ffc9

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

woshiyyya mentioned this pull request Jan 30, 2024

[ray.train] Context managers will not trigger __exit__ in the distributed part of the code on exceptions. #39055

Closed

matthewdeng approved these changes Jan 30, 2024

justinvyu added 2 commits January 31, 2024 14:52

Merge branch 'master' of https://github.com/ray-project/ray into depr…

3ac6b40

…ecate_sklearn_trainer

update issue

1a748c8

Signed-off-by: Justin Yu <justinvyu@anyscale.com>

justinvyu merged commit c13f233 into ray-project:master Jan 31, 2024

justinvyu deleted the deprecate_sklearn_trainer branch January 31, 2024 23:51

Superskyyy mentioned this pull request Oct 4, 2024

[Train] Remove deprecated mosaic and sklearn trainer code #47901

Merged

8 tasks

justinvyu merged 14 commits into ray-project:master from justinvyu:deprecate_sklearn_trainer

3 participants