[train] Simplify ray.train.xgboost/lightgbm (1/n): Align frequency-based and checkpoint_at_end checkpoint formats#42111
Conversation
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
```python
booster: lightgbm.Booster,
*,
preprocessor: Optional["Preprocessor"] = None,
path: Optional[str] = None,
```
Do we still need these changes if we're centralizing on the Callbacks?
Nope, I can get rid of it. Though if anybody does use this, specifying your own temp dir might be useful if you want it to be cleaned up afterward.
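To illustrate the temp-dir point, here is a minimal stdlib-only sketch (the function name, `MODEL_FILENAME` constant, and `path` semantics are illustrative, not the real Ray API): if no `path` is given, a fresh temp dir is created and never cleaned up automatically; passing your own managed directory guarantees cleanup.

```python
import os
import tempfile

# Illustrative constant, in the spirit of XGBoostCheckpoint.MODEL_FILENAME.
MODEL_FILENAME = "model.ubj"

def save_model_as_checkpoint_dir(serialized_model: bytes, path=None):
    """Write a serialized model into a checkpoint directory.

    If `path` is None, a fresh temp dir is created and the caller is
    responsible for deleting it. Passing your own `path` (e.g. a
    TemporaryDirectory you manage) guarantees cleanup.
    """
    checkpoint_dir = path or tempfile.mkdtemp()
    with open(os.path.join(checkpoint_dir, MODEL_FILENAME), "wb") as f:
        f.write(serialized_model)
    return checkpoint_dir

with tempfile.TemporaryDirectory() as user_dir:
    out_dir = save_model_as_checkpoint_dir(b"fake-model-bytes", path=user_dir)
    assert os.path.exists(os.path.join(out_dir, MODEL_FILENAME))
# user_dir and its contents are removed automatically here.
```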
```python
from ray.train.lightgbm import RayTrainReportCallback

# Get a `Checkpoint` object that is saved by the callback during training.
result = trainer.fit()
```
nit: For consistency with this, should we update the training example to use the LightGBMTrainer? Same for xgboost.
I want to add the *Trainer examples once I add a v2 xgboost/lightgbm trainer, since then it'll actually show the callback usage in the training function. Right now the user doesn't need to create the callback themselves.
```rst
independent xgboost trials (without data parallelism within a trial).

.. testcode::
    :skipif: True
```
Are we going to add them back later?
This used to be a code-block that didn't run 😅 I just wanted to show a mock xgboost.train call with the callback inside, without needing to specify the dataset and everything.
```python
@PublicAPI(stability="beta")
class RayTrainReportCallback:
```
`TuneCallback` for lgbm was originally an empty class that wasn't referenced anywhere else, so I just removed it.
Why are these changes needed?

This PR fixes `XGBoostTrainer` and `LightGBMTrainer` checkpointing:

- Standardizes on `ray.train.xgboost/lightgbm.RayTrainReportCallback` as the standard utilities that define the checkpoint save/load format.
  - Previously, this logic lived in three places: (1) `XGBoostCheckpoint`, (2) `ray.tune.integration.xgboost.TuneReportCheckpointCallback`, and (3) `XGBoostTrainer._save_model`. These shared the `XGBoostCheckpoint.MODEL_FILENAME` constant in some places, but we re-implemented the `from_model` and `get_model` logic for some reason.
- Unifies frequency-based checkpointing (`CheckpointConfig(checkpoint_frequency)`) and `checkpoint_at_end` into a single callback (`ray.train.*.RayTrainReportCallback`) that handles both `checkpoint_frequency` and `checkpoint_at_end`. This codepath standardizes on the framework-specific checkpoint implementation of checkpoint saving.
- Removes the report-only callback (`TuneReportCallback`). The migration is simple: `TuneReportCallback() -> TuneReportCheckpointCallback(frequency=0)`.
- Untangles the circular dependency between `ray.tune` and `xgboost_ray`/`lightgbm_ray`.
  - The import cycle was `xgboost_ray -> ray.tune.* -> ray.train.* -> ray.train.xgboost -> xgboost_ray`, which raised an `ImportError` which `xgboost_ray` incorrectly used to determine whether Ray Train/Tune were installed.
- A follow-up removes the `xgboost_ray` and `lightgbm_ray` dependencies by re-implementing simple versions of these trainers as `DataParallelTrainer`s. See: [train] Simplify `ray.train.xgboost/lightgbm` (2/n): Re-implement `XGBoostTrainer` as a lightweight `DataParallelTrainer` #42767.

API Change Summary
- Introduces `ray.train.xgboost.RayTrainReportCallback`, mirroring `ray.train.lightning.RayTrainReportCallback`. This will be exposed to users if they have full control over the training loop in the new simplified `XGBoostTrainer`.
- Introduces `ray.train.xgboost.RayTrainReportCallback.get_model(filename)`, which can replace `XGBoostTrainer.get_model` in the future.
- Updates `ray.tune.integration.xgboost.TuneReportCheckpointCallback` accordingly.

The same APIs are introduced for the lightgbm counterparts.
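The value of having the callback own both the save and load side of the format can be shown with a small stdlib-only sketch (names and the JSON "model" are illustrative, not the real Ray or xgboost APIs):

```python
import json
import os
import tempfile

# Illustrative filename; the real callback defines its own constant.
MODEL_FILENAME = "model.json"

class ReportCheckpointCallback:
    """Sketch of a callback that defines both how a model is written into a
    checkpoint directory and how it is read back, so the two never drift."""

    @staticmethod
    def save_model(model: dict, checkpoint_dir: str) -> None:
        with open(os.path.join(checkpoint_dir, MODEL_FILENAME), "w") as f:
            json.dump(model, f)

    @staticmethod
    def get_model(checkpoint_dir: str) -> dict:
        with open(os.path.join(checkpoint_dir, MODEL_FILENAME)) as f:
            return json.load(f)

with tempfile.TemporaryDirectory() as ckpt_dir:
    ReportCheckpointCallback.save_model({"num_trees": 10}, ckpt_dir)
    model = ReportCheckpointCallback.get_model(ckpt_dir)
print(model)  # {'num_trees': 10}
```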
TODOs left for followups

- Some of this logic still lives in `xgboost_ray` right now.
- Clean up the `checkpoint_at_end` vs. `checkpoint_frequency` overlap logic for the test case with a TODO in `test_xgboost_trainer` after switching to the simplified xgboost trainer.

Related issue number
Closes #41608
Checks

- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.