127 changes: 0 additions & 127 deletions doc/source/train/getting-started-pytorch-lightning.rst
@@ -394,130 +394,3 @@ Earlier versions aren't prohibited but may result in unexpected issues. If you r
.. note::

    If you are using Lightning 2.x, please use the import path `lightning.pytorch.xxx` instead of `pytorch_lightning.xxx`.
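
For reference, the two namespaces differ only in their import path. A minimal illustration (assuming Lightning 2.x is installed):

.. code-block:: python

    # Lightning 2.x namespace (use this with Ray Train's Lightning integration)
    import lightning.pytorch as pl
    from lightning.pytorch.loggers import CSVLogger

    # Legacy Lightning 1.x namespace; avoid mixing the two in one script
    # import pytorch_lightning as pl
    # from pytorch_lightning.loggers import CSVLogger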

.. _lightning-trainer-migration-guide:

LightningTrainer Migration Guide
--------------------------------

Ray 2.4 introduced the `LightningTrainer` and exposed a
`LightningConfigBuilder` for defining the configurations of `pl.LightningModule`
and `pl.Trainer`. The trainer then instantiated the model and trainer objects
and ran a pre-defined training function in a black box. This design was
constraining: it limited your control over the training logic.

Ray 2.7 introduced the unified :class:`~ray.train.torch.TorchTrainer` API, which offers
enhanced transparency, flexibility, and simplicity. This API aligns more closely
with standard PyTorch Lightning scripts, ensuring that you have better
control over your native Lightning code, as the following comparison shows.


.. tabs::

    .. group-tab:: (Deprecating) LightningTrainer

        .. This snippet isn't tested because it raises a hard deprecation warning.

        .. testcode::
            :skipif: True

            from lightning.pytorch.loggers import CSVLogger

            from ray.train import CheckpointConfig, RunConfig, ScalingConfig
            from ray.train.lightning import LightningConfigBuilder, LightningTrainer

            config_builder = LightningConfigBuilder()
            # [1] Collect model configs
            config_builder.module(cls=MNISTClassifier, lr=1e-3, feature_dim=128)

            # [2] Collect checkpointing configs
            config_builder.checkpointing(monitor="val_accuracy", mode="max", save_top_k=3)

            # [3] Collect pl.Trainer configs
            config_builder.trainer(
                max_epochs=10,
                accelerator="gpu",
                log_every_n_steps=100,
                logger=CSVLogger("./logs"),
            )

            # [4] Build datasets on the head node
            datamodule = MNISTDataModule(batch_size=32)
            config_builder.fit_params(datamodule=datamodule)

            # [5] Execute the internal training function in a black box
            ray_trainer = LightningTrainer(
                lightning_config=config_builder.build(),
                scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
                run_config=RunConfig(
                    checkpoint_config=CheckpointConfig(
                        num_to_keep=3,
                        checkpoint_score_attribute="val_accuracy",
                        checkpoint_score_order="max",
                    ),
                ),
            )
            ray_trainer.fit()



    .. group-tab:: (New API) TorchTrainer

        .. This snippet isn't tested because it runs with 4 GPUs, and CI is only run with 1.

        .. testcode::
            :skipif: True

            import lightning.pytorch as pl
            from lightning.pytorch.loggers import CSVLogger

            from ray.air import CheckpointConfig, RunConfig, ScalingConfig
            from ray.train.torch import TorchTrainer
            from ray.train.lightning import (
                RayDDPStrategy,
                RayLightningEnvironment,
                RayTrainReportCallback,
                prepare_trainer,
            )

            def train_func(config):
                # [1] Create a Lightning model
                model = MNISTClassifier(lr=1e-3, feature_dim=128)

                # [2] Report checkpoints with a callback
                ckpt_report_callback = RayTrainReportCallback()

                # [3] Create a Lightning Trainer
                trainer = pl.Trainer(
                    max_epochs=10,
                    log_every_n_steps=100,
                    logger=CSVLogger("./logs"),
                    # New configurations below
                    devices="auto",
                    accelerator="auto",
                    strategy=RayDDPStrategy(),
                    plugins=[RayLightningEnvironment()],
                    callbacks=[ckpt_report_callback],
                )

                # Validate your Lightning trainer configuration
                trainer = prepare_trainer(trainer)

                # [4] Build your datasets on each worker
                datamodule = MNISTDataModule(batch_size=32)
                trainer.fit(model, datamodule=datamodule)

            # [5] Explicitly define and run the training function
            ray_trainer = TorchTrainer(
                train_func,
                scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
                run_config=RunConfig(
                    checkpoint_config=CheckpointConfig(
                        num_to_keep=3,
                        checkpoint_score_attribute="val_accuracy",
                        checkpoint_score_order="max",
                    ),
                ),
            )
            ray_trainer.fit()
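
After ``fit()`` returns, you can inspect the reported metrics and checkpoints on the returned ``Result`` object. A minimal sketch (the ``val_accuracy`` key assumes your LightningModule logs that metric, as configured above):

.. code-block:: python

    result = ray_trainer.fit()

    print(result.metrics)      # Last reported training metrics
    print(result.checkpoint)   # Latest reported checkpoint
    # Top-k checkpoints retained per the CheckpointConfig above
    for checkpoint, metrics in result.best_checkpoints:
        print(metrics["val_accuracy"], checkpoint)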
151 changes: 0 additions & 151 deletions doc/source/train/getting-started-transformers.rst
@@ -307,154 +307,3 @@ After you have converted your Hugging Face Transformers training script to use R
* See :ref:`User Guides <train-user-guides>` to learn more about how to perform specific tasks.
* Browse the :ref:`Examples <train-examples>` for end-to-end examples of how to use Ray Train.
* Dive into the :ref:`API Reference <train-api>` for more details on the classes and methods used in this tutorial.


.. _transformers-trainer-migration-guide:

TransformersTrainer Migration Guide
-----------------------------------

Ray 2.1 introduced the `TransformersTrainer`, which exposed a `trainer_init_per_worker` interface
for defining a `transformers.Trainer`, and then ran a pre-defined training function in a black box.

Ray 2.7 introduced the unified :class:`~ray.train.torch.TorchTrainer` API,
which offers enhanced transparency, flexibility, and simplicity. This API aligns more
closely with standard Hugging Face Transformers scripts, ensuring that you have better control over your
native Transformers training code, as the following comparison shows.


.. tabs::

    .. group-tab:: (Deprecating) TransformersTrainer

        .. This snippet isn't tested because it contains skeleton code.

        .. testcode::
            :skipif: True

            import transformers
            from transformers import AutoConfig, AutoModelForCausalLM
            from datasets import load_dataset

            import ray
            from ray.train.huggingface import TransformersTrainer
            from ray.train import ScalingConfig

            # Dataset
            def preprocess(examples):
                ...

            hf_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
            processed_ds = hf_datasets.map(preprocess, ...)

            ray_train_ds = ray.data.from_huggingface(processed_ds["train"])
            ray_eval_ds = ray.data.from_huggingface(processed_ds["validation"])

            # Define the Trainer generation function
            def trainer_init_per_worker(train_dataset, eval_dataset, **config):
                MODEL_NAME = "gpt2"
                model_config = AutoConfig.from_pretrained(MODEL_NAME)
                model = AutoModelForCausalLM.from_config(model_config)
                args = transformers.TrainingArguments(
                    output_dir=f"{MODEL_NAME}-wikitext2",
                    evaluation_strategy="epoch",
                    save_strategy="epoch",
                    logging_strategy="epoch",
                    learning_rate=2e-5,
                    weight_decay=0.01,
                    max_steps=100,
                )
                return transformers.Trainer(
                    model=model,
                    args=args,
                    train_dataset=train_dataset,
                    eval_dataset=eval_dataset,
                )

            # Build a Ray TransformersTrainer
            scaling_config = ScalingConfig(num_workers=4, use_gpu=True)
            ray_trainer = TransformersTrainer(
                trainer_init_per_worker=trainer_init_per_worker,
                scaling_config=scaling_config,
                datasets={"train": ray_train_ds, "evaluation": ray_eval_ds},
            )
            result = ray_trainer.fit()


    .. group-tab:: (New API) TorchTrainer

        .. This snippet isn't tested because it contains skeleton code.

        .. testcode::
            :skipif: True

            import transformers
            from transformers import AutoConfig, AutoModelForCausalLM
            from datasets import load_dataset

            import ray
            from ray.train.huggingface.transformers import (
                RayTrainReportCallback,
                prepare_trainer,
            )
            from ray.train import ScalingConfig
            from ray.train.torch import TorchTrainer

            # Dataset
            def preprocess(examples):
                ...

            hf_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")
            processed_ds = hf_datasets.map(preprocess, ...)

            ray_train_ds = ray.data.from_huggingface(processed_ds["train"])
            ray_eval_ds = ray.data.from_huggingface(processed_ds["validation"])

            # [1] Define the full training function
            # =====================================
            def train_func(config):
                MODEL_NAME = "gpt2"
                model_config = AutoConfig.from_pretrained(MODEL_NAME)
                model = AutoModelForCausalLM.from_config(model_config)

                # [2] Build Ray Data iterables
                # ============================
                train_dataset = ray.train.get_dataset_shard("train")
                eval_dataset = ray.train.get_dataset_shard("evaluation")

                train_iterable_ds = train_dataset.iter_torch_batches(batch_size=8)
                eval_iterable_ds = eval_dataset.iter_torch_batches(batch_size=8)

                args = transformers.TrainingArguments(
                    output_dir=f"{MODEL_NAME}-wikitext2",
                    evaluation_strategy="epoch",
                    save_strategy="epoch",
                    logging_strategy="epoch",
                    learning_rate=2e-5,
                    weight_decay=0.01,
                    max_steps=100,
                )

                trainer = transformers.Trainer(
                    model=model,
                    args=args,
                    train_dataset=train_iterable_ds,
                    eval_dataset=eval_iterable_ds,
                )

                # [3] Inject the Ray Train report callback
                # ========================================
                trainer.add_callback(RayTrainReportCallback())

                # [4] Prepare your trainer
                # ========================
                trainer = prepare_trainer(trainer)
                trainer.train()

            # Build a Ray TorchTrainer
            scaling_config = ScalingConfig(num_workers=4, use_gpu=True)
            ray_trainer = TorchTrainer(
                train_func,
                scaling_config=scaling_config,
                datasets={"train": ray_train_ds, "evaluation": ray_eval_ds},
            )
            result = ray_trainer.fit()
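
For reference, a possible ``preprocess`` implementation for the skeleton above. This is a sketch, not part of the original guide: it assumes GPT-2 tokenization for causal language modeling, and the block length and padding strategy are illustrative choices.

.. code-block:: python

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    # GPT-2 defines no pad token; reuse EOS for padding.
    tokenizer.pad_token = tokenizer.eos_token

    def preprocess(examples):
        # Tokenize to fixed-length blocks and reuse input_ids as labels,
        # as is typical for causal LM fine-tuning.
        out = tokenizer(
            examples["text"],
            truncation=True,
            padding="max_length",
            max_length=128,
        )
        out["labels"] = out["input_ids"].copy()
        return out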
15 changes: 0 additions & 15 deletions doc/source/train/huggingface-accelerate.rst
@@ -202,18 +202,3 @@ You may also find these user guides helpful:
- :ref:`Configuration and Persistent Storage <train-run-config>`
- :ref:`Saving and Loading Checkpoints <train-checkpointing>`
- :ref:`How to use Ray Data with Ray Train <data-ingest-torch>`


AccelerateTrainer Migration Guide
---------------------------------

Before Ray 2.7, Ray Train's `AccelerateTrainer` API was the
recommended way to run Accelerate code. As a subclass of :class:`TorchTrainer <ray.train.torch.TorchTrainer>`,
the ``AccelerateTrainer`` takes in a configuration file generated by ``accelerate config`` and applies it to all workers.
Aside from that, the functionality of ``AccelerateTrainer`` is identical to that of ``TorchTrainer``.
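
For context, the deprecated pattern looked roughly like the following. This is a sketch from recollection of the 2.x-era API: the ``accelerate_config`` parameter name and the config path are assumptions, and ``train_func`` is a standard Ray Train training function defined elsewhere.

.. code-block:: python

    from ray.train import ScalingConfig
    from ray.train.huggingface import AccelerateTrainer

    # AccelerateTrainer loaded the ``accelerate config`` file and
    # applied it to every worker; everything else matched TorchTrainer.
    ray_trainer = AccelerateTrainer(
        train_func,
        accelerate_config="path/to/accelerate_config.yaml",  # assumed parameter name
        scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    )
    result = ray_trainer.fit()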

However, this caused confusion about whether ``AccelerateTrainer`` was the *only* way to run Accelerate code.
Because you can express the full Accelerate functionality with the ``Accelerator`` and ``TorchTrainer`` combination, the plan is to deprecate ``AccelerateTrainer`` in Ray 2.8,
and the recommendation is to run your Accelerate code directly with ``TorchTrainer``, as sketched below.
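
A minimal sketch of the recommended pattern, assuming a toy model and synthetic data for illustration (``Accelerator.prepare`` and ``accelerator.backward`` are standard Accelerate APIs):

.. code-block:: python

    import torch
    from accelerate import Accelerator
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_func(config):
        # Create the Accelerator inside the training function; Ray Train
        # has already set up the distributed environment on each worker.
        accelerator = Accelerator()

        model = torch.nn.Linear(8, 1)
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
        dataset = torch.utils.data.TensorDataset(
            torch.randn(64, 8), torch.randn(64, 1)
        )
        dataloader = torch.utils.data.DataLoader(dataset, batch_size=16)

        # Let Accelerate wrap the model, optimizer, and dataloader
        # for the current device and process group.
        model, optimizer, dataloader = accelerator.prepare(
            model, optimizer, dataloader
        )

        for _ in range(2):
            for x, y in dataloader:
                optimizer.zero_grad()
                loss = torch.nn.functional.mse_loss(model(x), y)
                accelerator.backward(loss)
                optimizer.step()

    ray_trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
    )
    ray_trainer.fit()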


2 changes: 1 addition & 1 deletion python/ray/data/datasource/huggingface_datasource.py
@@ -22,7 +22,7 @@
# in the module you are currently viewing. This ensures that when we
# unpickle the Dataset, it runs before pickle tries to
# import datasets_modules and prevents an exception from being thrown.
# Same logic is present inside Ray's TransformersTrainer and HF Transformers Ray
# Same logic is present inside HF Transformers Ray
# integration: https://github.com/huggingface/transformers/blob/\
# 7d5fde991d598370d961be8cb7add6541e2b59ce/src/transformers/integrations.py#L271
# Also see https://github.com/ray-project/ray/issues/28084
19 changes: 0 additions & 19 deletions python/ray/train/huggingface/__init__.py
@@ -1,19 +0,0 @@
from ray.train.huggingface.accelerate import AccelerateTrainer
from ray.train.huggingface.huggingface_checkpoint import HuggingFaceCheckpoint
from ray.train.huggingface.huggingface_predictor import HuggingFacePredictor
from ray.train.huggingface.huggingface_trainer import HuggingFaceTrainer
from ray.train.huggingface.transformers import (
    TransformersCheckpoint,
    TransformersPredictor,
    TransformersTrainer,
)

__all__ = [
    "AccelerateTrainer",
    "HuggingFaceCheckpoint",
    "HuggingFacePredictor",
    "HuggingFaceTrainer",
    "TransformersCheckpoint",
    "TransformersPredictor",
    "TransformersTrainer",
]
7 changes: 0 additions & 7 deletions python/ray/train/huggingface/_deprecation_msg.py

This file was deleted.

5 changes: 0 additions & 5 deletions python/ray/train/huggingface/accelerate/__init__.py

This file was deleted.
