## 🚀 Feature
This RFC proposes adding a number of tests to the PyTorch/XLA CI that exercise the combination of `torch_xla` and the Hugging Face libraries.
## Motivation
Testing against our customers' code ensures that we do not break common user workflows.
## Pitch
Historically, the PyTorch/XLA CI had some Hugging Face tests that installed the latest versions of `transformers`, `diffusers`, and `accelerate` from the main branches of the respective git repositories. That caused test breakages whenever Hugging Face introduced backwards-incompatible changes. To prevent those issues, we'll pin the Hugging Face libraries to fixed versions when running the tests.
In principle, we should pin other packages that may affect training, such as `numpy`. However, `torch_xla` and `torch` themselves also depend on a number of Python libraries, such as `numpy` and `networkx`. Therefore, we'll keep the list of pinned packages small to start with; we can always grow it later if a particular package becomes problematic.
## List of tests
We propose the following tests, which are a slight variation of the existing tests removed in [3].
| Name | Type | Test in nightly? | Test in RC? | Notes |
|---|---|---|---|---|
| Llama 2 7B training | Example | Yes (already exists) | Yes (already exists) | Testing the `llama2-google-next-training` branch in the pytorch-tpu fork of HF transformers |
| SD2 training | Example | New addition | New addition | Testing the `main` branch in the pytorch-tpu fork of HF diffusers |
| accelerate test | Smoke test | Add back | Add back | See note #1. |
| bert | Example | Add back | Add back | This exercises our own test (`pytorch/xla/test/pjrt/test_train_hf_transformer.py`), so we should run it |
| diffusers | Example | Remove | Remove | This trains stable-diffusion-v1. Replaced by the planned SD2 training test |
The SD2 training test will be added by referencing the recipe in tpu-recipes [2].
Note #1: the accelerate test broke for a few weeks, and we suspected it was due to upstream changes in Hugging Face. After I filed [4], it turned out that this was really a case of PyTorch/XLA changes [5] breaking Hugging Face. When we add this test back, we should work around the breakage.
Note #2: during local testing, the bert
test has a race condition at the end
causing a OSError: handle is closed
. That also looks like a legit error
stemming from incorrect multiprocessing usage.
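
For context, the sketch below is a minimal, self-contained illustration of the kind of multiprocessing misuse that raises this error; it is not taken from the bert test itself, just an example of one end of a pipe being used after it has already been closed.

```python
import multiprocessing as mp


def worker(conn):
    # The child reports a result and closes its end of the pipe.
    conn.send("done")
    conn.close()


if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    p = mp.Process(target=worker, args=(child_conn,))
    p.start()
    p.join()
    parent_conn.close()   # parent tears down its end too early ...
    parent_conn.recv()    # ... so using it now raises "OSError: handle is closed"
```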
## Initial pinned versions
Based on local testing, I've narrowed it down to the following versions, which work for the above tests:
```
accelerate==1.2.1
datasets==3.2.0
evaluate==0.4.3
huggingface-hub==0.27.1
safetensors==0.5.0
tokenizers==0.19.1
```
We'll check this file in as `pip-constraints.txt` (a constraint file [1]) in https://github.com/GoogleCloudPlatform/ml-auto-solutions, so that whenever a Hugging Face library is installed, it is constrained to one of the tested versions. This file will be shared by all tests in the list above.
`transformers` will be installed from https://github.com/pytorch-tpu/transformers/tree/llama2-google-next-training, and `diffusers` will be installed from https://github.com/pytorch-tpu/diffusers/tree/main. If we don't touch these branches, they will also be effectively pinned.
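
As a concrete illustration (not the actual ml-auto-solutions code), a CI install step might look roughly like the sketch below. The repository URLs, branch names, and constraint file name come from this proposal; the `pip_install` helper, the assumption that `pip-constraints.txt` sits in the working directory, and the exact package list are hypothetical.

```python
import subprocess
import sys

# Shared constraint file checked into ml-auto-solutions (see [1] above).
CONSTRAINTS = "pip-constraints.txt"

# Pinned-by-branch sources for the two forked libraries.
TRANSFORMERS = (
    "git+https://github.com/pytorch-tpu/transformers.git"
    "@llama2-google-next-training"
)
DIFFUSERS = "git+https://github.com/pytorch-tpu/diffusers.git@main"


def pip_install(*packages: str) -> None:
    """Install packages while honoring the shared constraint file, so any
    Hugging Face dependency listed in it resolves to a tested version."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-c", CONSTRAINTS, *packages],
        check=True,
    )


if __name__ == "__main__":
    pip_install(TRANSFORMERS, DIFFUSERS, "accelerate", "datasets", "evaluate")
```

Because `-c` only constrains versions without forcing installation, packages not named in the file (or pulled in transitively) still resolve freely, which matches the intent of keeping the pinned list small.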
## What to do if a test fails?
We should prioritize reverting the offending PR if a change in `torch_xla` breaks the Hugging Face tests.
## Alternatives
It's also worth testing tip-of-tree versions of the Hugging Face libraries against stable versions of `torch_xla`. This ensures that Hugging Face does not introduce new breakages during their development cycle. We should work with the Hugging Face team to help them set up those tests on their end. That can be done independently of this proposal.
## Additional context
We had some Hugging Face tests for a while, but they frequently broke due to the lack of version pinning, and they were removed in [3].