
Add GPU testing config and GPU roberta model tests #2025


Merged
merged 8 commits into from
Jan 26, 2023
73 changes: 73 additions & 0 deletions .github/workflows/test-linux-gpu.yml
@@ -0,0 +1,73 @@
name: Unit-tests on Linux GPU

on:
pull_request:
push:
branches:
- nightly
- main
- release/*
workflow_dispatch:

env:
CHANNEL: "nightly"
Contributor

Is there an advantage to defining the env var here?

Contributor Author

Just following the pattern from the other workflows. If you'd like this fixed or cleaned up, can we do it across all of them?


jobs:
tests:
strategy:
matrix:
python_version: ["3.8"]
cuda_arch_version: ["11.6"]
Contributor

They are dropping support for 11.6, so it might be better to move to 11.7.

https://pytorch.slack.com/archives/C2077MFDL/p1675256463971369

Contributor

Thanks for taking care of this in #2040

fail-fast: false
uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
with:
runner: linux.g5.4xlarge.nvidia.gpu
repository: pytorch/text
gpu-arch-type: cuda
gpu-arch-version: ${{ matrix.cuda_arch_version }}
timeout: 120
script: |
# Mark Build Directory Safe
git config --global --add safe.directory /__w/text/text

# Set up Environment Variables
export PYTHON_VERSION="${{ matrix.python_version }}"
export VERSION="${{ matrix.cuda_arch_version }}"
export CUDATOOLKIT="pytorch-cuda=${VERSION}"

# Set CHANNEL
if [[ (${GITHUB_EVENT_NAME} = 'pull_request' && (${GITHUB_BASE_REF} = 'release'*)) || (${GITHUB_REF} = 'refs/heads/release'*) ]]; then
export CHANNEL=test
else
export CHANNEL=nightly
fi

# Create Conda Env
conda create --quiet -yp ci_env python="${PYTHON_VERSION}"
conda activate /work/ci_env
python3 -m pip --quiet install "cmake>=3.18.0" ninja
conda env update --file ".circleci/unittest/linux/scripts/environment.yml" --prune

# TorchText-specific Setup
printf "* Downloading SpaCy English models\n"
Contributor

Off topic, but I feel like this could be run in the background with & and we could wait for the process to finish before the tests start.

Contributor Author

I just copied most of this over from the other GitHub workflow files. If improvements are needed, can we do that in a separate PR?

Contributor
@mthrok Jan 25, 2023

Yeah, it's just a random thought. Not necessary to follow up.
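Purely as an illustration of the backgrounding idea discussed above (the actual suggestion is shell `&`/`wait`; nothing like this is in the PR), a hypothetical Python helper could kick off both spaCy downloads early and block on them right before the tests run, assuming spaCy is already installed in the environment:

```python
# Hypothetical helper, not part of this PR: start the spaCy model downloads
# in the background and wait for them to finish before the tests start.
import subprocess
import sys


def start_spacy_downloads():
    # Launch both downloads concurrently without blocking.
    return [
        subprocess.Popen([sys.executable, "-m", "spacy", "download", name])
        for name in ("en_core_web_sm", "de_core_news_sm")
    ]


def wait_for_spacy_downloads(procs):
    # Block until every download has finished, failing loudly on errors.
    for proc in procs:
        if proc.wait() != 0:
            raise RuntimeError("spaCy model download failed")
```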

python -m spacy download en_core_web_sm
printf "* Downloading SpaCy German models\n"
python -m spacy download de_core_news_sm

# Install PyTorch and TorchData
set -ex
conda install \
--yes \
--quiet \
-c "pytorch-${CHANNEL}" \
-c nvidia "pytorch-${CHANNEL}"::pytorch[build="*${VERSION}*"] \
"${CUDATOOLKIT}"
printf "Installing torchdata nightly\n"
python3 -m pip install --pre torchdata --extra-index-url https://download.pytorch.org/whl/nightly/cpu --quiet
python3 setup.py develop
python3 -m pip install parameterized --quiet

# Run Tests
python3 -m torch.utils.collect_env
cd test
python3 -m pytest --junitxml=test-results/junit.xml -v --durations 20 torchtext_unittest/models/gpu_tests
12 changes: 12 additions & 0 deletions test/torchtext_unittest/common/case_utils.py
@@ -4,6 +4,7 @@
import unittest
from itertools import zip_longest

import torch
from torchtext._internal.module_utils import is_module_available


@@ -37,6 +38,17 @@ def get_temp_path(self, *paths):
return path


class TestBaseMixin:
Contributor

Do we need to introduce this for the sake of adding GPU tests? I'd split the PR.

NVM. After looking at the rest of the code, I see this is part of enabling the GPU tests.

"""Mixin to provide consistent way to define device/dtype/backend aware TestCase"""

dtype = None
device = None

def setUp(self):
super().setUp()
torch.random.manual_seed(2434)


def skipIfNoModule(module, display_name=None):
display_name = display_name or module
return unittest.skipIf(not is_module_available(module), f'"{display_name}" is not available')
11 changes: 11 additions & 0 deletions test/torchtext_unittest/models/gpu_tests/models_gpu_test.py
@@ -0,0 +1,11 @@
import unittest

import torch
from torchtext_unittest.common.torchtext_test_case import TorchtextTestCase
from torchtext_unittest.models.models_test_impl import BaseTestModels


@unittest.skipIf(not torch.cuda.is_available(), reason="CUDA is not available")
class TestModels32GPU(BaseTestModels, TorchtextTestCase):
dtype = torch.float32
device = torch.device("cuda")
9 changes: 9 additions & 0 deletions test/torchtext_unittest/models/models_cpu_test.py
@@ -0,0 +1,9 @@
import torch

from ..common.torchtext_test_case import TorchtextTestCase
from .models_test_impl import BaseTestModels


class TestModels32CPU(BaseTestModels, TorchtextTestCase):
dtype = torch.float32
device = torch.device("cpu")
test/torchtext_unittest/models/models_test_impl.py
@@ -2,30 +2,42 @@
from unittest.mock import patch

import torch
import torchtext
from torch.nn import functional as torch_F

from ..common.torchtext_test_case import TorchtextTestCase
Contributor

Is it necessary and right to replace TorchtextTestCase?
Making TestBaseMixin part of TorchtextTestCase would be more aligned with the original design in torchaudio. (Though I don't know all the details of the torchtext test suite, so I could be wrong here.)

Contributor Author

Yes, because pytest runs every test of every class that extends TorchtextTestCase (including inherited tests), so BaseTestModels must not extend it directly.

Contributor

That makes sense. Thanks for clarifying.
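To illustrate the collection behavior described above, here is a minimal sketch with hypothetical class names (not code from this PR): the shared test body lives in a plain mixin, so pytest only collects the concrete classes that also inherit from a TestCase, each of which pins a real device and dtype.

```python
import unittest

import torch


class SharedModelTests:
    # Plain mixin: pytest does not collect this class on its own because it
    # does not derive from unittest.TestCase.
    device = None
    dtype = None

    def test_forward_runs_on_configured_device(self):
        x = torch.zeros(2, 3, device=self.device, dtype=self.dtype)
        self.assertEqual(x.sum().item(), 0.0)


class SharedModelTestsCPU(SharedModelTests, unittest.TestCase):
    device = torch.device("cpu")
    dtype = torch.float32


@unittest.skipIf(not torch.cuda.is_available(), reason="CUDA is not available")
class SharedModelTestsGPU(SharedModelTests, unittest.TestCase):
    device = torch.device("cuda")
    dtype = torch.float32


# If SharedModelTests extended unittest.TestCase (or TorchtextTestCase)
# directly, pytest would also collect SharedModelTests itself and run its
# tests with device/dtype still set to None.
```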

from ..common.case_utils import TestBaseMixin


class TestModels(TorchtextTestCase):
class BaseTestModels(TestBaseMixin):
def get_model(self, encoder_conf, head=None, freeze_encoder=False, checkpoint=None, override_checkpoint_head=False):
from torchtext.models import RobertaBundle

model = RobertaBundle.build_model(
encoder_conf=encoder_conf,
head=head,
freeze_encoder=freeze_encoder,
checkpoint=checkpoint,
override_checkpoint_head=override_checkpoint_head,
)
model.to(device=self.device, dtype=self.dtype)
return model

def test_roberta_bundler_build_model(self) -> None:
from torchtext.models import RobertaClassificationHead, RobertaEncoderConf, RobertaModel, RobertaBundle
from torchtext.models import RobertaClassificationHead, RobertaEncoderConf, RobertaModel

dummy_encoder_conf = RobertaEncoderConf(
vocab_size=10, embedding_dim=16, ffn_dimension=64, num_attention_heads=2, num_encoder_layers=2
)

# case: user provide encoder checkpoint state dict
dummy_encoder = RobertaModel(dummy_encoder_conf)
model = RobertaBundle.build_model(encoder_conf=dummy_encoder_conf, checkpoint=dummy_encoder.state_dict())
model = self.get_model(encoder_conf=dummy_encoder_conf, checkpoint=dummy_encoder.state_dict())
self.assertEqual(model.state_dict(), dummy_encoder.state_dict())

# case: user provide classifier checkpoint state dict when head is given and override_head is False (by default)
dummy_classifier_head = RobertaClassificationHead(num_classes=2, input_dim=16)
another_dummy_classifier_head = RobertaClassificationHead(num_classes=2, input_dim=16)
dummy_classifier = RobertaModel(dummy_encoder_conf, dummy_classifier_head)
model = RobertaBundle.build_model(
model = self.get_model(
encoder_conf=dummy_encoder_conf,
head=another_dummy_classifier_head,
checkpoint=dummy_classifier.state_dict(),
@@ -34,7 +46,7 @@ def test_roberta_bundler_build_model(self) -> None:

# case: user provide classifier checkpoint state dict when head is given and override_head is set True
another_dummy_classifier_head = RobertaClassificationHead(num_classes=2, input_dim=16)
model = RobertaBundle.build_model(
model = self.get_model(
encoder_conf=dummy_encoder_conf,
head=another_dummy_classifier_head,
checkpoint=dummy_classifier.state_dict(),
@@ -48,13 +60,13 @@ def test_roberta_bundler_build_model(self) -> None:
encoder_state_dict = {}
for k, v in dummy_classifier.encoder.state_dict().items():
encoder_state_dict["encoder." + k] = v
model = torchtext.models.RobertaBundle.build_model(
model = self.get_model(
encoder_conf=dummy_encoder_conf, head=dummy_classifier_head, checkpoint=encoder_state_dict
)
self.assertEqual(model.state_dict(), dummy_classifier.state_dict())

def test_roberta_bundler_train(self) -> None:
from torchtext.models import RobertaClassificationHead, RobertaEncoderConf, RobertaModel, RobertaBundle
from torchtext.models import RobertaClassificationHead, RobertaEncoderConf, RobertaModel

dummy_encoder_conf = RobertaEncoderConf(
vocab_size=10, embedding_dim=16, ffn_dimension=64, num_attention_heads=2, num_encoder_layers=2
@@ -63,8 +75,8 @@ def test_roberta_bundler_train(self) -> None:

def _train(model):
optim = SGD(model.parameters(), lr=1)
model_input = torch.tensor([[0, 1, 2, 3, 4, 5]])
target = torch.tensor([0])
model_input = torch.tensor([[0, 1, 2, 3, 4, 5]]).to(device=self.device)
target = torch.tensor([0]).to(device=self.device)
logits = model(model_input)
loss = torch_F.cross_entropy(logits, target)
loss.backward()
@@ -73,7 +85,7 @@ def _train(model):
# does not freeze encoder
dummy_classifier_head = RobertaClassificationHead(num_classes=2, input_dim=16)
dummy_classifier = RobertaModel(dummy_encoder_conf, dummy_classifier_head)
model = RobertaBundle.build_model(
model = self.get_model(
encoder_conf=dummy_encoder_conf,
head=dummy_classifier_head,
freeze_encoder=False,
@@ -91,7 +103,7 @@ def _train(model):
# freeze encoder
dummy_classifier_head = RobertaClassificationHead(num_classes=2, input_dim=16)
dummy_classifier = RobertaModel(dummy_encoder_conf, dummy_classifier_head)
model = RobertaBundle.build_model(
model = self.get_model(
encoder_conf=dummy_encoder_conf,
head=dummy_classifier_head,
freeze_encoder=True,