
port special tests from CircleCI to GHA #7396


Merged: 15 commits merged Mar 8, 2023


@pmeier (Collaborator) commented Mar 7, 2023

Edit: #7399 removed the torch.hub tests from CI in general. Thus, the points below referring to that are moot.


Per title. This refers to

vision/.circleci/config.yml

Lines 1258 to 1262 in 5850f37

```yaml
unittest:
  jobs:
    - unittest_torchhub
    - unittest_onnx
    - unittest_extended
```

or, in more detail,

vision/.circleci/config.yml

Lines 327 to 360 in 5850f37

```yaml
unittest_torchhub:
  docker:
    - image: cimg/python:3.8
  steps:
    - checkout
    - install_torchvision
    - run_tests_selective:
        file_or_dir: test/test_hub.py
unittest_onnx:
  docker:
    - image: cimg/python:3.8
  steps:
    - checkout
    - install_torchvision
    - pip_install:
        args: onnx onnxruntime
        descr: Install ONNX
    - run_tests_selective:
        file_or_dir: test/test_onnx.py
unittest_extended:
  docker:
    - image: cimg/python:3.8
  resource_class: xlarge
  steps:
    - checkout
    - download_model_weights
    - install_torchvision
    - run:
        name: Enable extended tests
        command: echo 'export PYTORCH_TEST_WITH_EXTENDED=1' >> $BASH_ENV
    - run_tests_selective:
        file_or_dir: test/test_extended_*.py
```

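As a rough illustration of what the `unittest_extended` job boils down to, here is a hypothetical Python helper (`extended_test_command` is not part of the repository). It expands the `test/test_extended_*.py` glob and sets `PYTORCH_TEST_WITH_EXTENDED=1` in the environment, mirroring the two config steps. `run_tests_selective` is a CircleCI command defined elsewhere in the config, so invoking pytest as `python -m pytest` is an assumption; the sketch also skips `install_torchvision` and `download_model_weights`.

```python
import glob
import os
import sys
from typing import Dict, List, Mapping, Optional, Tuple


def extended_test_command(
    test_dir: str = "test", base_env: Optional[Mapping[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: build the pytest command and env for the extended tests."""
    # Expand the glob ourselves, mirroring `file_or_dir: test/test_extended_*.py`.
    files = sorted(glob.glob(os.path.join(test_dir, "test_extended_*.py")))
    # Copy the environment and opt in to the extended tests, like the
    # `echo 'export PYTORCH_TEST_WITH_EXTENDED=1' >> $BASH_ENV` step does.
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_TEST_WITH_EXTENDED"] = "1"
    return [sys.executable, "-m", "pytest", *files], env
```

Running it would then be a matter of `subprocess.run(cmd, env=env, check=True)`.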
These tests run only on CPU on a Linux machine and are therefore kept separate from the regular unit tests.

Although we don't exclude them explicitly in pytest.ini, these tests are not run with the regular unit tests:

  • @pytest.mark.skipif("torchvision" in sys.modules, reason="TestHub must start without torchvision imported")

    torchvision is imported by almost every other test module during collection, so in a regular run this condition is almost always true and the hub tests are skipped.
  • vision/test/test_onnx.py

    Lines 16 to 18 in 5850f37

    # In environments without onnxruntime we prefer to
    # invoke all tests in the repo and have this one skipped rather than fail.
    onnxruntime = pytest.importorskip("onnxruntime")
  • run_if_test_with_extended = pytest.mark.skipif(
        os.getenv("PYTORCH_TEST_WITH_EXTENDED", "0") != "1",
        reason="Extended tests are disabled by default. Set PYTORCH_TEST_WITH_EXTENDED=1 to run them.",
    )
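The three skip conditions above can be summarized in a small, hypothetical helper (`special_suites_enabled` does not exist in the repository) that reports which special suites would run in a given environment. One simplification: `pytest.importorskip` actually imports `onnxruntime`, while this sketch only checks whether it is importable via `importlib.util.find_spec`.

```python
import importlib.util
import os
import sys
from typing import Dict, Mapping, Optional


def special_suites_enabled(
    environ: Optional[Mapping[str, str]] = None,
    modules: Optional[Mapping[str, object]] = None,
) -> Dict[str, bool]:
    """Hypothetical helper mirroring the skip conditions of the special test suites."""
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules
    return {
        # TestHub is skipped as soon as torchvision has already been imported.
        "test_hub": "torchvision" not in modules,
        # importorskip imports the module; find_spec only checks that it could be.
        "test_onnx": importlib.util.find_spec("onnxruntime") is not None,
        # Extended tests are opt-in through an environment variable.
        "test_extended": environ.get("PYTORCH_TEST_WITH_EXTENDED", "0") == "1",
    }
```

In a regular unit test run, torchvision is already in `sys.modules` and the variable is unset, so only the ONNX entry can be true, and only if `onnxruntime` happens to be installed.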

Same deal as for the other migrations here: let's run the CircleCI and GHA tests in parallel for a few weeks and if nothing comes up, we can remove the ones on CircleCI.

cc @seemethere

Comment on lines +7 to +10
```bash
# Prepare conda
CONDA_PATH=$(which conda)
eval "$(${CONDA_PATH} shell.bash hook)"
conda activate ci
```
Collaborator Author

@osalpekar #7189 (comment) becomes even more relevant now. Without it, we need to repeat the first two commands (locating conda and evaluating its shell hook) in every script. I'll get on it.

@@ -0,0 +1,41 @@
import asyncio
Collaborator Author

This file is an implementation of

vision/.circleci/config.yml

Lines 171 to 189 in 5850f37

```yaml
download_model_weights:
  parameters:
    extract_roots:
      type: string
      default: "torchvision/models"
    background:
      type: boolean
      default: true
  steps:
    - apt_install:
        args: parallel wget
        descr: Install download utilitites
    - run:
        name: Download model weights
        background: << parameters.background >>
        command: |
          mkdir -p ~/.cache/torch/hub/checkpoints
          python scripts/collect_model_urls.py << parameters.extract_roots >> \
            | parallel -j0 'wget --no-verbose -O ~/.cache/torch/hub/checkpoints/`basename {}` {}\?source=ci'
```

in Python. The old version relied on wget and parallel, which were installed through apt and are not available through conda.
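For readers who haven't opened the diff, here is a minimal sketch of the same idea with asyncio. It is not the PR's file: `download_all`, `download_one` and the `fetch` parameter are hypothetical names. Because the standard library has no async HTTP client, the default `fetch` runs the blocking `urllib.request.urlopen` in the event loop's default thread pool; the real script may use a different client. A semaphore caps concurrent downloads, whereas `parallel -j0` starts as many jobs as possible.

```python
import asyncio
import os
import urllib.request
from pathlib import Path
from typing import Awaitable, Callable, Iterable, List, Optional

# Same target directory as the CircleCI step.
CHECKPOINTS_DIR = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"

Fetch = Callable[[str], Awaitable[bytes]]


def _blocking_fetch(url: str) -> bytes:
    # Append the same query string the wget command used.
    with urllib.request.urlopen(f"{url}?source=ci") as response:
        return response.read()


async def _default_fetch(url: str) -> bytes:
    # Hand the blocking request to the default executor so downloads overlap.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _blocking_fetch, url)


async def download_one(url: str, dest_dir: Path, fetch: Fetch, sem: asyncio.Semaphore) -> Path:
    async with sem:
        data = await fetch(url)
    # Equivalent of `basename {}` in the shell version.
    path = dest_dir / os.path.basename(url)
    path.write_bytes(data)
    return path


async def download_all(
    urls: Iterable[str],
    dest_dir: Path = CHECKPOINTS_DIR,
    fetch: Optional[Fetch] = None,
    max_concurrency: int = 16,
) -> List[Path]:
    dest_dir.mkdir(parents=True, exist_ok=True)
    sem = asyncio.Semaphore(max_concurrency)
    fetch = fetch or _default_fetch
    # gather returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(download_one(u, dest_dir, fetch, sem) for u in urls)))
```

Fed with the URLs printed by `python scripts/collect_model_urls.py torchvision/models`, for example `asyncio.run(download_all(line.strip() for line in sys.stdin if line.strip()))` would roughly replace the `parallel`/`wget` pipeline. Note that this sketch holds each file in memory before writing it.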

Collaborator Author

One difference is that this PR uses async downloads, while the old version used multiprocessing. It seems async is roughly 5x slower.

I'll try multiprocessing to see whether this is actually the root cause or whether the slowdown just comes from the environment change between CircleCI and GHA.

Collaborator Author

I've tried multiprocessing with threads in 5d6f391. The run aborted with a MemoryError, but the logs show that it also took over 5 minutes: https://github.com/pytorch/vision/actions/runs/4364016074/jobs/7630816354#step:10:894

Thus, I would go with the async solution, since that worked. I'm no expert in async or multiprocessing, though. If someone sees possible perf improvements for either implementation, feel free to suggest them.
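For comparison, a thread-pool variant could look like the sketch below. It is not the code from 5d6f391; `download_all_threaded` and `opener` are hypothetical names. It uses `multiprocessing.pool.ThreadPool` and streams each response to disk with `shutil.copyfileobj` instead of reading whole files into memory. Whether buffering caused the MemoryError above is an assumption the logs don't confirm, but streaming keeps per-download memory bounded either way.

```python
import os
import shutil
import urllib.request
from multiprocessing.pool import ThreadPool
from pathlib import Path
from typing import BinaryIO, Callable, Iterable, List

Opener = Callable[[str], BinaryIO]


def _open_url(url: str) -> BinaryIO:
    # Same query string as the wget command in the CircleCI config.
    return urllib.request.urlopen(f"{url}?source=ci")


def download_one(url: str, dest_dir: Path, opener: Opener = _open_url) -> Path:
    path = dest_dir / os.path.basename(url)
    with opener(url) as response, open(path, "wb") as f:
        # Copy in fixed-size chunks rather than loading the whole file.
        shutil.copyfileobj(response, f)
    return path


def download_all_threaded(
    urls: Iterable[str], dest_dir: Path, opener: Opener = _open_url, threads: int = 16
) -> List[Path]:
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPool(threads) as pool:
        # starmap blocks until all downloads finish and preserves input order.
        return pool.starmap(download_one, [(url, dest_dir, opener) for url in urls])
```

Since these are threads rather than processes, the `opener` can be any callable (even a lambda) without pickling concerns.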

Collaborator Author

I've tried the wget and parallel solution on GHA, and it seems the environment really is what causes the slowdown:

```console
$ time python scripts/collect_model_urls.py torchvision/models/ | parallel -j0 'wget --no-verbose -O foo/`basename {}` {}\?source=ci'
[...]
real    5m0.152s
user    0m49.044s
sys     1m10.467s
```

So I'm totally fine with using the async solution.

@pmeier pmeier marked this pull request as ready for review March 8, 2023 12:46
@pmeier pmeier requested a review from osalpekar March 8, 2023 12:46
@osalpekar (Member) left a comment

Awesome!

@pmeier pmeier merged commit e59cf64 into pytorch:main Mar 8, 2023
@pmeier pmeier deleted the special-tests branch March 8, 2023 21:29
github-actions bot commented Mar 8, 2023

Hey @pmeier!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Mar 30, 2023
Reviewed By: vmoens

Differential Revision: D44416639

fbshipit-source-id: a1a088a1a8e04a38889652c1316ab00bb3f8f2ea
3 participants