port special tests from CircleCI to GHA #7396

pmeier · 2023-03-07T16:00:34Z

Edit: #7399 removed the torch.hub tests from CI in general. Thus, the points below referring to that are moot.

Per title. This refers to

Lines 1258 to 1262 in 5850f37

    
    unittest:
      jobs:
        - unittest_torchhub
        - unittest_onnx
        - unittest_extended

or, in more detail:

vision/.circleci/config.yml

Lines 327 to 360 in 5850f37

    
    unittest_torchhub:
      docker:
        - image: cimg/python:3.8
      steps:
        - checkout
        - install_torchvision
        - run_tests_selective:
            file_or_dir: test/test_hub.py

    unittest_onnx:
      docker:
        - image: cimg/python:3.8
      steps:
        - checkout
        - install_torchvision
        - pip_install:
            args: onnx onnxruntime
            descr: Install ONNX
        - run_tests_selective:
            file_or_dir: test/test_onnx.py

    unittest_extended:
      docker:
        - image: cimg/python:3.8
      resource_class: xlarge
      steps:
        - checkout
        - download_model_weights
        - install_torchvision
        - run:
            name: Enable extended tests
            command: echo 'export PYTORCH_TEST_WITH_EXTENDED=1' >> $BASH_ENV
        - run_tests_selective:
            file_or_dir: test/test_extended_*.py

These tests run only on CPU and only on a Linux box, so they are kept outside of the regular unittests.

Although we don't exclude them explicitly in pytest.ini, these tests are not run with the regular unittests:

vision/test/test_hub.py

Line 20 in 5850f37

@pytest.mark.skipif("torchvision" in sys.modules, reason="TestHub must start without torchvision imported")

torchvision will be imported by almost any other test module during collection

vision/test/test_onnx.py

Lines 16 to 18 in 5850f37

    
    # In environments without onnxruntime we prefer to
    # invoke all tests in the repo and have this one skipped rather than fail.
    onnxruntime = pytest.importorskip("onnxruntime")

vision/test/test_extended_models.py

Lines 14 to 17 in 5850f37

    
    run_if_test_with_extended = pytest.mark.skipif(
        os.getenv("PYTORCH_TEST_WITH_EXTENDED", "0") != "1",
        reason="Extended tests are disabled by default. Set PYTORCH_TEST_WITH_EXTENDED=1 to run them.",
    )
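For readers unfamiliar with this kind of gate, here is a minimal sketch of the environment-variable check. It uses the standard library's `unittest.skipUnless` rather than the `pytest.mark.skipif` marker quoted above, so that it runs without extra dependencies. The names `extended_enabled` and `ExtendedSmokeTest` are hypothetical and do not exist in the repository.

```python
import os
import unittest


def extended_enabled(environ=os.environ) -> bool:
    """Return True only if PYTORCH_TEST_WITH_EXTENDED is exactly "1"."""
    # Mirrors the quoted condition: an unset variable defaults to "0".
    return environ.get("PYTORCH_TEST_WITH_EXTENDED", "0") == "1"


class ExtendedSmokeTest(unittest.TestCase):
    # Hypothetical test case; the condition is evaluated once, at import time.
    @unittest.skipUnless(
        extended_enabled(),
        "Extended tests are disabled by default. "
        "Set PYTORCH_TEST_WITH_EXTENDED=1 to run them.",
    )
    def test_runs_only_when_enabled(self):
        self.assertTrue(extended_enabled())
```

Because the condition is evaluated when the module is imported, the variable has to be exported before the test runner starts, which is what the "Enable extended tests" step in the CircleCI job does.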

Same deal as for the other migrations here: let's run the CircleCI and GHA tests in parallel for a few weeks and if nothing comes up, we can remove the ones on CircleCI.

cc @seemethere

pmeier · 2023-03-08T10:19:11Z

.github/scripts/unittest.sh

+# Prepare conda
+CONDA_PATH=$(which conda)
+eval "$(${CONDA_PATH} shell.bash hook)"
+conda activate ci


@osalpekar #7189 (comment) becomes even more relevant now. Without it, we need to repeat the top two lines everywhere. I'll get on it.

pmeier · 2023-03-08T10:21:24Z

scripts/download_model_urls.py

@@ -0,0 +1,41 @@
+import asyncio


This file is an implementation of

vision/.circleci/config.yml

Lines 171 to 189 in 5850f37

    download_model_weights:
      parameters:
        extract_roots:
          type: string
          default: "torchvision/models"
        background:
          type: boolean
          default: true
      steps:
        - apt_install:
            args: parallel wget
            descr: Install download utilitites
        - run:
            name: Download model weights
            background: << parameters.background >>
            command: |
              mkdir -p ~/.cache/torch/hub/checkpoints
              python scripts/collect_model_urls.py << parameters.extract_roots >> \
                  | parallel -j0 'wget --no-verbose -O ~/.cache/torch/hub/checkpoints/`basename {}` {}\?source=ci'

in Python. The old version relied on wget and parallel installed through apt, but they are not available through conda.

One difference is that this PR uses async downloads, while the old version used multiprocessing. It seems async is roughly 5x slower:

multiprocessing 1m 2s: https://app.circleci.com/pipelines/github/pytorch/vision/23876/workflows/d96da5f3-9ca0-4615-9c08-0373c00233a0/jobs/1849889

async 5m 2s: https://github.com/pytorch/vision/actions/runs/4363141530/jobs/7628874383#step:10:776

I'll try multiprocessing to see whether this is actually the root cause or whether the slowdown just comes from the environment change between CircleCI and GHA.

I've tried multiprocessing with threads in 5d6f391. The run aborted with a MemoryError. From the logs we can see, though, that it also took over 5 minutes: https://github.com/pytorch/vision/actions/runs/4364016074/jobs/7630816354#step:10:894

Thus, I would go with the async solution since that worked. I'm no expert in async / multiprocessing though. If someone sees possible perf improvements for either implementation, feel free to suggest them.

I've tried the solution with wget and parallel on GHA and it seems it is really the env that is causing the slowdown:

    $ time python scripts/collect_model_urls.py torchvision/models/ | parallel -j0 'wget --no-verbose -O foo/`basename {}` {}\?source=ci'
    [...]
    real 5m0.152s
    user 0m49.044s
    sys 1m10.467s

Meaning, I'm totally fine using the async solution.
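For illustration, a bounded-concurrency async download can be sketched as below. This is not the content of scripts/download_model_urls.py. It is a stdlib-only sketch under stated assumptions: the names `download_all` and `fetch_with_urllib` and the `limit` parameter are hypothetical, the blocking `urllib` call is pushed to a worker thread via `asyncio.to_thread` (Python 3.9+), and each file is held fully in memory before being written, which a real script handling large weights would probably replace with streaming.

```python
import asyncio
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


async def fetch_with_urllib(url: str) -> bytes:
    # Assumption: stdlib-only fetch. urlopen blocks, so run it in a thread.
    def _read() -> bytes:
        with urllib.request.urlopen(url, timeout=60) as response:
            return response.read()

    return await asyncio.to_thread(_read)


async def download_all(urls, dest, fetch=fetch_with_urllib, limit=8):
    """Download every URL into dest, running at most `limit` fetches at once."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    semaphore = asyncio.Semaphore(limit)

    async def download_one(url: str) -> Path:
        # Same naming as the wget call: the basename of the URL path.
        target = dest / Path(urlsplit(url).path).name
        async with semaphore:
            # Tag the request with ?source=ci, as the old command did.
            data = await fetch(f"{url}?source=ci")
        target.write_bytes(data)
        return target

    # gather preserves input order in its result list.
    return await asyncio.gather(*(download_one(url) for url in urls))
```

A call such as `asyncio.run(download_all(urls, Path.home() / ".cache/torch/hub/checkpoints"))` would mirror the target directory of the old command. The `fetch` parameter exists only so that the download logic can be exercised without network access.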

osalpekar

Awesome!

.github/workflows/test-linux.yml

github-actions · 2023-03-08T21:30:11Z

Hey @pmeier!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

port special tests from CircleCI to GHA

1c8fdf9

pmeier added the module: ci label Mar 7, 2023

facebook-github-bot added the cla signed label Mar 7, 2023

pmeier added 6 commits March 7, 2023 17:36

install pytest

18fbc1c

add script to async download model urls

0daaf9c

actually run ONNX tests

f800fa5

fix path in model weights download

19b7607

increase timeout, add total time, and add request source

7e94344

expand bash strict mode

5d26aad

pmeier commented Mar 8, 2023

pmeier added 6 commits March 8, 2023 12:52

try download with multiprocessing with threads

5d6f391

remove torch.hub tests

933a78b

Revert "try download with multiprocessing with threads"

4909c84

This reverts commit 5d6f391.

Merge branch 'main' into special-tests

85e0b08

fix macos script location

2b94a36

cleanup

3cafef2

pmeier marked this pull request as ready for review March 8, 2023 12:46

pmeier requested a review from osalpekar March 8, 2023 12:46

osalpekar approved these changes Mar 8, 2023

.github/workflows/test-linux.yml (resolved)

pmeier added 2 commits March 8, 2023 22:16

try bash -x

3a3b300

Revert "try bash -x"

e670854

This reverts commit 3a3b300.

pmeier mentioned this pull request Mar 8, 2023

Surface failing tests on GHA #7364

Merged

pmeier merged commit e59cf64 into pytorch:main Mar 8, 2023

pmeier deleted the special-tests branch March 8, 2023 21:29

This was referenced Mar 9, 2023

CircleCI to GitHub Actions tracker #7405

Closed

port lint workflows from CircleCI to GHA #7401

Merged

pmeier mentioned this pull request Mar 27, 2023

kill ONNX / extended unittest workflows on CircleCI #7467

Merged

facebook-github-bot pushed a commit that referenced this pull request Mar 30, 2023

[fbsync] port special tests from CircleCI to GHA (#7396)

a69a684

Reviewed By: vmoens Differential Revision: D44416639 fbshipit-source-id: a1a088a1a8e04a38889652c1316ab00bb3f8f2ea
