port special tests from CircleCI to GHA #7396
# Prepare conda
CONDA_PATH=$(which conda)
eval "$(${CONDA_PATH} shell.bash hook)"
conda activate ci
@osalpekar #7189 (comment) becomes even more relevant now. Without it, we need to repeat the top two lines everywhere. I'll get on it.
@@ -0,0 +1,41 @@
import asyncio
This file is an implementation of
Lines 171 to 189 in 5850f37
download_model_weights:
  parameters:
    extract_roots:
      type: string
      default: "torchvision/models"
    background:
      type: boolean
      default: true
  steps:
    - apt_install:
        args: parallel wget
        descr: Install download utilitites
    - run:
        name: Download model weights
        background: << parameters.background >>
        command: |
          mkdir -p ~/.cache/torch/hub/checkpoints
          python scripts/collect_model_urls.py << parameters.extract_roots >> \
              | parallel -j0 'wget --no-verbose -O ~/.cache/torch/hub/checkpoints/`basename {}` {}\?source=ci'
in Python. The old version relied on wget and parallel installed through apt, but they are not available through conda.
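For readers who want to see the shape of such a port, here is a minimal sketch of concurrent downloads with asyncio. It is not the script added in this PR: it assumes only the standard library (Python 3.9+ for `asyncio.to_thread`), and the names `checkpoint_path`, `fetch`, `download_all`, and the `limit` parameter are hypothetical. The `?source=ci` suffix and the `basename` behavior are taken from the shell command above.

```python
import asyncio
import shutil
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def checkpoint_path(url, root):
    # Mirror `basename {}` from the shell version: keep only the file name.
    return Path(root) / Path(urlsplit(url).path).name


def fetch(url, path):
    # Blocking download that streams to disk instead of buffering in memory.
    # The `?source=ci` query string mirrors the original wget call.
    with urlopen(f"{url}?source=ci") as response, open(path, "wb") as file:
        shutil.copyfileobj(response, file)


async def download_all(urls, root, fetch=fetch, limit=8):
    # `limit` (hypothetical) caps the number of downloads in flight.
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    semaphore = asyncio.Semaphore(limit)

    async def download(url):
        path = checkpoint_path(url, root)
        async with semaphore:
            # Run the blocking fetch in a worker thread so the event loop stays free.
            await asyncio.to_thread(fetch, url, path)
        return path

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(download(url) for url in urls))
```

A call such as `asyncio.run(download_all(urls, Path.home() / ".cache/torch/hub/checkpoints"))` would then take the place of the `parallel ... wget` pipeline, with `urls` coming from whatever `scripts/collect_model_urls.py` prints.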
One difference is that this PR uses async downloads, while the old version used multiprocessing. It seems async is roughly 5x slower:
- multiprocessing 1m 2s: https://app.circleci.com/pipelines/github/pytorch/vision/23876/workflows/d96da5f3-9ca0-4615-9c08-0373c00233a0/jobs/1849889
- async 5m 2s: https://github.com/pytorch/vision/actions/runs/4363141530/jobs/7628874383#step:10:776
I'll try multiprocessing and see whether this actually is the root cause or whether it just comes from the environment change between CircleCI and GHA.
I've tried multiprocessing with threads in 5d6f391. The run aborted with a MemoryError. From the logs we can see, though, that it also took over 5 minutes: https://github.com/pytorch/vision/actions/runs/4364016074/jobs/7630816354#step:10:894
Thus, I would go with the async solution since that worked. I'm no expert in async / multiprocessing though. If someone sees possible perf improvements for either implementation, feel free to suggest.
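As one possible suggestion for the thread-based variant, the sketch below uses a bounded thread pool and streams each response to disk in chunks. It is not the code from 5d6f391, and whether buffering was behind the MemoryError there is not established; streaming is simply a cautious default. The names `download_one`, `download_all_threaded`, `opener`, `max_workers`, and `chunk_size` are hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def download_one(url, root, opener=urlopen, chunk_size=1 << 20):
    # Keep only the file name, like `basename {}` in the shell version.
    path = Path(root) / Path(urlsplit(url).path).name
    # Copy in fixed-size chunks so a whole checkpoint is never held in memory.
    with opener(f"{url}?source=ci") as response, open(path, "wb") as file:
        shutil.copyfileobj(response, file, chunk_size)
    return path


def download_all_threaded(urls, root, max_workers=8, opener=urlopen):
    Path(root).mkdir(parents=True, exist_ok=True)
    # A bounded pool limits how many downloads run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `urls`.
        return list(pool.map(lambda url: download_one(url, root, opener), urls))
```

The `opener` parameter exists only so the function can be exercised without network access; in CI it would stay at its default of `urlopen`.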
I've tried the solution with wget and parallel on GHA, and it seems it is really the env that is causing the slowdown:
$ time python scripts/collect_model_urls.py torchvision/models/ | parallel -j0 'wget --no-verbose -O foo/`basename {}` {}\?source=ci'
[...]
real 5m0.152s
user 0m49.044s
sys 1m10.467s
Meaning, I'm totally fine using the async solution.
Awesome!
This reverts commit 3a3b300.
Hey @pmeier! You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py
Reviewed By: vmoens
Differential Revision: D44416639
fbshipit-source-id: a1a088a1a8e04a38889652c1316ab00bb3f8f2ea
Edit: #7399 removed the torch.hub tests from CI in general. Thus, the points below referring to that are moot.
Per title. This refers to
vision/.circleci/config.yml
Lines 1258 to 1262 in 5850f37
or, in more detail,
vision/.circleci/config.yml
Lines 327 to 360 in 5850f37
These tests run only on CPU and a Linux box and are thus outside of the regular unittests.
Although we don't exclude them explicitly in pytest.ini, these tests are not run with the regular unittests:
- vision/test/test_hub.py, Line 20 in 5850f37: torchvision will be imported by almost any other test module during collection
- vision/test/test_onnx.py, Lines 16 to 18 in 5850f37
- vision/test/test_extended_models.py, Lines 14 to 17 in 5850f37
Same deal as for the other migrations here: let's run the CircleCI and GHA tests in parallel for a few weeks and if nothing comes up, we can remove the ones on CircleCI.
cc @seemethere