Skip to content

cleanup CI config #4983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Nov 30, 2021
Merged

cleanup CI config #4983

merged 24 commits into from
Nov 30, 2021

Conversation

pmeier
Copy link
Collaborator

@pmeier pmeier commented Nov 24, 2021

This PR cleans up our CircleCI config in a number of ways:

  • add custom commands such as checkout_and_install or run_tests, which are used in multiple jobs
  • split some jobs like

    vision/.circleci/config.yml

    Lines 200 to 209 in 8d25de7

    - run:
    command: |
    sudo apt-get update -y
    sudo apt install -y libturbojpeg-dev
    pip install --user --progress-bar off mypy
    pip install --user --progress-bar off types-requests
    pip install --user --progress-bar off --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
    pip install --user --progress-bar off git+https://github.com/pytorch/data.git
    pip install --user --progress-bar off --no-build-isolation --editable .
    mypy --config-file mypy.ini

    into multiple steps to be able to see at a glance which part failed
  • regroup jobs into the fitting workflow. For example, currently the build workflow also groups the lint jobs as well as a few unittest jobs. Note that this will not run the workflows multiple times, but is still inconsistent with how workflows are supposed to be used. This is partially caused by this "unconditional condition"

    vision/.circleci/config.yml.in

    Lines 1000 to 1023 in 8d25de7

    {%- if True %}
    jobs:
    - circleci_consistency
    {{ build_workflows(windows_latest_only=True) }}
    - python_lint
    - python_type_check
    - docstring_parameters_sync
    - clang_format
    - torchhub_test
    - torch_onnx_test
    - prototype_test
    {{ ios_workflows() }}
    {{ android_workflows() }}
    unittest:
    jobs:
    {{ unittest_workflows() }}
    cmake:
    jobs:
    {{ cmake_workflows() }}
    nightly:
    {%- endif %}

    that was added without comment in Packaging fixes #1214. Unless this is not overwritten / patched in FBcode, I think it is fine to remove it.
  • add {{ pip_install }} and {{ apt_install }} templates, which will be expanded by regenerate.py to handle permissions and disabling progress output

cc @seemethere

@facebook-github-bot
Copy link

facebook-github-bot commented Nov 24, 2021

💊 CI failures summary and remediations

As of commit 8513fab (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build unittest_onnx (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

/home/circleci/project/test/test_onnx.py:381: T...ead to errors or silently give incorrect results).
test/test_onnx.py::TestONNXExporter::test_mask_rcnn
test/test_onnx.py::TestONNXExporter::test_keypoint_rcnn
  /home/circleci/project/torchvision/ops/poolers.py:239: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
    num_rois = len(rois)

test/test_onnx.py::TestONNXExporter::test_roi_heads
  /home/circleci/project/test/test_onnx.py:380: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
    original_image_sizes = [img.shape[-2:] for img in images]

test/test_onnx.py::TestONNXExporter::test_roi_heads
  /home/circleci/project/test/test_onnx.py:381: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
    images = ImageList(images, [i.shape[-2:] for i in images])

test/test_onnx.py::TestONNXExporter::test_roi_heads
test/test_onnx.py::TestONNXExporter::test_faster_rcnn
test/test_onnx.py::TestONNXExporter::test_mask_rcnn
test/test_onnx.py::TestONNXExporter::test_keypoint_rcnn
  /home/circleci/project/torchvision/models/detection/transform.py:293: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
    for s, s_orig in zip(new_size, original_size)

test/test_onnx.py::TestONNXExporter::test_roi_heads

2 failures not recognized by patterns:

Job Step Action
CircleCI binary_libtorchvision_ops_ios_12.0.0_arm64 Update Homebrew 🔁 rerun
CircleCI binary_libtorchvision_ops_ios_12.0.0_x86_64 Update Homebrew 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@pmeier pmeier marked this pull request as ready for review November 24, 2021 09:53
@pmeier
Copy link
Collaborator Author

pmeier commented Nov 24, 2021

Oh well, the ONNX failure is valid and exists since #4692 cc @prabhat00155. Currently we run the tests with python instead of pytest:

python test/test_onnx.py

Since we have

vision/test/test_onnx.py

Lines 586 to 587 in 8d25de7

if __name__ == "__main__":
pytest.main([__file__])

the tests are correctly run, but the process doesn't exit with a non-zero return code so the CI doesn't pick up on the failure.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pmeier , this is helpful. I made a few comments below but this looks good overall, LMK what you think.

Also I marked all of the CodeQL issues as "wontfix". They're all completely unrelated to this PR... not sure why they show up now and more importantly, why here. But ¯_(ツ)_/¯

@pmeier pmeier requested a review from NicolasHug November 24, 2021 16:10
Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pmeier

@@ -99,6 +99,81 @@ commands:
- brew_install:
formulae: libtool

apt_install:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first sight this entire blob does not really look like an obvious improvement. It's hard to figure out what it does, or why it's needed. Same for pip_install I would say.

We might as well just inline everything I think, there would be a bit of repetition, but at least things would remain obvious. My second-best suggestion would be to revert to what you had before, and rely on regenerate.py (although I'm not a fan either).

But I'll leave it up to you to decide.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first sight this entire blob does not really look like an obvious improvement. It's hard to figure out what it does, or why it's needed.

It handles running as super-user, updating the repository database, confirming choices, and suppressing output for you. Compare

steps:
  - run: sudo apt update -qy && sudo apt install -qy $PACKAGES

with

steps:
  - apt_install:
      args: $PACKAGES

Not having this as a central command means that every call to it will be a little different. In times this means only cluttering the log (forgetting -q), whereas in other situations it fails the job (forgetting sudo).

Same for pip_install I would say.

Same argument. Compare

steps:
  - run: pip install --user --progress-bar=off $PACKAGES

with

steps:
  - pip_install:
      args: $PACKAGES

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, a fairer comparison would be to compare the repeated use of

steps:
    - run: sudo apt update -qy && sudo apt install -qy $PACKAGES

vs

  apt_install:
    parameters:
      args:
        type: string
      descr:
        type: string
        default: ""
      update:
        type: boolean
        default: true
    steps:
      - run:
          name: >
            <<^ parameters.descr >> apt install << parameters.args >> <</ parameters.descr >>
            <<# parameters.descr >> << parameters.descr >>            <</ parameters.descr >>
          command: |
            <<# parameters.update >> sudo apt update -qy  <</ parameters.update >>
            sudo apt install << parameters.args >>

(and the associated cost of understanding and maintaining it) plus the repeated use of

steps:
    - apt_install: args: $PACKAGES

@pmeier pmeier merged commit 65cdaea into pytorch:main Nov 30, 2021
@pmeier pmeier deleted the cleanup-ci branch November 30, 2021 08:29
datumbox added a commit that referenced this pull request Nov 30, 2021
@datumbox
Copy link
Contributor

It's probably not the case, but worth checking that the recent failures on the CI are not the result of this PR. I check this at #5011.

@pmeier
Copy link
Collaborator Author

pmeier commented Nov 30, 2021

@datumbox the ONNX failure was "introduced" by this PR. See #4983 (comment).

@datumbox
Copy link
Contributor

@pmeier Thanks for confirming. I assume you will address on a follow up soon?

@pmeier
Copy link
Collaborator Author

pmeier commented Nov 30, 2021

I hope @prabhat00155 can do this, since the error was only surfaced by this PR, but was introduced in #4692. Since I'm unfamiliar with the topic at hand, I could xfail it to make the CI green again and open an issue for someone else to fix. Otherwise, we could also revert the other PR, since it clearly introduced incorrect behavior.

@datumbox
Copy link
Contributor

@prabhat00155 Could you confirm you plan to pick it up and provide the issue number where we track progress? Just making sure this is tracked as it's high priority.

@prabhat00155
Copy link
Contributor

@prabhat00155 Could you confirm you plan to pick it up and provide the issue number where we track progress? Just making sure this is tracked as it's high priority.

Opened an issue to track this.

facebook-github-bot pushed a commit that referenced this pull request Dec 2, 2021
Summary:
* [DIRTY] cleanup CI config

* remove pip install command

* fix syntax

* fix jobs

* fix syntax

* add support for editable install

* sync

* fix sudo apt install

* fix editable install

* sync

* try pip without user install

* [DIRTY] address review comments

* [DIRTY] try use dynamic name

* switch logic

* address remaining comments

* cleanup

* more cleanup

* fix enabling prototype tests

* Update .circleci/config.yml.in

* linebreak

Reviewed By: NicolasHug

Differential Revision: D32759203

fbshipit-source-id: 1d3149d1e43a503ee6f7cb9e2ac2935587ea23d7

Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants