Skip to content

Conversation

@gante
Copy link
Contributor

@gante gante commented Mar 13, 2025

What does this PR do?

Adds pytest-rerunfailures to our push CI. This should help stabilize our CI and allow faster dev cycles 🤗

In a nutshell, if a test failure has a substring in the newly added FLAKY_TEST_FAILURE_PATTERNS variable (.circleci/create_circleci_config.py), that test is repeated up to 5 times. FLAKY_TEST_FAILURE_PATTERNS in this PR includes the following categories of failures:

  1. 'OSError' / 'Timeout' -- generic transient machine/connection errors
  2. 'HTTPError.*502' / 'HTTPError.*504' -- hub HTTP failures
  3. "AssertionError: Tensor-likes are not close!" -- unlucky torch.testing.assert_close cases
  4. [several] -- exceptions thrown after a hub failure handling

Thank you for the suggestion @Wauplin 💛


Follow up: remove some uses of @is_flaky(), since they serve the same purpose:

  1. Tests with old @is_flaky() additions that I couldn't reproduce with --flake-finder --flake-runs 1000
  2. Tests with @is_flaky() that were tagged because one of the criteria in the list above

See diff in this commit


Corresponding PR for docker image update: #36427

@github-actions github-actions bot marked this pull request as draft March 13, 2025 10:55
@github-actions
Copy link
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@gante gante marked this pull request as ready for review March 13, 2025 10:55
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante requested review from Rocketknight1 and ydshieh March 13, 2025 13:48
@gante gante force-pushed the pytest_rerunfailures branch from bc0e16c to 22a4df5 Compare March 13, 2025 14:04
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "transformers[ja,testing,sentencepiece,jieba,spacy,ftfy,rjieba]" unidic unidic-lite
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ja,testing,sentencepiece,jieba,spacy,ftfy,rjieba]" unidic unidic-lite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little confused about this line - is this to ensure the tests get fixed now rather than waiting for a release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is to build the dev image correctly :)

Previous version of these dockerfiles: we installed from pypi's transformers, the latest release.
This version: we install from the latest commit in the REF branch, REF being the branch that pushes a commit.

In other words, these images were not affected by the changes in setup.py until a release, even when building new dev images. This pattern exists in all docker images, but was missing on these. As a result, before the changes in the dockerfiles, CI crashed because these images didn't have the new testing requirement :)

Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add pytest-rerunfailures to

extras["testing"] = (

and make sure all docker files (used for circleci test jobs) have testing.

Am I right ?

Nice for

git+https://github.com/huggingface/transformers.git@${REF}

@gante
Copy link
Contributor Author

gante commented Mar 13, 2025

I think we need to add pytest-rerunfailures to extras["testing"]

@ydshieh good catch! I added it to the PR that builds the dev image, but forgot to add it here. Added ✅

@gante
Copy link
Contributor Author

gante commented Mar 13, 2025

image pushed in #36427 , merging

@gante gante merged commit a3201ce into huggingface:main Mar 13, 2025
23 checks passed
Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks all for working on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants