fix derived berts `_init_weights` #37341

Cyrilvallez · 2025-04-07T11:07:15Z

What does this PR do?

As per the title. We prioritize this family of models for now because their checkpoints on the Hub seem to have corrupted weights, resulting in bad inits (see #37070 as well). They are also used in optimum's tests!
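
For readers unfamiliar with the hook, here is a minimal sketch of what an `_init_weights`-style function has to cover. It is not the diff from this PR: `ToyLMHead`, the free function `init_weights`, and the `std=0.02` default are hypothetical names and values chosen for illustration. The point is only that a parameter created with `torch.empty` holds arbitrary memory until some branch of the init function assigns it a value.

```python
import torch
from torch import nn


class ToyLMHead(nn.Module):
    """Hypothetical head with a standalone bias parameter (illustration only)."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        # torch.empty leaves the memory uninitialized: values are arbitrary until set.
        self.bias = nn.Parameter(torch.empty(vocab_size))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.decoder(hidden_states) + self.bias


def init_weights(module: nn.Module, std: float = 0.02) -> None:
    """Assign a defined value to every parameter, including the standalone bias."""
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=std)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.LayerNorm):
        module.weight.data.fill_(1.0)
        module.bias.data.zero_()
    elif isinstance(module, ToyLMHead):
        # The branch that is easy to forget in a derived model.
        module.bias.data.zero_()


head = ToyLMHead(hidden_size=8, vocab_size=16)
head.apply(init_weights)  # nn.Module.apply visits the children, then the module itself
logits = head(torch.randn(2, 8))
```

Without the last `elif` branch, `head.bias` would keep whatever happened to be in memory, which is one plausible way to end up with huge or NaN outputs when a checkpoint does not provide that parameter.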

github-actions · 2025-04-07T11:07:29Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

HuggingFaceDocBuilderDev · 2025-04-07T11:33:12Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

IlyasMoutawwakil · 2025-04-07T12:45:51Z

Thanks!
Can you please also add: ["poolformer", "dpt", "roformer", "mpnet", "deberta", "deberta_v2", "big_bird"]

2025-04-07T11:54:59.3014635Z =========================== short test summary info ============================
2025-04-07T11:54:59.3015409Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForMaskedLMIntegrationTest::test_compare_to_transformers_02_big_bird - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3015420Z 
2025-04-07T11:54:59.3015532Z Mismatched elements: 36 / 453222 (0.0%)
2025-04-07T11:54:59.3015796Z Greatest absolute difference: 1.6531025742728464e+31 at index (0, 0, 2) (up to 0.0001 allowed)
2025-04-07T11:54:59.3016081Z Greatest relative difference: 2693914.25 at index (0, 0, 0) (up to 0.0001 allowed)
2025-04-07T11:54:59.3016599Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForMaskedLMIntegrationTest::test_compare_to_transformers_06_deberta - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3016607Z 
2025-04-07T11:54:59.3016715Z Mismatched elements: 15360 / 15360 (100.0%)
2025-04-07T11:54:59.3016973Z Greatest absolute difference: 2.5342160989229478e+26 at index (0, 0, 463) (up to 0.0001 allowed)
2025-04-07T11:54:59.3017208Z Greatest relative difference: 9.20669464448467e+19 at index (0, 4, 110) (up to 0.0001 allowed)
2025-04-07T11:54:59.3017783Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForMaskedLMIntegrationTest::test_compare_to_transformers_07_deberta_v2 - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3017789Z 
2025-04-07T11:54:59.3017904Z Mismatched elements: 1152007 / 1152009 (100.0%)
2025-04-07T11:54:59.3018102Z Greatest absolute difference: nan at index (0, 0, 65) (up to 0.0001 allowed)
2025-04-07T11:54:59.3018295Z Greatest relative difference: nan at index (0, 0, 65) (up to 0.0001 allowed)
2025-04-07T11:54:59.3018813Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForMaskedLMIntegrationTest::test_compare_to_transformers_12_mobilebert - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3018822Z 
2025-04-07T11:54:59.3018927Z Mismatched elements: 15168 / 26976 (56.2%)
2025-04-07T11:54:59.3019123Z Greatest absolute difference: nan at index (0, 0, 2) (up to 0.0001 allowed)
2025-04-07T11:54:59.3019309Z Greatest relative difference: nan at index (0, 0, 2) (up to 0.0001 allowed)
2025-04-07T11:54:59.3019814Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForMaskedLMIntegrationTest::test_compare_to_transformers_13_mpnet - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3019819Z 
2025-04-07T11:54:59.3019921Z Mismatched elements: 26831 / 27000 (99.4%)
2025-04-07T11:54:59.3020175Z Greatest absolute difference: 1.6455917599407446e+31 at index (0, 0, 466) (up to 0.0001 allowed)
2025-04-07T11:54:59.3020395Z Greatest relative difference: 22199113728.0 at index (0, 5, 2) (up to 0.0001 allowed)
2025-04-07T11:54:59.3020908Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForMaskedLMIntegrationTest::test_compare_to_transformers_16_roformer - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3020973Z 
2025-04-07T11:54:59.3021073Z Mismatched elements: 9 / 450000 (0.0%)
2025-04-07T11:54:59.3021296Z Greatest absolute difference: 523218419712.0 at index (0, 0, 0) (up to 0.0001 allowed)
2025-04-07T11:54:59.3021505Z Greatest relative difference: 323862.34375 at index (0, 0, 0) (up to 0.0001 allowed)
2025-04-07T11:54:59.3022028Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForMaskedLMIntegrationTest::test_compare_to_transformers_17_squeezebert - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3022033Z 
2025-04-07T11:54:59.3022136Z Mismatched elements: 26976 / 26976 (100.0%)
2025-04-07T11:54:59.3022330Z Greatest absolute difference: nan at index (0, 0, 0) (up to 0.0001 allowed)
2025-04-07T11:54:59.3022516Z Greatest relative difference: nan at index (0, 0, 0) (up to 0.0001 allowed)
2025-04-07T11:54:59.3023102Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForMaskedLMIntegrationTest::test_pipeline_ort_model_12_mobilebert - AssertionError: nan not greater than or equal to 0.0
2025-04-07T11:54:59.3023718Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForImageClassificationIntegrationTest::test_pipeline_ort_model_11_poolformer - AssertionError: nan not greater than or equal to 0.0
2025-04-07T11:54:59.3024263Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForSemanticSegmentationIntegrationTest::test_compare_to_transformers_1_dpt - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3024268Z 
2025-04-07T11:54:59.3024369Z Mismatched elements: 2048 / 2048 (100.0%)
2025-04-07T11:54:59.3024639Z Greatest absolute difference: 1.6033836527025424e+34 at index (0, 0, 10, 20) (up to 0.0001 allowed)
2025-04-07T11:54:59.3024924Z Greatest relative difference: 1.0000416040420532 at index (0, 1, 10, 20) (up to 0.0001 allowed)
2025-04-07T11:54:59.3025590Z FAILED tests/onnxruntime/test_modeling.py::ORTModelForImageClassificationIntegrationTest::test_compare_to_transformers_11_poolformer - AssertionError: Tensor-likes are not close!
2025-04-07T11:54:59.3025605Z 
2025-04-07T11:54:59.3025703Z Mismatched elements: 2 / 2 (100.0%)
2025-04-07T11:54:59.3025924Z Greatest absolute difference: 8058407156187136.0 at index (0, 1) (up to 0.0001 allowed)
2025-04-07T11:54:59.3026132Z Greatest relative difference: 1692201984.0 at index (0, 1) (up to 0.0001 allowed)
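
To read the numbers in the log above: the "Tensor-likes are not close!" message appears to come from `torch.testing.assert_close`, whose documented rule is that an element matches when `|actual - expected| <= atol + rtol * |expected|`, with NaN never matching by default. The sketch below reimplements that rule in plain Python under that assumption. The helper names `is_close` and `count_mismatches` and the sample values are made up for illustration, while the `1e-4` tolerances are taken from the "up to 0.0001 allowed" lines.

```python
import math


def is_close(actual: float, expected: float, atol: float = 1e-4, rtol: float = 1e-4) -> bool:
    """Element-wise closeness: equal values, or a finite difference within tolerance."""
    if actual == expected:  # also covers matching infinities; NaN != NaN
        return True
    diff = abs(actual - expected)
    # A NaN or infinite difference is never considered close.
    return math.isfinite(diff) and diff <= atol + rtol * abs(expected)


def count_mismatches(actual, expected, atol: float = 1e-4, rtol: float = 1e-4) -> int:
    """Count the elements that fail the closeness rule."""
    return sum(not is_close(a, e, atol, rtol) for a, e in zip(actual, expected))


# Made-up values: one tiny deviation, one exact match, one garbage value, one NaN.
expected_values = [0.5, -1.25, 2.0, 0.0]
actual_values = [0.50001, -1.25, 1.6e31, float("nan")]
mismatches = count_mismatches(actual_values, expected_values)
```

A single uninitialized element is enough to fail the comparison, which is consistent with entries such as "Mismatched elements: 9 / 450000" sitting next to an absolute difference of several hundred billion.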

* fix derived berts
* more
* roformer

fix derived berts

910f8be

github-actions bot marked this pull request as draft April 7, 2025 11:07

Cyrilvallez added the `for patch` label (Tag issues / labels that should be included in the next patch) Apr 7, 2025

Cyrilvallez marked this pull request as ready for review April 7, 2025 11:07

github-actions bot requested review from ArthurZucker and eustlb April 7, 2025 11:07

more

5214bda

ArthurZucker approved these changes Apr 7, 2025

roformer

fffd12d

Cyrilvallez merged commit 22065bd into main Apr 7, 2025
17 checks passed

Cyrilvallez deleted the fix-derived-bert-family branch April 7, 2025 16:25

vasqu pushed a commit to vasqu/transformers that referenced this pull request Apr 7, 2025

fix derived berts _init_weights (huggingface#37341)

21f3499

* fix derived berts
* more
* roformer

ArthurZucker pushed a commit that referenced this pull request Apr 7, 2025

fix derived berts _init_weights (#37341)

04c0ced

* fix derived berts
* more
* roformer

cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025

fix derived berts _init_weights (huggingface#37341)

8d4ac92

* fix derived berts
* more
* roformer

yaswanth19 mentioned this pull request Apr 18, 2025

Add Aimv2 model #36625

Merged

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025

fix derived berts _init_weights (huggingface#37341)

4546376

* fix derived berts
* more
* roformer
