fix(conversion): Fix size mismatch error during TF->PT model loading #38014

arjunaskykok · 2025-05-08T10:17:48Z

What does this PR do?

Loading a PyTorch model from a saved TensorFlow checkpoint using
from_pretrained(..., from_tf=True) could fail with a
RuntimeError: size mismatch. The error indicated that weights like
position embeddings were expected to have the shape of word embeddings
(e.g., [vocab_size, hidden_size]).

This issue was triggered by recent changes that defaulted to initializing
the PyTorch model with meta tensors (init_empty_weights) during this
conversion process.

The root cause was in the tied weight handling logic within
load_tf2_state_dict_in_pytorch_model in modeling_tf_pytorch_utils.py.
Distinct parameters initialized as meta tensors carry no data, so they can
all report data_ptr() == 0. The existing logic treated a repeated data_ptr()
as evidence of tied weights, so it reused the tensor loaded for the first
parameter encountered with data_ptr() == 0 (often the word embeddings) for
every later parameter that also had data_ptr() == 0.

This fix modifies the tied weight check to explicitly skip cases where
pt_weight.data_ptr() == 0, preventing the incorrect reuse of tensors
for distinct meta parameters and resolving the size mismatch error.
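
The failure mode and the guard described above can be illustrated with a small, dependency-free sketch. It is not the code from modeling_tf_pytorch_utils.py: FakeParam, assign_loaded_weights, mismatches and the skip_null_ptr flag are hypothetical names invented for this illustration, and loaded arrays are represented only by their shape tuples.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for a PyTorch parameter (not a real torch class)."""
    shape: tuple
    ptr: int  # value data_ptr() would report; 0 mimics a meta tensor

    def data_ptr(self) -> int:
        return self.ptr


def assign_loaded_weights(params, loaded, skip_null_ptr):
    """Map each parameter name to a loaded array, reusing arrays for tied weights.

    params: name -> FakeParam, in model iteration order
    loaded: name -> loaded array (represented here by its shape tuple)
    skip_null_ptr: if True, data_ptr() == 0 is never treated as a sign of tying
    """
    seen_by_ptr = {}
    result = {}
    for name, param in params.items():
        ptr = param.data_ptr()
        tied = ptr in seen_by_ptr and not (skip_null_ptr and ptr == 0)
        if tied:
            # Reuse the array already loaded for the parameter sharing this pointer.
            result[name] = seen_by_ptr[ptr]
            continue
        result[name] = loaded[name]
        seen_by_ptr[ptr] = loaded[name]
    return result


def mismatches(params, assigned):
    """Names of parameters whose assigned array shape differs from their own."""
    return [name for name, p in params.items() if assigned[name] != p.shape]


# Two distinct parameters that both look like meta tensors (pointer 0).
meta_params = {
    "word_embeddings": FakeParam((30522, 768), 0),
    "position_embeddings": FakeParam((512, 768), 0),
}
loaded_shapes = {name: p.shape for name, p in meta_params.items()}

before = mismatches(meta_params, assign_loaded_weights(meta_params, loaded_shapes, False))
after = mismatches(meta_params, assign_loaded_weights(meta_params, loaded_shapes, True))
print(before)  # ['position_embeddings']
print(after)   # []
```

Parameters that share a real, nonzero pointer are still treated as tied in this sketch, so the guard only changes behavior for the data_ptr() == 0 case.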

The PR also includes a unit test in test_modeling_utils.py that verifies
this scenario using from_pretrained(..., from_tf=True) with meta initialization.

Fixes #37786

Who can review?

@Rocketknight1 @gante

github-actions · 2025-05-08T10:17:59Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

gante

LGTM! Thank you for pinning the issue and fixing it 🤗

gante · 2025-05-08T14:51:13Z

@arjunaskykok to make our CI happy:

1. Run make fixup in your terminal, inside the transformers root folder (this runs automated code formatting).
2. Commit the changes.

arjunaskykok · 2025-05-09T05:02:25Z

@gante Will do! I'll need a bit more time, though, since I'm on Windows and running 'make' there is a bit tricky:

& 'C:\Program Files (x86)\GnuWin32\bin\make.exe' fixup
-n was unexpected at this time.
make: *** [modified_only_fixup] Error 255

manueldeprada · 2025-05-10T11:00:16Z

Thanks a lot @arjunaskykok !! Also LGTM and now the CI is happy after merging recent main changes. Merging this!

HuggingFaceDocBuilderDev · 2025-05-10T11:11:46Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

fix(conversion): Fix size mismatch error during TF->PT model loading

4367db3

github-actions bot marked this pull request as draft May 8, 2025 10:17

arjunaskykok marked this pull request as ready for review May 8, 2025 11:08

Merge branch 'main' into fix_error_loading_state_dict_torch_from_tf

bb8d791

github-actions bot requested review from Rocketknight1 and ydshieh May 8, 2025 11:09

Rocketknight1 added the TensorFlow label May 8, 2025

gante approved these changes May 8, 2025

Merge branch 'main' into fix_error_loading_state_dict_torch_from_tf

c96f123

manueldeprada enabled auto-merge (squash) May 10, 2025 11:01

manueldeprada merged commit 716819b into huggingface:main May 10, 2025
20 checks passed

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025

fix(conversion): Fix size mismatch error during TF->PT model loading (h…

7b046b8

…uggingface#38014)
