@arjunaskykok
Contributor

What does this PR do?

Loading a PyTorch model from a saved TensorFlow checkpoint with
from_pretrained(..., from_tf=True) could fail with a
RuntimeError: size mismatch. The error showed weights such as the
position embeddings being loaded with the shape of the word embeddings
(e.g., [vocab_size, hidden_size]).

This issue was triggered by recent changes that defaulted to initializing
the PyTorch model with meta tensors (init_empty_weights) during this
conversion process.

The root cause was in the tied-weight handling logic within
load_tf2_state_dict_in_pytorch_model in modeling_tf_pytorch_utils.py.
Distinct parameters initialized as meta tensors have no backing storage,
so they can all report data_ptr() == 0. The existing logic treated any
repeated data_ptr() as a sign of tied weights, so it reused the tensor
loaded for the first parameter with data_ptr() == 0 (often the word
embeddings) for every later parameter that also had data_ptr() == 0.
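
To make the failure mode concrete, here is a minimal sketch of the naive tied-weight check in plain Python. It is not the code from modeling_tf_pytorch_utils.py: FakeParam and load_with_naive_tying are hypothetical stand-ins, the shapes are illustrative, and each "loaded array" is represented only by its shape.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for a tensor: only shape and data_ptr() matter here."""
    shape: tuple
    ptr: int = 0  # assumption taken from the description: meta tensors report 0

    def data_ptr(self):
        return self.ptr


def load_with_naive_tying(pt_params, tf_shapes):
    """Reuse an already-loaded array whenever the same data_ptr() was seen before."""
    loaded_by_ptr = {}
    result = {}
    for name, param in pt_params.items():
        ptr = param.data_ptr()
        if ptr in loaded_by_ptr:
            # Treated as a tied weight: reuse whatever was loaded first.
            result[name] = loaded_by_ptr[ptr]
            continue
        array_shape = tf_shapes[name]  # the "array" is just its shape here
        result[name] = array_shape
        loaded_by_ptr[ptr] = array_shape
    return result


# Two distinct meta parameters, both reporting data_ptr() == 0.
pt_params = {
    "word_embeddings": FakeParam((30522, 768)),
    "position_embeddings": FakeParam((512, 768)),
}
tf_shapes = {
    "word_embeddings": (30522, 768),
    "position_embeddings": (512, 768),
}

loaded = load_with_naive_tying(pt_params, tf_shapes)
mismatched = [name for name, p in pt_params.items() if loaded[name] != p.shape]
print(mismatched)  # ['position_embeddings']
```

The position embeddings end up with the word-embedding shape, which is the kind of mismatch the RuntimeError reports.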

This fix modifies the tied weight check to explicitly skip cases where
pt_weight.data_ptr() == 0, preventing the incorrect reuse of tensors
for distinct meta parameters and resolving the size mismatch error.
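
The guard can be sketched the same way. This is an illustration under stated assumptions rather than the merged diff: FakeParam and load_with_guarded_tying are hypothetical names, the pointer value 4096 is arbitrary, and strings stand in for the loaded arrays.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for a tensor exposing only data_ptr()."""
    ptr: int = 0  # assumption taken from the description: meta tensors report 0

    def data_ptr(self):
        return self.ptr


def load_with_guarded_tying(pt_params, tf_arrays):
    """Reuse a loaded array only for a repeated, non-zero data_ptr()."""
    loaded_by_ptr = {}
    result = {}
    for name, param in pt_params.items():
        ptr = param.data_ptr()
        # A pointer of 0 says nothing about sharing, so it never counts as tied.
        if ptr != 0 and ptr in loaded_by_ptr:
            result[name] = loaded_by_ptr[ptr]
            continue
        array = tf_arrays[name]
        result[name] = array
        loaded_by_ptr[ptr] = array
    return result


# Distinct meta parameters: each one now gets its own array.
meta_loaded = load_with_guarded_tying(
    {"word_embeddings": FakeParam(), "position_embeddings": FakeParam()},
    {"word_embeddings": "word_array", "position_embeddings": "position_array"},
)

# Genuinely tied parameters (same non-zero pointer) are still reused,
# even though the second name has no entry of its own on the TF side.
tied_loaded = load_with_guarded_tying(
    {"embed": FakeParam(ptr=4096), "lm_head": FakeParam(ptr=4096)},
    {"embed": "embed_array"},
)

print(meta_loaded["position_embeddings"])  # position_array
print(tied_loaded["lm_head"])              # embed_array
```

Skipping only the zero pointer keeps the existing behavior for materialized tensors that really do share storage.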

The PR also includes a unit test in test_modeling_utils.py that verifies
this scenario using from_pretrained(..., from_tf=True) with meta initialization.

Fixes #37786

Who can review?

@Rocketknight1 @gante

@github-actions github-actions bot marked this pull request as draft May 8, 2025 10:17

github-actions bot commented May 8, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@arjunaskykok arjunaskykok marked this pull request as ready for review May 8, 2025 11:08
@github-actions github-actions bot requested review from Rocketknight1 and ydshieh May 8, 2025 11:09
@Rocketknight1 Rocketknight1 added the TensorFlow Anything TensorFlow label May 8, 2025

@gante gante left a comment

LGTM! Thank you for pinning down the issue and fixing it 🤗

gante commented May 8, 2025

@arjunaskykok to make our CI happy:

  • run make fixup in your terminal, inside the transformers root folder (this runs automated code formatting)
  • commit the changes

@arjunaskykok
Contributor Author

@gante Will do! I'll need more time, though: I'm on Windows, where using 'make' is a bit tricky.

& 'C:\Program Files (x86)\GnuWin32\bin\make.exe' fixup
-n was unexpected at this time.
make: *** [modified_only_fixup] Error 255

@manueldeprada
Contributor

Thanks a lot @arjunaskykok !! Also LGTM and now the CI is happy after merging recent main changes. Merging this!

@manueldeprada manueldeprada enabled auto-merge (squash) May 10, 2025 11:01
@manueldeprada manueldeprada merged commit 716819b into huggingface:main May 10, 2025
20 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025

Development

Successfully merging this pull request may close these issues.

Loading a Pytorch model from a Tensorflow saved model doesn't work
