
Conversation

@Isotr0py
Contributor

What does this PR do?

Fixes #33389 (comment)

  • Add GGUF support to T5-encoder
  • Add model conversion test

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@SunMarc @MekkCyber

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Signed-off-by: Isotr0py <[email protected]>
@Isotr0py changed the title from "Add GGUF support to t5encoder" to "Add GGUF support to T5-Encoder" on Mar 13, 2025
@MekkCyber
Contributor

Thanks @Isotr0py, SG 🔥! Just have one small question.

@SunMarc SunMarc marked this pull request as ready for review March 13, 2025 15:43
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@SunMarc SunMarc left a comment

Thanks !

Comment on lines 4232 to 4233
if gguf_file is not None:
model_kwargs.pop("gguf_file")
Member

What was the issue here?

Contributor Author

If we use T5EncoderModel to initialize the model class instead of AutoModel:

text_encoder_2 = T5EncoderModel.from_pretrained(
    t5_gguf,
    gguf_file=t5_file,
    torch_dtype=torch.bfloat16,
)

Then an error is raised because gguf_file is still present in model_kwargs:

Traceback (most recent call last):
  File "/data/develop-projects/github-repos/transformers/check_t5.py", line 13, in <module>
    text_encoder_2 = T5EncoderModel.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/develop-projects/github-repos/transformers/src/transformers/modeling_utils.py", line 272, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/develop-projects/github-repos/transformers/src/transformers/modeling_utils.py", line 4377, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: T5EncoderModel.__init__() got an unexpected keyword argument 'gguf_file'

I suspect the root issue is in another place, but haven't figured it out yet. 🤔
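The failure mode can be reproduced in miniature. This is a hypothetical sketch (ToyEncoder and toy_from_pretrained are made-up names, not the real transformers API): a from_pretrained-style loader has to pop loader-only kwargs such as gguf_file before forwarding the remaining kwargs to the model constructor, because the constructor only accepts a config.

```python
class ToyEncoder:
    """Stand-in for T5EncoderModel: __init__ accepts only a config."""
    def __init__(self, config):
        self.config = config

def toy_from_pretrained(config, **model_kwargs):
    # Consume the loader-level kwarg here; leaving gguf_file inside
    # model_kwargs would reproduce the TypeError from the traceback above.
    gguf_file = model_kwargs.pop("gguf_file", None)
    # ... in the real loader, gguf_file drives weight deserialization ...
    return ToyEncoder(config, **model_kwargs)

model = toy_from_pretrained({"d_model": 4096}, gguf_file="t5.gguf")
print(type(model).__name__)  # ToyEncoder
```

Without the pop, `ToyEncoder(config, gguf_file=...)` fails with exactly the `unexpected keyword argument` TypeError shown above.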

Member

Hmmm, this happens because you passed gguf_file to cls.config_class.from_pretrained(. But indeed this is strange, since we should have been passing gguf_file inside that method from the start.

Contributor Author

Hmmm, if we don't pass gguf_file to cls.config_class.from_pretrained(, it returns the default T5 config, causing a parameter shape mismatch:

Traceback (most recent call last):
  File "/data/develop-projects/github-repos/transformers/check_t5.py", line 13, in <module>
    text_encoder_2 = T5EncoderModel.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/develop-projects/github-repos/transformers/src/transformers/modeling_utils.py", line 272, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/develop-projects/github-repos/transformers/src/transformers/modeling_utils.py", line 4430, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/develop-projects/github-repos/transformers/src/transformers/modeling_utils.py", line 4859, in _load_pretrained_model
    disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/develop-projects/github-repos/transformers/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/data/develop-projects/github-repos/transformers/src/transformers/modeling_utils.py", line 835, in _load_state_dict_into_meta_model
    module.load_state_dict(
  File "/data/develop-projects/github-repos/transformers/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for Linear:
        size mismatch for weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([512, 512]).
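The mismatch is purely a config problem, sketched here with the numbers from the traceback (an illustration, not transformers code): when the GGUF metadata isn't read, the model is built with the default T5Config hidden size (512), while the checkpoint tensors were serialized at the real hidden size (4096).

```python
# Hypothetical illustration of the size mismatch above.
DEFAULT_D_MODEL = 512   # hidden size of a default T5Config
GGUF_D_MODEL = 4096     # hidden size recorded in the GGUF metadata

def linear_weight_shape(d_model):
    # Shape of a square projection weight, as in the failing Linear layer.
    return (d_model, d_model)

checkpoint_shape = linear_weight_shape(GGUF_D_MODEL)   # (4096, 4096)
model_shape = linear_weight_shape(DEFAULT_D_MODEL)     # (512, 512)
assert checkpoint_shape != model_shape  # -> the RuntimeError above
```

So the config must be derived from the GGUF file itself, which is why gguf_file ends up being passed to cls.config_class.from_pretrained(.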

Member

@SunMarc SunMarc Mar 13, 2025

Yeah, it is a bit strange that we only had to do this now. Merging for now!

@SunMarc SunMarc merged commit b070025 into huggingface:main Mar 13, 2025
23 checks passed
@Isotr0py Isotr0py deleted the t5encoder-gguf branch March 17, 2025 11:23