
Conversation

@zucchini-nlp (Member) commented Apr 18, 2025

What does this PR do?

FA2 with upcasted layer norm was failing for all VLMs. The issue lies in the VLM config structure, which is composite: modifying the base config doesn't update the sub-configs (which are also deep-copied into the model via XXX._from_config).

The solution is to apply the config update recursively to the model's children whenever a PreTrainedModel is found. I'm not sure we need any recursive calls in prepare_model_for_quantization, though; it doesn't seem to update the config.
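
For illustration, here is a minimal sketch of the idea (a hypothetical helper, not the exact code in this PR): walk the model's modules and mirror the config change onto every child that is itself a PreTrainedModel and therefore carries its own deep-copied config.

import torch.nn as nn
from transformers import PreTrainedModel

def update_config_recursively(model: nn.Module, **config_updates):
    # model.modules() yields the model itself plus all submodules, so the
    # sub-models of a composite VLM (e.g. its vision tower) are covered too
    for module in model.modules():
        if isinstance(module, PreTrainedModel):
            for key, value in config_updates.items():
                setattr(module.config, key, value)

# e.g. propagate an attention-implementation switch to every sub-config:
# update_config_recursively(model, _attn_implementation="flash_attention_2")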

@github-actions github-actions bot marked this pull request as draft April 18, 2025 09:15
@github-actions (Contributor) commented

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@zucchini-nlp zucchini-nlp marked this pull request as ready for review April 18, 2025 09:15
@zucchini-nlp (Member Author) commented

run-slow: janus

@github-actions (Contributor) commented

This comment contains run-slow, running the specified jobs:

models: ['models/janus']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp changed the title [janus] fix tests [VLMs] fix flash-attention tests Apr 18, 2025
@zucchini-nlp zucchini-nlp requested a review from SunMarc April 18, 2025 10:49
@SunMarc (Member) left a comment

SGTM! We pop _pre_quantization_dtype when saving the config, which is why you don't see it after saving the model. I think we should also update it recursively!
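
For context, a hedged sketch of what a recursive cleanup could look like (the attribute walk below is illustrative, not the actual transformers serialization code):

from transformers import PretrainedConfig

def pop_key_recursively(config: PretrainedConfig, key: str = "_pre_quantization_dtype"):
    # drop the transient key from this config and from any nested sub-configs
    vars(config).pop(key, None)
    for value in list(vars(config).values()):
        if isinstance(value, PretrainedConfig):
            pop_key_recursively(value, key)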

@zucchini-nlp zucchini-nlp merged commit 1cfcbfc into huggingface:main Apr 24, 2025
20 checks passed
@BenjaminBossan (Member) commented

Hi @zucchini-nlp, this PR introduced a regression in PEFT. Before this PR, the model card looked like this:

---
library_name: peft
base_model: facebook/opt-125m
---

# Model Card for Model ID
[...]

Afterwards, the base_model meta info is missing:

---
library_name: peft
---

# Model Card for Model ID
[...]

Here is a reproducer:

# testfile.py (run with: pytest testfile.py; tmp_path is a pytest fixture)
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "facebook/opt-125m"

def test_save_peft_modelcard(tmp_path):
    model = AutoModelForCausalLM.from_pretrained(model_id)
    config = LoraConfig()
    model = get_peft_model(model, config)
    model.save_pretrained(tmp_path, safe_serialization=False)

    with open(f"{tmp_path}/README.md") as f:
        model_card = f.read()
    # the generated model card should still carry the base_model metadata
    assert "base_model: facebook/opt-125m" in model_card

Comment on lines -866 to -867
if "_name_or_path" in serializable_config_dict:
    del serializable_config_dict["_name_or_path"]
Member

Probably due to that; we don't remove this in to_dict(). cc @zucchini-nlp

Member Author

oops, sorry, will revert that
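
For reference, the revert would presumably just restore the guard deleted in the hunk above, roughly (placement assumed to be the config serialization helper that builds serializable_config_dict):

# restore the deleted guard so the transient key is stripped again before
# the config dict is serialized
if "_name_or_path" in serializable_config_dict:
    del serializable_config_dict["_name_or_path"]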

SunMarc pushed a commit that referenced this pull request Apr 28, 2025
zucchini-nlp added a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* fix one test

* fa2 ln test

* remove keys from config recursively

* fix

* fixup
zucchini-nlp added a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025