Make ViTPooler configurable #36517
…vation function and the number of channels in the output
|
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the |
|
Hey @sebbaur, feel free to ping me when it's ready for review! Btw, we have activation functions defined in |
|
Thanks for the prompt reply and the pointers! I hadn't noticed Once the CI is happy, I will ping you |
|
I had to modify files in dpt and deit too to make CI happy |
|
tests are not done but I suppose they will pass |
|
@qubvel I think it is ready -- it took more work than I initially expected because I had to propagate changes to other parts of the codebase which use the same pooler. This felt a bit out of scope, but I suppose this is WAI? |
qubvel
left a comment
Thanks, looks good to me, just a question re docs:
|
Somebody suggested in #36513 that this change was not appropriate. I am a bit confused why: it is a no-op and just makes some hardcoded hyperparameters configurable. |
|
Hey! That was me but on reflection I think it's okay - we do have a general rule in our philosophy against adding additional features to existing pretrained model architectures like this, but when it's compact and backward compatibility is guaranteed I think it's probably okay, especially if the PR is already complete. |
qubvel
left a comment
@sebbaur, thanks for making CI happy, just the last nit comment
|
@qubvel it says |
|
@qubvel @Rocketknight1 friendly ping :) -- do you know who should review the workflows? |
|
@sebbaur the additional workflows are the documentation builder - it's usually not essential to run those. I enabled them anyway, but at this point we just need core maintainer approval and we're good to merge cc @ArthurZucker @Cyrilvallez |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
hello @ArthurZucker and @Cyrilvallez, would you be able to take a look? |
|
friendly ping :) |
|
Happy Monday @ArthurZucker and @Cyrilvallez! Would you please be able to take a quick look? That would unlock a few things on my side. Thanks! |
|
Hello @Cyrilvallez! Could you please take a look? Even if this cannot be merged, it would be very helpful for me to know sooner rather than later, as I am planning to open-source a PyTorch implementation of https://huggingface.co/google/hear, which this PR makes easy. I need time to find an alternative if this is not OK. |
Cyrilvallez
left a comment
Hey! Sorry for the delay! @Rocketknight1 is right that we usually don't like to change modeling code so much, but let's make an exception here!
However, let's still minimize the impact and remove unnecessary changes, since from my understanding you only need vit:
- Ijepa should never have been impacted
- let's just remove "Copied from" for deit and dpt instead of modifying them, and revert the changes to modeling and configs
- the flax change of vit is wrong -> let's make sure it works by adding "tanh" to the mapping in modeling_flax_utils
)
self.activation = ACT2FN[self.config.pooler_act]
"tanh" is a KeyError here, it does not exist in the mapping
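The failure mode described in this comment can be sketched with the standard library alone. The dictionary `ACT2FN_SKETCH` and the helper `get_activation` below are hypothetical stand-ins, not the actual contents of modeling_flax_utils (the real mapping holds framework functions rather than `math` functions); the sketch only shows why a missing key breaks the default config and how registering the entry resolves it.

```python
import math

# Hypothetical stand-in for a name -> activation mapping that lacks "tanh".
ACT2FN_SKETCH = {
    "relu": lambda x: max(0.0, x),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
}


def get_activation(name, mapping):
    """Look up an activation by name, failing with a readable message."""
    try:
        return mapping[name]
    except KeyError:
        known = sorted(mapping)
        raise KeyError(f"activation {name!r} not found; known: {known}") from None


# Before the fix: the default pooler activation is missing from the mapping.
try:
    get_activation("tanh", ACT2FN_SKETCH)
    lookup_failed = False
except KeyError:
    lookup_failed = True

# After the fix: register tanh so the default configuration resolves.
ACT2FN_SKETCH["tanh"] = math.tanh
pooled = get_activation("tanh", ACT2FN_SKETCH)(0.5)
```

A lookup test over every activation name that a config can default to would presumably catch this kind of gap before review.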
|
For the "Copied from", you can replace them with something such as "Originally copied from", it will stop our linter from forcing the changes, and at the same time we still keep track of the fact it was previously similar, in case we need it later |
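As a rough illustration of why rewording the marker stops the linter from enforcing it, here is a minimal sketch. The regular expression and the function `tracked_sources` are assumptions made for this example, not the repository's actual copy-consistency check, and the class path in the sample strings is only illustrative.

```python
import re

# Illustrative pattern only; the real check in the repository may differ.
COPIED_FROM_RE = re.compile(r"^\s*#\s*Copied from\s+(\S+)")


def tracked_sources(source: str) -> list[str]:
    """Return the targets of every comment this sketch linter would enforce."""
    targets = []
    for line in source.splitlines():
        match = COPIED_FROM_RE.match(line)
        if match:
            targets.append(match.group(1))
    return targets


enforced = (
    "# Copied from transformers.models.vit.modeling_vit.ViTPooler\n"
    "class DeiTPooler: ...\n"
)
relaxed = (
    "# Originally copied from transformers.models.vit.modeling_vit.ViTPooler\n"
    "class DeiTPooler: ...\n"
)

enforced_targets = tracked_sources(enforced)  # marker is picked up
relaxed_targets = tracked_sources(relaxed)    # marker is ignored
```

Under this assumed pattern, the reworded comment still records the provenance for human readers while no longer matching the automated check.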
|
Alright, after discussions with @qubvel, let's keep the changes that were propagated to the other models, it's fine! You just need to fix the issue on flax vit then 😉 Sorry for the roller coaster! 🙃 |
|
Thanks for your review! I have made the change to the ACT2FN dict -- good catch! Next time I will know that the models' code shouldn't be updated... Sorry about this |
|
Merging! Thanks!! |
* Make ViT Pooler configurable, so that it is possible to pick the activation function and the number of channels in the output
* Add documentation and allow functions as activations (instead of just string)
* formatting change
* Use ACT2FN
* Formatting change
* Formatting changes
* force pooler_act to be string
* force pooler_act to be string
* Add configs to OBJECTS_TO_IGNORE to make check_docstrings happy
* Making the same change in ijepa to make check_modular_conversion happy
* Add IJepaConfig to make CI happy
* rename pooler_size to pooler_output_size as defined in the config
* typo
* revert change to ignore variable
* Ran utils/check_docstrings.py --fix_and_overwrite
* revert unrelated change
* remove redundant defaults
* rename self.act -> self.activation
* tanh activation function in mapping
What does this PR do?
Make parameters of ViTPooler (activation function, output size) configurable, so that I can open-source https://arxiv.org/abs/2403.02522 more easily (the encoder slightly differs from the current implementation).
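The idea can be sketched without any tensor library. Everything below is a simplified illustration under stated assumptions: `PoolerConfigSketch`, `PoolerSketch`, and `ACTIVATIONS` are hypothetical names, plain Python lists stand in for tensors, and the defaults (output size falling back to the hidden size, `"tanh"` as the activation) are chosen so that an unchanged config behaves like a pooler with those values hardcoded.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical name -> activation table; the library's ACT2FN holds tensor ops.
ACTIVATIONS = {"tanh": math.tanh, "relu": lambda x: max(0.0, x)}


@dataclass
class PoolerConfigSketch:
    hidden_size: int = 4
    pooler_output_size: Optional[int] = None  # None -> fall back to hidden_size
    pooler_act: str = "tanh"  # assumed to match the previously hardcoded choice

    def __post_init__(self):
        if self.pooler_output_size is None:
            self.pooler_output_size = self.hidden_size


class PoolerSketch:
    def __init__(self, config, weight=None, bias=None):
        out, hid = config.pooler_output_size, config.hidden_size
        # weight has shape (out, hid); zeros by default to stay deterministic
        self.weight = weight or [[0.0] * hid for _ in range(out)]
        self.bias = bias or [0.0] * out
        self.activation = ACTIVATIONS[config.pooler_act]

    def __call__(self, hidden_states):
        first_token = hidden_states[0]  # pool by taking the first token
        projected = [
            sum(w * x for w, x in zip(row, first_token)) + b
            for row, b in zip(self.weight, self.bias)
        ]
        return [self.activation(v) for v in projected]


tokens = [[1.0, -2.0, 0.5, 0.0], [9.0, 9.0, 9.0, 9.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Default config: same output size as the hidden size, tanh activation.
default_out = PoolerSketch(PoolerConfigSketch(), weight=identity)(tokens)

# Custom config: smaller output and a different activation.
custom_cfg = PoolerConfigSketch(pooler_output_size=2, pooler_act="relu")
custom_out = PoolerSketch(custom_cfg, weight=identity[:2])(tokens)
```

Keeping the new options optional, with defaults that reproduce the earlier behavior, is what lets existing checkpoints and configs load unchanged.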
Fixes #36513
Before submitting
Note that, to the best of my knowledge, the code didn't have any tests, so I have not added any. Besides, the change is trivial, so tests may not be needed.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.