Introduce Textnet backbone (needed for Fast model) #27425
Conversation
Force-pushed from bb5f3b1 to ab2aad5 (compare)
@amyeroberts I have pulled out all the changes for the TextNet backbone needed for the Fast model (#26657) into a separate PR. Requesting a first round of review.
Thanks for adding!
I've done an initial pass; it'll likely take at least one more review round before this is ready for approval. My main comment is about the deletion and addition of model parameters. The priority when adding models to transformers is to make sure that they're easy to read and understand, so the current fused kernel logic should be simplified or removed entirely. Have you run tests to compare running times with and without the fused kernels?
Fair point, I just ran the tests. The fused kernel only saves about 5% of time during eval. I am going to remove the complex logic.
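For reference, the kind of conv+BN fusion being discussed can be checked numerically: at eval time the BatchNorm statistics fold into the preceding conv's weight and bias, so a fused kernel is an optimization, not a behavioral change. A minimal PyTorch sketch of that folding (illustrative only, not the PR's actual code):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold eval-mode BatchNorm statistics into the conv's weight and bias."""
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding, bias=True,
    )
    with torch.no_grad():
        # BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(8).eval()  # eval mode: use running statistics
x = torch.randn(2, 3, 16, 16)
with torch.no_grad():
    reference = bn(conv(x))
    fused_out = fuse_conv_bn(conv, bn)(x)
```

Since the two paths are numerically equivalent, keeping only the plain conv+BN path in the modeling code costs nothing in correctness.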
@amyeroberts I have addressed almost all of the feedback, but I have questions about a few of the points. Could you give me some more details on them?
Thanks for iterating!
Main comment is about how info is stored in the config. Let me know if any of it isn't clear
@amyeroberts The feedback on configuration refactoring has been addressed.
Thanks again for iterating!
Just a few small comments now. Once resolved we'll be good to merge! Main comment is about the logic in the backbone for selecting the returned feature maps
```python
content = response.text
namespace = {}

exec(content, namespace)
```
Let's not do this - dynamic execution of code is dangerous
Agreed, but this config URL is a Python file (example). Is there a safer way to do this?
@amyeroberts Gentle reminder.
Yes, there's a safer way of doing this: you could upload the config files to the hub and use hf_hub_download.
For instance: https://huggingface.co/docs/huggingface_hub/guides/download#download-a-single-file
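For what it's worth, a minimal sketch of that pattern (the repo id and filename here are placeholders, not the actual checkpoint location):

```python
from huggingface_hub import hf_hub_download

def load_remote_config(repo_id: str, filename: str) -> str:
    """Fetch a single file from a hub repo and return its contents.

    hf_hub_download downloads the file into the local cache and returns
    the local path, so subsequent calls hit the cache instead of the network.
    """
    path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(path) as f:
        return f.read()

# Hypothetical usage -- substitute the real repo once the configs are uploaded:
# config_text = load_remote_config("some-org/textnet-configs", "textnet_base.json")
```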
I have made this change.
It doesn't seem to have been pushed?
What I have done is push the config file (a Python file) to the hub and download it using hf_hub_download. My understanding is that when the file is downloaded from the hub, it is safer to exec it. Let me know if my understanding is wrong and this has to be fixed some other way.
@amyeroberts I have removed the dynamic execution: I have preprocessed all the Python config files into JSON files. I hope this approach is okay.
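One way such preprocessing can be done without ever exec-ing the upstream files is to walk the AST and extract only literal values. A sketch under the assumption that the configs follow the common `name = dict(...)` style (the inline `config_source` string is a stand-in for a downloaded file):

```python
import ast
import json

# Stand-in for one of the upstream Python config files.
config_source = """
model = dict(
    backbone="textnet",
    image_size=(640, 640),
    num_stages=4,
)
"""

def python_config_to_dict(source: str) -> dict:
    """Extract literal keyword arguments of `dict(...)` assignments, no exec()."""
    tree = ast.parse(source)
    result = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            call = node.value
            if isinstance(call.func, ast.Name) and call.func.id == "dict":
                name = node.targets[0].id
                # literal_eval only accepts constants, so arbitrary code
                # in the config raises instead of executing
                result[name] = {
                    kw.arg: ast.literal_eval(kw.value) for kw in call.keywords
                }
    return result

config = python_config_to_dict(config_source)
json_text = json.dumps(config)  # tuples become JSON lists
```

This only covers literal-valued configs, which is typically all a conversion script needs; anything fancier in the source file fails loudly rather than silently running.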
Also, the original_pixel_values have been prepared and stored on the hub, to avoid performing preprocessing in the conversion script.
Force-pushed from aa51ff1 to 1f23b19 (compare)
Force-pushed from c433099 to a3427d7 (compare)
@amyeroberts All feedback has been incorporated. Please have a look.
@raghavanone Please make sure to look at the diff before asking for review, to catch any obvious things that should be updated. Opening the files tab, the first thing one sees is READMEs that need to be filled in.
Oh, the rebase onto main must have undone some of these changes. Let me fix and review them again.
@amyeroberts I have fixed the issues coming from the rebase onto main and also validated that all the feedback has been incorporated. Please have a look.
Thanks for iterating and adding this!
Just a few nits to address before merging. The checkpoint paths will also need to be updated to go under the organisation.
```python
TextNetBackbone,
TextNetForImageClassification,
TextNetModel,
is_torch_available,
```
No need to import this here - it's already included in this module (it's used just a few lines above)
```python
@require_torch
@require_vision
class TextNetModelIntegrationTest(unittest.TestCase):
    # @slow
```
Suggested change:
```diff
-    # @slow
+    @slow
```
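For context, `@slow` in `transformers.testing_utils` skips a test unless the `RUN_SLOW` environment flag is set, so commenting it out makes an expensive integration test run in every CI job. A self-contained stdlib approximation of that decorator (the class and test names below are illustrative):

```python
import os
import unittest

def slow(test_case):
    """Rough equivalent of transformers' @slow: run only when RUN_SLOW is truthy."""
    return unittest.skipUnless(
        os.environ.get("RUN_SLOW", "0").lower() in ("1", "true", "yes"),
        "test is slow; set RUN_SLOW=1 to run it",
    )(test_case)

class ExampleIntegrationTest(unittest.TestCase):
    @slow
    def test_heavy_forward_pass(self):
        # placeholder for an expensive checkpoint-loading check
        self.assertTrue(True)
```

With the decorator restored, the integration test still runs on the scheduled slow-CI jobs where `RUN_SLOW=1` is exported.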
```python
    The num of channels in out for the initial convolution layer.
stem_act_func (`str`, *optional*, defaults to `"relu"`):
    The activation function for the initial convolution layer.
image_size (`int`, *optional*, defaults to `[640, 640]`):
```
Types here don't match.
Suggested change:
```diff
-image_size (`int`, *optional*, defaults to `[640, 640]`):
+image_size (`int` or `Tuple[int, int]`, *optional*, defaults to `[640, 640]`):
```
@amyeroberts Let me know how to move the checkpoints under the organisation.
Hi @raghavan, nice work on getting TextNet across the line! For adding the checkpoints to the hub, I’d suggest reaching out to the lead paper author and asking them if they would be happy to host the model weights on the hub under their profile. As the model appears to have been a collab between universities, there isn’t an obvious org for it to go under. @NielsRogge has more experience here, so will have recommendations for everything that needs to be done.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This is needed for merging the Fast model (#26501).