Fix tests for vision models #35654
Conversation
run-slow: beit, detr, dinov2, vit, textnet
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @qubvel, I'm not sure what this draft PR intends to do, but it might be relevant to PR #35138. That PR fixes an incompatibility of FlaxDinov2 with batch sizes greater than 1. The error could not be detected by the Flax tests (same as the PyTorch tests) when I first contributed this model: all the slow tests in transformers simply pass a single image with a batch size of 1, which is why such batch-size incompatibilities can go undetected. As a result, I changed the image batch size to 2 for the Flax Dinov2 slow tests (in that PR, not yet merged). Doing the same for all future model slow tests would probably greatly assist the development process. Also, may I request a review on PR #35138 so that FlaxDinov2 can be used properly?
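For illustration, a batched slow test along these lines could look as follows (a minimal sketch; the checkpoint name and the use of PyTorch rather than Flax are assumptions for the example, not taken from either PR):

# Sketch: run an integration test with batch size 2 instead of 1 so that
# batch-dimension bugs surface. The checkpoint is a placeholder choice.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Dinov2Model

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = Dinov2Model.from_pretrained("facebook/dinov2-base")

# Two copies of the same image -> batch size 2.
inputs = processor(images=[image, image], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

assert outputs.last_hidden_state.shape[0] == 2  # batch dimension preserved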
run-slow: beit, detr, dinov2, vit, textnet

This comment contains run-slow, running the specified jobs: ['models/beit', 'models/detr', 'models/dinov2', 'models/textnet', 'models/vit'] ...

run-slow: beit, detr, dinov2, vit, textnet

This comment contains run-slow, running the specified jobs: ['models/beit', 'models/detr', 'models/dinov2', 'models/textnet', 'models/vit'] ...
Force-pushed from eb2a32c to 6a703ee
run-slow: beit, detr, dinov2, vit, textnet

This comment contains run-slow, running the specified jobs: ['models/beit', 'models/detr', 'models/dinov2', 'models/textnet', 'models/vit'] ...

run-slow: beit, data2vec, dpt

This comment contains run-slow, running the specified jobs: ['models/beit', 'models/data2vec', 'models/dpt'] ...

run-slow: detr

This comment contains run-slow, running the specified jobs: ['models/detr'] ...

run-slow: beit, detr, dinov2, vit, textnet, data2vec, dpt

This comment contains run-slow, running the specified jobs: ['models/beit', 'models/data2vec', 'models/detr', 'models/dinov2', 'models/dpt', 'models/textnet', 'models/vit'] ...

run-slow: beit, detr, dinov2, vit, textnet, data2vec, dpt

This comment contains run-slow, running the specified jobs: ['models/beit', 'models/data2vec', 'models/detr', 'models/dinov2', 'models/dpt', 'models/textnet', 'models/vit'] ...

run-slow: beit, data2vec, dpt, zoedepth

This comment contains run-slow, running the specified jobs: ['models/beit', 'models/data2vec', 'models/dpt', 'models/zoedepth'] ...
# with interpolate_pos_encoding being False an exception should be raised with higher resolution
# images than what the model supports.
self.assertFalse(processor.do_center_crop)
with torch.no_grad():
    with self.assertRaises(ValueError, msg="doesn't match model"):
        model(pixel_values, interpolate_pos_encoding=False)
We always interpolate; error raising was removed for ZoeDepth in
https://github.com/huggingface/transformers/pull/30136/files#diff-3f84bebd6be8d9c0f5c5068199f5c49eac8489d5fa466fb6fa08b0365e78dba4
which is why we are removing it from the tests as well.
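In other words, after that change a higher-resolution input no longer raises; the position encodings are simply interpolated. A minimal sketch of the new behavior (the checkpoint name and input size are assumptions for illustration, not from the test suite):

import torch
from transformers import ZoeDepthForDepthEstimation

# Placeholder checkpoint choice for the example.
model = ZoeDepthForDepthEstimation.from_pretrained("Intel/zoedepth-nyu")

# Larger than the resolution the model was trained at: previously this
# raised a ValueError, now the position encodings are interpolated.
pixel_values = torch.randn(1, 3, 768, 1024)
with torch.no_grad():
    outputs = model(pixel_values)

print(outputs.predicted_depth.shape)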
if self.position_embeddings is not None:
    if interpolate_pos_encoding:
        cls_tokens = cls_tokens + self.interpolate_pos_encoding(embeddings, height, width)
This is actually a bug, but we probably never reach it because self.position_embeddings is None
Or is it more likely that no one ever uses interpolate_pos_encoding=True?
If it is True, would the call fail at this point, or would it just compute a different value of cls_tokens?
BTW, I see
if config.use_absolute_position_embeddings:
    self.position_embeddings = nn.Parameter(torch.zeros(1, num_patches + 1, config.hidden_size))
else:
    self.position_embeddings = None
So it's possible self.position_embeddings is not None if config.use_absolute_position_embeddings is True?
Yeah, I checked some popular models on the Hub; all have use_absolute_position_embeddings: false (didn't do extensive testing tbh). So it's likely a combination of factors: no one uses use_absolute_position_embeddings=True (it would have to be a new model) together with interpolate_pos_encoding=True.
It's a bug introduced when the interpolate_pos_encoding flag was added: it should be embeddings = embeddings + ..., not cls_tokens. But even in that case we would have double interpolation, once here and once in BeitPatchEmbeddings, so I just cleaned this up.
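To make the described fix concrete, a simplified sketch of the corrected branch (this mirrors the explanation above, not the exact diff in this PR):

# Simplified sketch, not the exact diff: the position embeddings are added
# to the full embeddings tensor rather than to cls_tokens, and interpolation
# happens in exactly one place so nothing is interpolated twice.
if self.position_embeddings is not None:
    if interpolate_pos_encoding:
        position_embeddings = self.interpolate_pos_encoding(embeddings, height, width)
    else:
        position_embeddings = self.position_embeddings
    embeddings = embeddings + position_embeddings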
OK, indeed. Thanks!
def forward(
    self,
    pixel_values: torch.Tensor,
    position_embedding: Optional[torch.Tensor] = None,
) -> torch.Tensor:
For consistency with other models, position_embedding was removed from BeitPatchEmbeddings and is now applied in the BeitEmbeddings module. This is a breaking change, but I suppose the BeitPatchEmbeddings module is only ever used as part of BeitEmbeddings.
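Schematically, the refactor looks like this (a simplified sketch of the described structure, with __init__ details omitted; not the exact PR diff):

import torch
from torch import nn

class BeitPatchEmbeddings(nn.Module):
    # position_embedding argument removed: this module now only patchifies.
    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        return self.projection(pixel_values).flatten(2).transpose(1, 2)

class BeitEmbeddings(nn.Module):
    # Position embeddings are owned and applied here instead.
    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        embeddings = self.patch_embeddings(pixel_values)
        cls_tokens = self.cls_token.expand(embeddings.shape[0], -1, -1)
        embeddings = torch.cat((cls_tokens, embeddings), dim=1)
        if self.position_embeddings is not None:
            embeddings = embeddings + self.position_embeddings
        return self.dropout(embeddings)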
ydshieh
left a comment
Before looking at the tests, I have some questions about the modeling code changes 🙏
ydshieh
left a comment
LGTM, thanks a lot.
A nit question regarding logger.warning_once() vs. warnings.warn.
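For reference, the two options being weighed (a minimal illustration of each API, not code from this PR):

import warnings
from transformers.utils import logging

logger = logging.get_logger(__name__)

# transformers' logger: emitted at most once per process, the usual
# choice for repeated warnings inside modeling code.
logger.warning_once("interpolate_pos_encoding is always applied for this model.")

# Stdlib alternative: respects Python warning filters and reports the call site.
warnings.warn("interpolate_pos_encoding is always applied for this model.")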
run-slow: beit, data2vec, dpt, zoedepth, detr, dinov2, vit, textnet

This comment contains run-slow, running the specified jobs: ['models/beit', 'models/data2vec', 'models/detr', 'models/dinov2', 'models/dpt', 'models/textnet', 'models/vit', 'models/zoedepth'] ...
cc @ArthurZucker for review
ArthurZucker
left a comment
🤗 thanks for taking care of our CI's health!
* Trigger tests
* [run-slow] beit, detr, dinov2, vit, textnet
* Fix BEiT interpolate_pos_encoding
* Fix DETR test
* Update DINOv2 test
* Fix textnet
* Fix vit
* Fix DPT
* fix data2vec test
* Fix textnet test
* Update interpolation check
* Fix ZoeDepth tests
* Update interpolate embeddings for BEiT
* Apply suggestions from code review
What does this PR do?
Fixing tests for vision models
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.