🚀 v1 - Add support for MVTec LOCO dataset and sPRO metric #1635
Conversation
```python
merged_mask = np.maximum(merged_mask, mask)

# Binarize masks
merged_mask = np.minimum(merged_mask, 255)
```
Not sure if I need to binarize the masks here or not. The pixel values of the dataset's ground-truth masks can vary, e.g., 241, 253, etc., depending on the sub-categories of the defects. I tried to evaluate the dataset using the existing metrics, and it doesn't seem to affect the evaluation results (with or without this binarization). Please correct me if I'm wrong.
Interesting observation. I think it is a good idea to only have 255 values in any case, since the masks are divided by 255 here:
https://github.com/openvinotoolkit/anomalib/blob/69184e378ea025d1f23ad86ebac1ada53debbe25/src/anomalib/data/base/dataset.py#L130
But I don't think this line does exactly that: it's an element-wise minimum, so in the case of 245, the output will be 245, not 255.
I think this should probably be `merged_mask = np.where(merged_mask > 0, 255, 0)`.
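To illustrate the difference between the two calls, a quick standalone check (a toy example, not anomalib code):

```python
import numpy as np

# Toy merged mask with LOCO-style sub-category pixel values (241, 253)
merged_mask = np.array([[0, 241], [253, 255]], dtype=np.uint8)

# Element-wise minimum only caps values at 255, so 241 and 253 survive
clipped = np.minimum(merged_mask, 255)
assert np.array_equal(clipped, merged_mask)  # nothing changed

# np.where truly binarizes: every non-zero pixel becomes 255
binarized = np.where(merged_mask > 0, 255, 0)
assert np.array_equal(binarized, np.array([[0, 255], [255, 255]]))
```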
Thanks for your comment! Ah yes, I think it should be `merged_mask = np.where(merged_mask > 0, 255, 0)` as you said. I have made the changes.
Great.
Nice work 😃, thanks for this addition. I didn't thoroughly check all the code, but I did comment on the question you had.
Hi, good work! Are you using the "v1" branch of this project? I cannot find the path src/anomalib/data/image in the main branch. Should I switch to the "v1" branch to use your change? Looking forward to your reply. Thank you!
@fengchuibeixiang, v1 will be released on the 31st of January.
Hi, thanks for the work. If I understand correctly, you merge all existing ground-truth maps into one and binarize the result. Won't that make it difficult to work with the sPRO metric, which requires the different ground-truth maps and their corresponding thresholds?
@phcarval thanks for your comment! Yes, I binarized the multiple ground-truth maps into one and stored it in a separate folder, without removing the original unmerged ground-truth maps. This is necessary for the existing metrics, which give the same results with and without the binarization.

But as you said, for the sPRO metric, we need to consider having a case where the validation/test set can also expose the unmerged maps. I was thinking of adding an additional column to the samples, so that the evaluation of the sPRO metric can utilize that column to access the unmerged maps. It seems I may also need to modify other parts of the pipeline for this.

P.S.: I haven't fully checked how the metric calculation flows in anomalib, so my idea of using the additional column could be wrong :D
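A minimal sketch of the additional-column idea, assuming hypothetical column names and paths (not anomalib's actual schema):

```python
import pandas as pd

# One row per image: "mask_path" points to the merged mask consumed by the
# existing metrics, while a hypothetical extra column keeps the list of
# unmerged per-annotation masks for sPRO-style evaluation.
samples = pd.DataFrame(
    {
        "image_path": ["test/logical_anomalies/000.png"],
        "mask_path": ["ground_truth_merged/logical_anomalies/000.png"],
        "unmerged_masks_paths": [
            [
                "ground_truth/logical_anomalies/000/000.png",
                "ground_truth/logical_anomalies/000/001.png",
            ]
        ],
    }
)

# An sPRO evaluation step would read the unmerged maps from the extra column
print(samples.loc[0, "unmerged_masks_paths"])
```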
I think it's a good idea, and pretty close to what @jpcbertoldo did in his PR #553.
I have already thought about adding LOCO myself, but I haven't had time to do it yet. I think that to properly implement the dataset, the following points must be addressed:
I think metrics are loaded somewhere in the code. Thanks anyway for the PR!
Yeah, as @phcarval said, there are quite a few things to consider. I think we'd need to think through how to best support logical anomaly datasets as well as supervised datasets in the long run.
@jpcbertoldo, any comments from the PIMO perspective?
Disclaimer: I read the paper in depth long ago and just re-skimmed it, so I might miss some details. If I understood correctly, the GT merging is done for the sake of simplifying the use of other metrics. However, (pixel-level) AUROC doesn't make sense for logical anomalies. An example, to make sure we're on the same page: in the image below, each square is supposed to have one pushpin, but they all have 2 pushpins.
So this image has 15 sPRO curves. Some of them (e.g. like the one above) have disconnected regions, but they are kind of considered as a single region for the metric (not in the morphological sense), because choosing one pushpin or another is the same ("Additional objects" in section "3.4 Selection of Saturation Thresholds" in the paper talks about this case). So if you merge all 15 annotations, you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?
As for PIMO, good news :) Since the annotations are already split into different files, an analogous sPIMO curve could be defined as (for instance) the average (on the y-axis) of the sPRO curves coming from the same image.
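For context, a rough sketch of how one sPRO value could be computed at a fixed threshold (my own simplification of the paper's definition; function and parameter names are illustrative):

```python
import numpy as np

def spro_at_threshold(anomaly_map, gt_masks, saturation_areas, threshold):
    """Saturated PRO at one threshold: each annotation's overlap with the
    thresholded prediction is credited fully once it reaches the
    annotation's saturation area, then the per-annotation scores are
    averaged."""
    pred = anomaly_map >= threshold
    scores = [
        min(np.logical_and(pred, mask).sum() / sat_area, 1.0)
        for mask, sat_area in zip(gt_masks, saturation_areas)
    ]
    return float(np.mean(scores))

# Tiny example: one annotation fully saturated, one half-saturated
amap = np.array([[0.9, 0.1], [0.8, 0.2]])
masks = [np.array([[1, 0], [0, 0]], bool), np.array([[0, 0], [1, 1]], bool)]
print(spro_at_threshold(amap, masks, saturation_areas=[1, 2], threshold=0.5))  # → 0.75
```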
The difficulty comes from the fact that you can only associate one mask to one image. Here we have 15 masks, so in order for it to work properly, we're forced to merge them together. But I thought that's what you also did in your original PR, since I see that you use both a "mask" and a "masks" key for the dataset. I assume you merged the maps for one and collected them in a list for the other.
I vaguely recall doing that, indeed, but looking back it doesn't feel like the best idea. Again, I may be missing another detail that made me do that at the time. Maybe it could be managed the other way around? Instead of forcing the masks to become a single one, make the single source image be repeated. In that internal dataframe, you could make each row point to a specific mask (as it is in the file structure), and you would see several rows with the same input image. Kind of suboptimal computationally, but I think there are not sooo many cases with multiple masks.
I think duplicating the source images means that you risk skewing the image-level metrics, though.
@jpcbertoldo Thank you so much for your comments! I've just started working on anomaly detection and anomalib recently, so please understand if my response or concern below may be wrong or sound too naive 😅 I think some things need to be clarified.
Initially, I thought merging the masks would be helpful if we wanted to use this dataset with the existing metrics. I believed this approach was simple and wouldn't require invasive changes that might break many parts of the existing code, which expect a single mask path/map (e.g., the get-item method, metric calculation, visualization, etc.). I wonder, is there any case where an anomalib user would want to evaluate the MVTec LOCO dataset using the existing metric behaviors (including treating connected regions morphologically), even if it isn't entirely suitable? Perhaps just for the sake of experimentation, research, or simple curiosity?
Yes, I think, as @phcarval suggested (which seems similar to your PR #553), using the sum operation would help encode the information into a single image, avoiding duplicated rows in the samples. This "merging" may also help "connect" the multiple masks within a single row of the samples, which I believe is necessary for batch/dataset operations, as I will explain further below.
I think the problem with using a repeated row for the same image is that it will "disconnect" the masks in the batching operation. Since each row is treated as a single instance, the same image with different masks can be split across different batches. However, don't we need all masks corresponding to an image within one step of the metric calculation or the visualization of maps? Perhaps the image-level metrics, as mentioned by @phcarval, could also be affected. I could be wrong, since I am not fully familiar with the metric calculation yet. I think the issue of "disconnected" masks could occur even if the samples are not shuffled.
Hmmm... Yeah, you're right. LOCO is complicated indeed haha.
Maybe for debug. I don't see it otherwise because LOCO's annotations are quite case-specific.
I don't recall exactly what I did, but if you do the sum, then I believe you need to encode the different sub-masks (?) with different integers. For instance, the example in #1635 (comment) (say there are only those two masks) would be something like `merged = 1 * mask_001 + 2 * mask_003`, so that you are able to recover the individual masks from the merged one. Otherwise you cannot distinguish anymore which regions come from the same mask, so sPRO would be wrong. Not sure we're on the same page, but I think it's important to address this.
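A small sketch of this one-file-one-integer encoding (hypothetical mask names; note that if annotation files could overlap, the sums would collide, and bit flags such as 1, 2, 4, ... would be safer):

```python
import numpy as np

mask_001 = np.array([[1, 0], [0, 0]], dtype=np.uint8)  # already binarized to 0/1
mask_003 = np.array([[0, 1], [0, 1]], dtype=np.uint8)

# Weight each annotation file by its own integer so the merge stays reversible
merged = 1 * mask_001 + 2 * mask_003

# Recover the individual annotations from the merged map
recovered_001 = (merged == 1).astype(np.uint8)
recovered_003 = (merged == 2).astype(np.uint8)
assert np.array_equal(recovered_001, mask_001)
assert np.array_equal(recovered_003, mask_003)
```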
That's a very good point I hadn't thought about! Not only do you need to encode the different defect categories, but you also have to differentiate masks that have the same value but are loaded from different files. And you somehow need to make it work through the Torchmetrics trainer. LOCO is complicated indeed.
I'm not sure I understand when that would happen. The one-file-one-integer encoding I proposed was specifically to address this issue.
No, I agree with you here. I'm just making the point that you must differentiate not only the different defect types but also the different map files.
I see, I will consider this in the implementation, maybe by giving a warning when someone tries to use the dataset with the existing metric implementations (in the morphological way), or just forbidding it via an assert and not allowing it at all.
Ah, now I understand what you meant here. Based on your code, since the masks were binarized after being read, simply performing a sum would result in everything becoming 0 or 1. That's why the integer encoding is needed.
Oh, because after encoding the masks to integers, we lose the information of the original pixel values if there is no mapping back to them (?). Say one image has two different masks with pixel values of 241 and 253, and another image also has two different masks, 231 and 251; the masks of both images would be encoded into 1 and 2 (?). So connecting them to the defect config later would be difficult. Oh no, maybe it is handled somewhere, but I'm not sure, haha. I will also address it~ A sample of the config:
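For reference, MVTec LOCO ships a defects_config.json per category; a hedged sketch of what such saturation config entries might look like (the field names and values here are my assumption; verify against the dataset's actual file):

```python
# Hedged sketch of a per-category saturation config (values illustrative)
saturation_config = [
    {
        "defect_name": "additional_pushpin",  # hypothetical defect name
        "pixel_value": 241,               # gray value used in the GT masks
        "saturation_threshold": 6300,     # area in pixels
        "relative_saturation": False,     # absolute area, not a fraction
    },
    {
        "defect_name": "missing_pushpin",     # hypothetical defect name
        "pixel_value": 253,
        "saturation_threshold": 1.0,
        "relative_saturation": True,      # fraction of the region's area
    },
]
```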
Thank you for your comments, @jpcbertoldo @phcarval! Based on them, I have made the following changes:
Remaining things to do:
Great job, thanks! I haven't tested it all, but AUsPRO should definitely be added (the AU- metrics are widely used in the library: AUPRO, AUROC...). I don't think it's very difficult if you've already added sPRO; I believe there is an AUC class somewhere in torchmetrics that will do the computation by itself.
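A possible sketch of the AU-sPRO computation, assuming precomputed (FPR, sPRO) curve points and the paper's convention of normalizing by an FPR limit (function and parameter names are mine, not anomalib's or torchmetrics' API):

```python
import numpy as np

def au_spro(fprs, spros, fpr_limit=0.05):
    """Area under the sPRO curve up to fpr_limit, normalized so that a
    perfect detector (sPRO == 1 everywhere) scores 1.0."""
    fprs, spros = np.asarray(fprs, float), np.asarray(spros, float)
    keep = fprs <= fpr_limit
    order = np.argsort(fprs[keep])
    x, y = fprs[keep][order], spros[keep][order]
    # Trapezoidal rule, then normalize by the integration range
    area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))
    return float(area / fpr_limit)

print(au_spro([0.0, 0.05, 0.3], [1.0, 1.0, 1.0]))  # → 1.0
```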
Great @willyfh !
I just didn't quite get this one. The saturation parameter of sPRO is passed to the metric object on setup? Can you point to that part in the code, please?
@phcarval I checked the implementation of aupro.py; it seems quite complicated, and the approach doesn't look similar to pro.py. But yeah, I need to understand the implementation first.
@jpcbertoldo The saturation_config loaded in the dataset is passed to the metric during setup; finally, the metric's update method is executed with the saturation config. Please let me know if you have any concerns.
Hi @samet-akcay, should I rebase these changes onto the main branch? Or do I need to wait until the v1.1 branch is created later?
Hi @willyfh, sorry, merging v1 into main closed all the PRs that were targeting v1 :( If you could create the PR targeting main, we could still merge it, I think.





📝 Description
Add the implementation of MVTec LOCO AD dataset [Paper]
The dataset structure can contain multiple ground-truth masks for a single anomalous image. The dataset's `_setup` method merges the multiple masks and stores the merged ground-truth masks in a separate ground-truth directory via the `_merge_gt_mask` function.

Note: I used the downloadable dataset link found in this closed PR: #553
This PR does not include the implementation of the sPRO metric, which is designed to additionally evaluate the MVTec LOCO dataset. I plan to implement the metric in a separate PR later.
🛠️ Fixes #574 #471 #1341
✨ Changes
Select what type of change your PR is:
✅ Checklist
Before you submit your pull request, please make sure you have completed the following steps: