Conversation

@willyfh
Contributor

@willyfh willyfh commented Jan 13, 2024

📝 Description

Add the implementation of the MVTec LOCO AD dataset [Paper]

The dataset structure can contain multiple ground-truth masks for a single anomalous image. The dataset's _setup method merges these masks and stores the merged ground-truth masks in a separate ground-truth directory via the _merge_gt_mask function.
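For illustration, the merging step could be sketched like this (a minimal sketch only; the actual _merge_gt_mask implementation in this PR may differ in details):

```python
import numpy as np

def merge_gt_masks(masks):
    """Merge the per-defect ground-truth masks of one image into a
    single mask by taking the element-wise maximum."""
    merged = np.zeros_like(masks[0])
    for mask in masks:
        merged = np.maximum(merged, mask)
    return merged
```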

Note: I used the downloadable dataset link found from this closed PR: #553

This PR does not include the implementation of the sPRO metric, which seems to be designed specifically for evaluating the MVTec LOCO dataset. I plan to implement that metric in a separate PR.
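For context, the core idea of sPRO from the paper can be sketched roughly as follows (a simplified illustration under my reading of the paper, not the implementation planned for the follow-up PR): each ground-truth region's score is its overlap with the binary prediction, capped at and normalized by the region's saturation threshold, averaged over regions.

```python
import numpy as np

def spro_score(prediction, regions, saturation_thresholds):
    """Saturated per-region overlap: overlap area with the binary
    prediction, capped at the region's saturation threshold and
    normalized by it, averaged over all ground-truth regions."""
    scores = []
    for region, threshold in zip(regions, saturation_thresholds):
        overlap = float(np.logical_and(prediction, region).sum())
        scores.append(min(overlap, threshold) / threshold)
    return float(np.mean(scores))
```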

🛠️ Fixes #574 #471 #1341

✨ Changes

Select what type of change your PR is:

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔨 Refactor (non-breaking change which refactors the code base)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • 📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

@willyfh willyfh changed the title [WIP] Add support for MVTec LOCO dataset v1 - [Feature] Add support for MVTec LOCO dataset Jan 13, 2024
merged_mask = np.maximum(merged_mask, mask)

# Binarize masks
merged_mask = np.minimum(merged_mask, 255)
Contributor Author

@willyfh willyfh Jan 13, 2024

Not sure if I need to binarize the masks here or not. The ground-truth mask values in this dataset can vary, e.g., 241, 253, etc., depending on the sub-categories of the defects. I tried evaluating the dataset using the existing metrics, and it doesn't seem to affect the evaluation results (with or without this binarization). Please correct me if I'm wrong.

Contributor

Interesting observation. I think it is a good idea to only have 255 values in any case, since the masks are divided by 255 here:
https://github.com/openvinotoolkit/anomalib/blob/69184e378ea025d1f23ad86ebac1ada53debbe25/src/anomalib/data/base/dataset.py#L130

But I don't think this line does exactly that. Isn't this element-wise minimum going to leave a value such as 245 at 245, rather than set it to 255?
I think this should probably be merged_mask = np.where(merged_mask > 0, 255, 0)
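A quick numpy check of the difference (sample pixel values taken from the discussion above):

```python
import numpy as np

merged_mask = np.array([[0, 245], [253, 0]], dtype=np.int64)

# np.minimum only clips values above 255; a 245 pixel stays 245:
clipped = np.minimum(merged_mask, 255)

# np.where maps every non-zero pixel to 255:
binarized = np.where(merged_mask > 0, 255, 0)
```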

Contributor Author

Thanks for your comment! Ah yes, I think it should be merged_mask = np.where(merged_mask > 0, 255, 0) as you said. I have made the changes.

Contributor

Great.

@willyfh willyfh marked this pull request as ready for review January 13, 2024 13:06
@samet-akcay samet-akcay changed the title v1 - [Feature] Add support for MVTec LOCO dataset 🚀 v1 - Add support for MVTec LOCO dataset Jan 13, 2024
@github-actions github-actions bot removed the Data label Jan 14, 2024
@blaz-r
Contributor

blaz-r commented Jan 14, 2024

Nice work 😃, thanks for this addition. I didn't thoroughly check all the code, but I did comment on the question you had.

@fengchuibeixiang

Hi, good work! Are you using the "V1" branch of this project? I cannot find the path "src/anomalib/data/image" in the main branch. Should I switch to the "V1" branch to use your changes? Looking forward to your reply. Thank you!

@samet-akcay
Contributor

Hi, good work! Are you using the "V1" branch of this project? I cannot find the path "src/anomalib/data/image" in the main branch. Should I switch to the "V1" branch to use your changes? Looking forward to your reply. Thank you!

@fengchuibeixiang, v1 will be released on the 31st of January

@phcarval
Contributor

Hi, thanks for the work. If I understand correctly, you merge all existing ground truth maps into one and binarize the result. Will it not make it difficult to work with the sPRO metric, which requires the different ground truth maps and their corresponding threshold?

@willyfh
Contributor Author

willyfh commented Jan 17, 2024

Hi, thanks for the work. If I understand correctly, you merge all existing ground truth maps into one and binarize the result. Will it not make it difficult to work with the sPRO metric, which requires the different ground truth maps and their corresponding threshold?

@phcarval thanks for your comment! Yes, I merged the multiple ground-truth maps into one binarized map and stored it in a separate folder, without removing the original unmerged ground-truth maps. This is necessary for the existing metrics. With and without merged_mask = np.minimum(merged_mask, 255), the existing metrics give different evaluation results.

But as you said, for the sPRO metric, we need to consider a case where the validation/test set can also provide the unmerged masks. I was thinking of adding an additional column, something like multi_mask_path or unmerged_mask_path, to the returned samples, see here: https://github.com/openvinotoolkit/anomalib/pull/1635/files#diff-7e31cb96286793cb547505586ea1e87b6825cf6111db0b686bf276a732769300R138

The evaluation of the sPRO metric can use this additional column to access the unmerged maps. It seems I may also need to modify or override the __getitem__ method. I was planning to add the additional column when I work on the sPRO metric later, but maybe it would be better to address it in this PR.

P.S.: I haven't fully checked how the metrics calculation flows in anomalib, so my idea of using the additional column could be wrong :D
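A hypothetical sketch of the proposed samples layout (the column names and paths here are illustrative only, not the PR's actual schema): one row per image, with the merged mask in mask_path and the original unmerged masks collected in an additional list-valued column.

```python
import pandas as pd

# One row per anomalous image; the extra column keeps the unmerged masks.
samples = pd.DataFrame(
    {
        "image_path": ["test/logical_anomalies/018.png"],
        "mask_path": ["ground_truth_merged/018.png"],
        "unmerged_mask_path": [["018/000.png", "018/001.png"]],
    }
)
```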

@phcarval
Contributor

Hi, thanks for the work. If I understand correctly, you merge all existing ground truth maps into one and binarize the result. Will it not make it difficult to work with the sPRO metric, which requires the different ground truth maps and their corresponding threshold?

@phcarval thanks for your comment! Yes, I merged the multiple ground-truth maps into one binarized map and stored it in a separate folder, without removing the original unmerged ground-truth maps. This is necessary for the existing metrics. With and without merged_mask = np.minimum(merged_mask, 255), the existing metrics give different evaluation results.

But as you said, for the sPRO metric, we need to consider a case where the validation/test set can also provide the unmerged masks. I was thinking of adding an additional column, something like multi_mask_path or unmerged_mask_path, to the returned samples, see here: https://github.com/openvinotoolkit/anomalib/pull/1635/files#diff-7e31cb96286793cb547505586ea1e87b6825cf6111db0b686bf276a732769300R138

I think it's a good idea, and it's pretty close to what @jpcbertoldo did in his PR #553.

The evaluation of the sPRO metric can use this additional column to access the unmerged maps. It seems I may also need to modify or override the __getitem__ method. I was planning to add the additional column when I work on the sPRO metric later, but maybe it would be better to address it in this PR.

P.S.: I haven't fully checked how the metrics calculation flows in anomalib, so my idea of using the additional column could be wrong :D

I have already thought about adding LOCO myself, but I haven't had time to do it so far. I think that to properly implement the dataset, the following points must be addressed:

  • Add a column with the "true values" of the anomaly maps (perhaps a simple sum?)
  • Add a dict in the MVTecLOCO class that is loaded using the .yaml files of LOCO to get the thresholds corresponding to each defect category (i.e. the pixel values in the anomaly maps)
  • Add an sPRO metric which will load the "true values" with their corresponding thresholds.

I think metrics are loaded somewhere in metrics_configuration.py or metrics/__init__.py. Right now I don't know exactly how a metric can retrieve the data that is loaded in MVTecLOCO though :/

Thanks anyway for the PR!

@blaz-r
Contributor

blaz-r commented Jan 17, 2024

Yeah, as @phcarval said, there are quite a few things to consider. I think we'd need to think through how to best support logical anomaly datasets, as well as supervised datasets, in the long run.

@willyfh
Contributor Author

willyfh commented Jan 17, 2024

@blaz-r @phcarval Thanks for your comments! I will try to address those asap :D

@samet-akcay
Contributor

@jpcbertoldo, any comments from pimo perspective?

@jpcbertoldo
Contributor

Disclaimer: I read the paper in depth long ago and just re-skimmed it, so I might miss some details.

If I understood correctly, the GT merging is done for the sake of simplifying the use of other metrics. However, (pixel-level) AUROC doesn't make sense for logical anomalies.

An example, to make sure we're on the same page: in the image below, each square is supposed to have one pushpin, but they all have 2 pushpins.

pushpins/test/logical_anomalies/018.png

[image]

masks 001.png and 003.png

[image]

So this image has 15 sPRO curves. Some of them (e.g., like the one above) have disconnected regions, but they are essentially considered a single region by the metric (not in the morphological sense), because choosing one pushpin or the other is equivalent ("Additional objects" in section "3.4 Selection of Saturation Thresholds" of the paper discusses this case).

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

@jpcbertoldo, any comments from pimo perspective?

As for PIMO, good news : )

Since the annotations are already split across different files, an analogous sPIMO curve could be defined as (for instance) the average (on the y-axis) of the sPRO curves coming from the same image.

@phcarval
Contributor

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

The difficulty comes from the fact that you can only associate one mask to one image. Here we have 15 masks, so in order for it to work properly, we're forced to merge them together. But I thought that's what you also did on your original PR, since I see that you use both "mask" and "masks" key for the dataset. I assume you merged the maps for one and collected them in a list for the other.

@jpcbertoldo
Contributor

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

The difficulty comes from the fact that you can only associate one mask to one image. Here we have 15 masks, so in order for it to work properly, we're forced to merge them together. But I thought that's what you also did on your original PR, since I see that you use both "mask" and "masks" key for the dataset. I assume you merged the maps for one and collected them in a list for the other.

I vaguely recall doing that indeed, but looking back it doesn't feel like the best idea. Again, I may be missing another detail that made me do it that way at the time.

Maybe it could be managed the other way around? Instead of forcing the masks to become a single one, make the single source image be repeated.

In that internal dataframe you could make each row point to a specific mask (as it is in the file structure), and you would see several rows with the same input image. Kind of suboptimal computationally, but I think there are not so many cases with multiple masks.

@phcarval
Contributor

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

The difficulty comes from the fact that you can only associate one mask to one image. Here we have 15 masks, so in order for it to work properly, we're forced to merge them together. But I thought that's what you also did on your original PR, since I see that you use both "mask" and "masks" key for the dataset. I assume you merged the maps for one and collected them in a list for the other.

I vaguely recall doing that indeed, but looking back it doesn't feel like the best idea. Again, I may be missing another detail that made me do it that way at the time.

Maybe it could be managed the other way around? Instead of forcing the masks to become a single one, make the single source image be repeated.

In that internal dataframe you could make each row point to a specific mask (as it is in the file structure), and you would see several rows with the same input image. Kind of suboptimal computationally, but I think there are not so many cases with multiple masks.

I think duplicating the source images means that you risk skewing the image-level metrics though.

@willyfh
Contributor Author

willyfh commented Jan 18, 2024

@jpcbertoldo Thank you so much for your comments! I've just started working on anomaly detection and anomalib recently, so please understand if my response or concern below may be wrong or sound too naive 😅 I think some things need to be clarified.

If I understood correctly, the GT merging is done for the sake of simplifying the use of other metrics. However, (pixel-level) AUROC doesn't make sense for logical anomalies.

Initially, I thought merging the masks would be helpful if we wanted to use this dataset with the existing metrics. I believed this approach was simple and wouldn't require invasive changes that might break many parts of the existing code, which expects a single mask path/map (e.g., the __getitem__ method, metric calculation, visualization, etc.).

I wonder, is there any case where an Anomalib user wants to evaluate the MVTec LOCO dataset using existing metric behaviors (including using connected regions morphologically), even if it isn't entirely suitable? Perhaps just for the sake of experimentation, research, or simple curiosity?

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

Yes, I think that, as @phcarval suggested (which seems similar to your PR #553), using the sum operation would help encode the information into a single image, avoiding duplicated rows in the samples. This "merging" may also help "connect" the multiple masks within a single row of samples, which I believe is necessary for batch/dataset operations, as I explain further below.

Maybe it could be managed the other way around? Instead of forcing the masks to become a single one, make the single source image be repeated. That internal dataframe you could make each row point to a specific mask (as it is in the file structure) and you would see several rows with the same input image. Kind of suboptimal computationally but i think there a not sooo many cases with multiple masks.

I think the problem with using repeated rows for the same image is that it will "disconnect" the masks in the batching operation. Since each row is treated as a single instance, the same image with different masks can be split across different batches. However, don't we need all masks of an image together for one step of metric calculation or for visualization of the maps? Perhaps the image-level metrics mentioned by @phcarval are also affected. I could be wrong, since I am not yet fully familiar with the metrics calculation. I think the issue of "disconnected" masks could occur even if the samples are not shuffled at each step.

@jpcbertoldo
Contributor

I think duplicating the source images means that you risk skewing the image-level metrics though.

Hmmm... Yeah, you're right. LOCO is complicated indeed haha.

I wonder, is there any case where an Anomalib user wants to evaluate the MVTec LOCO dataset using existing metric behaviors (including using connected regions morphologically), even if it isn't entirely suitable? Perhaps just for the sake of experimentation, research, or simple curiosity?

Maybe for debug. I don't see it otherwise because LOCO's annotations are quite case-specific.

Yes, I think, as @phcarval suggested, which seems similar to your PR on #553, using the sum operation would help encode the information into a single image, avoiding duplicating rows in the samples.

I don't recall exactly what I did, but if you do the sum, then I believe you would need to encode the different sub-masks (?) with different integers.

For instance, the example in #1635 (comment) (say there are only those two masks) would be something like

merged = 1 * mask_001 + 2 * mask_003

(0=black, 1=white, 2=green)
[image]

so that you are able to recover the individual masks from merged. Otherwise you'd get

(0=black, 1=white)
[image]

where you cannot distinguish anymore which regions are from the same mask, so sPRO would be wrong.

Not sure we're on the same page but I think it's important to address this.
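The one-file-one-integer encoding above, in minimal numpy form (a sketch; power-of-two weights are used here so sub-masks stay separable even where regions overlap):

```python
import numpy as np

# Two hypothetical binary sub-masks of the same image.
mask_001 = np.array([[1, 0], [0, 1]], dtype=np.int64)
mask_003 = np.array([[0, 0], [1, 1]], dtype=np.int64)

# Weight each file by a distinct power of two before summing.
merged = 1 * mask_001 + 2 * mask_003

# Each sub-mask can be recovered from the merged array.
recovered_001 = (merged >> 0) & 1
recovered_003 = (merged >> 1) & 1
```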

@phcarval
Contributor

I don't recall exactly what I did, but if you do the sum indeed then I believe you should need to encode different sub-masks (?) with different integers.

That's a very good point I hadn't thought about! Not only do you need to encode for the different defect categories, but you also have to differentiate different masks that have the same value but are loaded from different files. And you somehow need to make it work through Torchmetrics' trainer. LOCO is complicated indeed.

@jpcbertoldo
Contributor

... but you also have to differentiate different masks that have the same value but are loaded from different files.

I'm not sure I understand when that would happen. The one-file-one-integer encoding I proposed was specifically to address this issue.

@phcarval
Contributor

No, I agree with you here. Just making the point that you must differentiate not only the different defect types but also the different mask files.

@willyfh
Contributor Author

willyfh commented Jan 18, 2024

Maybe for debug. I don't see it otherwise because LOCO's annotations are quite case-specific.

I see, I will consider this in the implementation, maybe by emitting a warning when someone tries to evaluate this dataset with the existing metrics implementation (which interprets regions in the morphological sense), or by disallowing it entirely via an assert.

I don't recall exactly what I did, but if you do the sum indeed then I believe you should need to encode different sub-masks (?) with different integers.

Ah, now I understand what you meant. Based on your code, since the masks were binarized after being read, simply performing a sum would result in everything becoming 0 or 1. That's why _sum_masks is necessary to encode the values with different integers. Got it, I will also address this in the implementation.

[image]

... but you also have to differentiate different masks that have the same value but are loaded from different files.

Oh, because after encoding the masks to integers, we lose the information about the original pixel values if there is no mapping back to them (?). If, say, one image has two different masks with pixel values 241 and 253, and another image also has two different masks, 231 and 251, the masks of both images will be encoded as 1 and 2 (?). So connecting them to the defect config later would be difficult. Oh no, maybe it is handled somewhere, but I'm not sure, haha. I will also address it~. A sample of the config:

    {
        "defect_name": "2_additional_pushpins",
        "pixel_value": 254,
        "saturation_threshold": 12600,
        "relative_saturation": false
    },
    {
        "defect_name": "missing_pushpin",
        "pixel_value": 253,
        "saturation_threshold": 6300,
        "relative_saturation": false
    },
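Indexing such a config by pixel value would make the per-defect saturation threshold easy to look up when decoding a merged mask (a sketch using the field names from the sample above; not the PR's code):

```python
import json

def index_saturation_config(config_json):
    """Map each defect's pixel value to its config entry."""
    return {entry["pixel_value"]: entry for entry in json.loads(config_json)}

config = index_saturation_config(
    '[{"defect_name": "missing_pushpin", "pixel_value": 253,'
    ' "saturation_threshold": 6300, "relative_saturation": false}]'
)
```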

@willyfh willyfh requested a review from paularamo as a code owner January 21, 2024 16:03

@willyfh
Contributor Author

willyfh commented Jan 21, 2024

Thank you for your comments! @jpcbertoldo @phcarval ,

Based on your comments, I have made the following changes:

  • Used the sum operation to merge the masks based on the original pixel values.
  • Overridden __getitem__() to return the additional masks and to handle the binarization of the mask.
  • Loaded the saturation config in the datamodule and passed the config to the metric via the setup() of _MetricsCallback.
  • Passed the masks to the metric via pixel_metric.update on the validation batch end by adding an additional (keyword) argument.
  • Added a warning if the dataset is evaluated using the existing pixel-level metric implementation, where the 'region' is interpreted in a morphological sense.
  • Added the sPRO metric implementation.

Remaining things to do:

  • Verify the correctness of the spro_score() function.
  • Add unit tests.
  • Update documentation.
  • Consider adding AU-sPRO (?).

@willyfh willyfh changed the title 🚀 v1 - Add support for MVTec LOCO dataset 🚀 v1 - Add support for MVTec LOCO dataset and sPRO metric Jan 21, 2024
@phcarval
Contributor

Great job, thanks! I haven't tested it all, but AU-sPRO should definitely be added, as the AU- metrics are widely used in the library (AUPRO, AUROC, ...). I don't think it's very difficult once you've added sPRO; I believe there is an AUC class somewhere in torchmetrics that will do the computation by itself.
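Independently of which torchmetrics helper is used, the area computation itself reduces to a trapezoidal integral over the curve up to an FPR limit (a generic sketch, normalized by the limit as is conventional for AUPRO-style metrics; not tied to any specific API):

```python
import numpy as np

def au_spro(fpr, spro, fpr_limit=0.05):
    """Trapezoidal area under the sPRO-vs-FPR curve up to fpr_limit,
    normalized by the limit."""
    fpr = np.asarray(fpr, dtype=float)
    spro = np.asarray(spro, dtype=float)
    keep = fpr <= fpr_limit
    return float(np.trapz(spro[keep], fpr[keep]) / fpr_limit)
```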

@jpcbertoldo
Contributor

Great @willyfh !

  • Loaded the saturation config in the datamodule and passed the config to the metric via the setup() of _MetricsCallback.

I just didn't quite get this one.

The saturation parameter of sPRO is passed to the metric object in setup()? Can you point me to that part in the code, please?

@willyfh
Contributor Author

willyfh commented Jan 23, 2024

I don't think it's very difficult if you've added sPRO, I believe there is an auc class somewhere in torchmetrics that will do the computation by itself.

@phcarval I checked the implementation of aupro.py; it seems quite complicated, and the approach doesn't seem similar to pro.py. But yeah, I need to understand that implementation first.

  • Loaded the saturation config in the datamodule and passed the config to the metric via the setup() of _MetricsCallback.

I just didn't quite get this one.

The saturation parameter of sPRO is passed to the metric object in setup()? Can you point me to that part in the code, please?

@jpcbertoldo The saturation_config that is loaded in the dataset is passed to the metric in this part:
https://github.com/openvinotoolkit/anomalib/blob/e0678c00dd8713c4cdae450696d352475a5d4a52/src/anomalib/callbacks/metrics.py#L93-L95

where finally this method is executed to update the metric that has the saturation_config attribute:

https://github.com/openvinotoolkit/anomalib/blob/e0678c00dd8713c4cdae450696d352475a5d4a52/src/anomalib/metrics/collection.py#L30-L40

Please let me know if you have any concerns.

@samet-akcay samet-akcay deleted the branch open-edge-platform:v1 January 24, 2024 12:19
@willyfh
Contributor Author

willyfh commented Jan 26, 2024

Hi @samet-akcay, should I rebase these changes onto the main branch? Or do I need to wait until the v1.1 branch is created later?

@samet-akcay
Contributor

Hi @willyfh, sorry, merging v1 into main closed all the PRs that were targeting v1 :(

If you create the PR targeting main, we could still merge it, I think.
