Conversation

@willyfh
Contributor

@willyfh willyfh commented Jan 13, 2024

📝 Description

Add the implementation of the MVTec LOCO AD dataset [Paper]

The dataset structure can contain multiple ground-truth masks for a single anomalous image. The dataset's _setup method merges these masks and stores the merged ground-truth masks in a separate ground-truth directory via the _merge_gt_mask function.
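For illustration, the merging step could be sketched like this (a minimal sketch only; the actual _merge_gt_mask implementation in this PR may differ in details):

```python
import numpy as np

def merge_gt_masks(masks):
    """Merge the per-defect ground-truth masks of one image into a
    single mask by taking the element-wise maximum."""
    merged = np.zeros_like(masks[0])
    for mask in masks:
        merged = np.maximum(merged, mask)
    return merged
```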

Note: I used the downloadable dataset link found from this closed PR: #553

This PR does not include the implementation of the sPRO metric, which seems to be designed specifically for evaluating the MVTec LOCO dataset. I plan to implement that metric in a separate PR.
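For context, the core idea of sPRO from the paper can be sketched roughly as follows (a simplified illustration under my reading of the paper, not the implementation planned for the follow-up PR): each ground-truth region's score is its overlap with the binary prediction, capped at and normalized by the region's saturation threshold, averaged over regions.

```python
import numpy as np

def spro_score(prediction, regions, saturation_thresholds):
    """Saturated per-region overlap: overlap area with the binary
    prediction, capped at the region's saturation threshold and
    normalized by it, averaged over all ground-truth regions."""
    scores = []
    for region, threshold in zip(regions, saturation_thresholds):
        overlap = float(np.logical_and(prediction, region).sum())
        scores.append(min(overlap, threshold) / threshold)
    return float(np.mean(scores))
```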

🛠️ Fixes #574 #471 #1341

✨ Changes

Select what type of change your PR is:

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔨 Refactor (non-breaking change which refactors the code base)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • 📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

@willyfh willyfh changed the title [WIP] Add support for MVTec LOCO dataset v1 - [Feature] Add support for MVTec LOCO dataset Jan 13, 2024
merged_mask = np.maximum(merged_mask, mask)

# Binarize masks
merged_mask = np.minimum(merged_mask, 255)
Contributor Author

@willyfh willyfh Jan 13, 2024

Not sure if I need to binarize the masks here or not. The ground-truth mask values in this dataset can vary, e.g., 241, 253, etc., depending on the sub-categories of the defects. I tried evaluating the dataset using the existing metrics, and it doesn't seem to affect the evaluation results (with or without this binarization). Please correct me if I'm wrong.

Contributor

Interesting observation. I think it is a good idea to only have 255 values in any case, since the masks are divided by 255 here:
https://github.com/openvinotoolkit/anomalib/blob/69184e378ea025d1f23ad86ebac1ada53debbe25/src/anomalib/data/base/dataset.py#L130

But I don't think this line does exactly that. Isn't this element-wise minimum going to leave a value such as 245 at 245, rather than set it to 255?
I think this should probably be merged_mask = np.where(merged_mask > 0, 255, 0)
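A quick numpy check of the difference (sample pixel values taken from the discussion above):

```python
import numpy as np

merged_mask = np.array([[0, 245], [253, 0]], dtype=np.int64)

# np.minimum only clips values above 255; a 245 pixel stays 245:
clipped = np.minimum(merged_mask, 255)

# np.where maps every non-zero pixel to 255:
binarized = np.where(merged_mask > 0, 255, 0)
```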

Contributor Author

Thanks for your comment! Ah yes, I think it should be merged_mask = np.where(merged_mask > 0, 255, 0) as you said. I have made the changes.

Contributor

Great.

@willyfh willyfh marked this pull request as ready for review January 13, 2024 13:06
@samet-akcay samet-akcay changed the title v1 - [Feature] Add support for MVTec LOCO dataset 🚀 v1 - Add support for MVTec LOCO dataset Jan 13, 2024
@github-actions github-actions bot removed the Data label Jan 14, 2024
@blaz-r
Contributor

blaz-r commented Jan 14, 2024

Nice work 😃, thanks for this addition. I didn't thoroughly check all the code, but I did comment on the question you had.

@fengchuibeixiang

Hi, good work! Are you using the "V1" branch of this project? I cannot find the path "src/anomalib/data/image" in the main branch. Should I switch to the "V1" branch to use your changes? Looking forward to your reply. Thank you!

@samet-akcay
Contributor

Hi, good work! Are you using the "V1" branch of this project? I cannot find the path "src/anomalib/data/image" in the main branch. Should I switch to the "V1" branch to use your changes? Looking forward to your reply. Thank you!

@fengchuibeixiang, v1 will be released on the 31st of January

@phcarval
Contributor

Hi, thanks for the work. If I understand correctly, you merge all existing ground truth maps into one and binarize the result. Will it not make it difficult to work with the sPRO metric, which requires the different ground truth maps and their corresponding threshold?

@willyfh
Contributor Author

willyfh commented Jan 17, 2024

Hi, thanks for the work. If I understand correctly, you merge all existing ground truth maps into one and binarize the result. Will it not make it difficult to work with the sPRO metric, which requires the different ground truth maps and their corresponding threshold?

@phcarval thanks for your comment! Yes, I merged the multiple ground-truth maps into one binarized map and stored it in a separate folder, without removing the original unmerged ground-truth maps. This is necessary for the existing metrics. With and without merged_mask = np.minimum(merged_mask, 255), the existing metrics give different evaluation results.

But as you said, for the sPRO metric, we need to consider a case where the validation/test set can also provide the unmerged masks. I was thinking of adding an additional column, something like multi_mask_path or unmerged_mask_path, to the returned samples, see here: https://github.com/openvinotoolkit/anomalib/pull/1635/files#diff-7e31cb96286793cb547505586ea1e87b6825cf6111db0b686bf276a732769300R138

The evaluation of the sPRO metric can use this additional column to access the unmerged maps. It seems I may also need to modify or override the __getitem__ method. I was planning to add the additional column when I work on the sPRO metric later, but maybe it would be better to address it in this PR.

P.S.: I haven't fully checked how the metrics calculation flows in anomalib, so my idea of using the additional column could be wrong :D
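A hypothetical sketch of the proposed samples layout (the column names and paths here are illustrative only, not the PR's actual schema): one row per image, with the merged mask in mask_path and the original unmerged masks collected in an additional list-valued column.

```python
import pandas as pd

# One row per anomalous image; the extra column keeps the unmerged masks.
samples = pd.DataFrame(
    {
        "image_path": ["test/logical_anomalies/018.png"],
        "mask_path": ["ground_truth_merged/018.png"],
        "unmerged_mask_path": [["018/000.png", "018/001.png"]],
    }
)
```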

@phcarval
Contributor

Hi, thanks for the work. If I understand correctly, you merge all existing ground truth maps into one and binarize the result. Will it not make it difficult to work with the sPRO metric, which requires the different ground truth maps and their corresponding threshold?

@phcarval thanks for your comment! Yes, I merged the multiple ground-truth maps into one binarized map and stored it in a separate folder, without removing the original unmerged ground-truth maps. This is necessary for the existing metrics. With and without merged_mask = np.minimum(merged_mask, 255), the existing metrics give different evaluation results.

But as you said, for the sPRO metric, we need to consider a case where the validation/test set can also provide the unmerged masks. I was thinking of adding an additional column, something like multi_mask_path or unmerged_mask_path, to the returned samples, see here: https://github.com/openvinotoolkit/anomalib/pull/1635/files#diff-7e31cb96286793cb547505586ea1e87b6825cf6111db0b686bf276a732769300R138

I think it's a good idea, and it's pretty close to what @jpcbertoldo did in his PR #553.

The evaluation of the sPRO metric can use this additional column to access the unmerged maps. It seems I may also need to modify or override the __getitem__ method. I was planning to add the additional column when I work on the sPRO metric later, but maybe it would be better to address it in this PR.

P.S.: I haven't fully checked how the metrics calculation flows in anomalib, so my idea of using the additional column could be wrong :D

I have already thought about adding LOCO myself, but I haven't had time to do it so far. I think that to properly implement the dataset, the following points must be addressed:

  • Add a column with the "true values" of the anomaly maps (perhaps a simple sum?)
  • Add a dict in the MVTecLOCO class that is loaded using the .yaml files of LOCO to get the thresholds corresponding to each defect category (i.e. the pixel values in the anomaly maps)
  • Add an sPRO metric which will load the "true values" with their corresponding thresholds.

I think metrics are loaded somewhere in metrics_configuration.py or metrics/__init__.py. Right now I don't know exactly how a metric can retrieve the data that is loaded in MVTecLOCO though :/

Thanks anyway for the PR!

@blaz-r
Contributor

blaz-r commented Jan 17, 2024

Yeah, as @phcarval said, there are quite a few things to consider. I think we'd need to think through how to best support logical anomaly datasets, as well as supervised datasets, in the long run.

@willyfh
Contributor Author

willyfh commented Jan 17, 2024

@blaz-r @phcarval Thanks for your comments! I will try to address those asap :D

@samet-akcay
Contributor

@jpcbertoldo, any comments from pimo perspective?

@jpcbertoldo
Contributor

Disclaimer: I read the paper in depth long ago and just re-skimmed it, so I might miss some details.

If I understood correctly, the GT merging is done for the sake of simplifying the use of other metrics. However, (pixel-level) AUROC doesn't make sense for logical anomalies.

An example, to make sure we're on the same page: in the image below, each square is supposed to have one pushpin, but they all have 2 pushpins.

pushpins/test/logical_anomalies/018.png

[image]

masks 001.png and 003.png

[image]

So this image has 15 sPRO curves. Some of them (e.g., like the one above) have disconnected regions, but they are essentially considered a single region by the metric (not in the morphological sense), because choosing one pushpin or the other is equivalent ("Additional objects" in section "3.4 Selection of Saturation Thresholds" of the paper discusses this case).

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

@jpcbertoldo, any comments from pimo perspective?

As for PIMO, good news : )

Since the annotations are already split across different files, an analogous sPIMO curve could be defined as (for instance) the average (on the y-axis) of the sPRO curves coming from the same image.

@phcarval
Contributor

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

The difficulty comes from the fact that you can only associate one mask to one image. Here we have 15 masks, so in order for it to work properly, we're forced to merge them together. But I thought that's what you also did on your original PR, since I see that you use both "mask" and "masks" key for the dataset. I assume you merged the maps for one and collected them in a list for the other.

@jpcbertoldo
Contributor

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

The difficulty comes from the fact that you can only associate one mask to one image. Here we have 15 masks, so in order for it to work properly, we're forced to merge them together. But I thought that's what you also did on your original PR, since I see that you use both "mask" and "masks" key for the dataset. I assume you merged the maps for one and collected them in a list for the other.

I vaguely recall doing that indeed, but looking back it doesn't feel like the best idea. Again, I may be missing another detail that made me do it that way at the time.

Maybe it could be managed the other way around? Instead of forcing the masks to become a single one, make the single source image be repeated.

In that internal dataframe you could make each row point to a specific mask (as it is in the file structure), and you would see several rows with the same input image. Kind of suboptimal computationally, but I think there are not so many cases with multiple masks.

@phcarval
Contributor

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

The difficulty comes from the fact that you can only associate one mask to one image. Here we have 15 masks, so in order for it to work properly, we're forced to merge them together. But I thought that's what you also did on your original PR, since I see that you use both "mask" and "masks" key for the dataset. I assume you merged the maps for one and collected them in a list for the other.

I vaguely recall doing that indeed, but looking back it doesn't feel like the best idea. Again, I may be missing another detail that made me do it that way at the time.

Maybe it could be managed the other way around? Instead of forcing the masks to become a single one, make the single source image be repeated.

In that internal dataframe you could make each row point to a specific mask (as it is in the file structure), and you would see several rows with the same input image. Kind of suboptimal computationally, but I think there are not so many cases with multiple masks.

I think duplicating the source images means that you risk skewing the image-level metrics though.

@willyfh
Contributor Author

willyfh commented Jan 18, 2024

@jpcbertoldo Thank you so much for your comments! I've just started working on anomaly detection and anomalib recently, so please understand if my response or concern below may be wrong or sound too naive 😅 I think some things need to be clarified.

If I understood correctly, the GT merging is done for the sake of simplifying the use of other metrics. However, (pixel-level) AUROC doesn't make sense for logical anomalies.

Initially, I thought merging the masks would be helpful if we wanted to use this dataset with the existing metrics. I believed this approach was simple and wouldn't require invasive changes that might break many parts of the existing code, which expects a single mask path/map (e.g., the __getitem__ method, metric calculation, visualization, etc.).

I wonder, is there any case where an Anomalib user wants to evaluate the MVTec LOCO dataset using existing metric behaviors (including using connected regions morphologically), even if it isn't entirely suitable? Perhaps just for the sake of experimentation, research, or simple curiosity?

So if you merge all 15 annotations you still need to encode the information of where each connected region comes from. Right? If that is correct, is the merging actually useful?

Yes, I think that, as @phcarval suggested (which seems similar to your PR #553), using the sum operation would help encode the information into a single image, avoiding duplicated rows in the samples. This "merging" may also help "connect" the multiple masks within a single row of samples, which I believe is necessary for batch/dataset operations, as I explain further below.

Maybe it could be managed the other way around? Instead of forcing the masks to become a single one, make the single source image be repeated. That internal dataframe you could make each row point to a specific mask (as it is in the file structure) and you would see several rows with the same input image. Kind of suboptimal computationally but i think there a not sooo many cases with multiple masks.

I think the problem with using repeated rows for the same image is that it will "disconnect" the masks in the batching operation. Since each row is treated as a single instance, the same image with different masks can be split across different batches. However, don't we need all masks of an image together for one step of metric calculation or for visualization of the maps? Perhaps the image-level metrics mentioned by @phcarval are also affected. I could be wrong, since I am not yet fully familiar with the metrics calculation. I think the issue of "disconnected" masks could occur even if the samples are not shuffled at each step.

@jpcbertoldo
Contributor

I think duplicating the source images means that you risk skewing the image-level metrics though.

Hmmm... Yeah, you're right. LOCO is complicated indeed haha.

I wonder, is there any case where an Anomalib user wants to evaluate the MVTec LOCO dataset using existing metric behaviors (including using connected regions morphologically), even if it isn't entirely suitable? Perhaps just for the sake of experimentation, research, or simple curiosity?

Maybe for debug. I don't see it otherwise because LOCO's annotations are quite case-specific.

Yes, I think, as @phcarval suggested, which seems similar to your PR on #553, using the sum operation would help encode the information into a single image, avoiding duplicating rows in the samples.

I don't recall exactly what I did, but if you do the sum, then I believe you would need to encode the different sub-masks (?) with different integers.

For instance, the example in #1635 (comment) (say there are only those two masks) would be something like

merged = 1 * mask_001 + 2 * mask_003

(0=black, 1=white, 2=green)
[image]

so that you are able to recover the individual masks from merged. Otherwise you'd get

(0=black, 1=white)
[image]

where you cannot distinguish anymore which regions are from the same mask, so sPRO would be wrong.

Not sure we're on the same page but I think it's important to address this.
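The one-file-one-integer encoding above, in minimal numpy form (a sketch; power-of-two weights are used here so sub-masks stay separable even where regions overlap):

```python
import numpy as np

# Two hypothetical binary sub-masks of the same image.
mask_001 = np.array([[1, 0], [0, 1]], dtype=np.int64)
mask_003 = np.array([[0, 0], [1, 1]], dtype=np.int64)

# Weight each file by a distinct power of two before summing.
merged = 1 * mask_001 + 2 * mask_003

# Each sub-mask can be recovered from the merged array.
recovered_001 = (merged >> 0) & 1
recovered_003 = (merged >> 1) & 1
```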

@phcarval
Contributor

I don't recall exactly what I did, but if you do the sum indeed then I believe you should need to encode different sub-masks (?) with different integers.

That's a very good point I hadn't thought about! Not only do you need to encode for the different defect categories, but you also have to differentiate different masks that have the same value but are loaded from different files. And you somehow need to make it work through Torchmetrics' trainer. LOCO is complicated indeed.

@jpcbertoldo
Contributor

... but you also have to differentiate different masks that have the same value but are loaded from different files.

I'm not sure I understand when that would happen. The one-file-one-integer encoding I proposed was specifically to address this issue.

@phcarval
Contributor

No, I agree with you here. Just making the point that you must differentiate not only the different defect types but also the different mask files.

@willyfh
Contributor Author

willyfh commented Jan 18, 2024

Maybe for debug. I don't see it otherwise because LOCO's annotations are quite case-specific.

I see, I will consider this in the implementation, maybe by emitting a warning when someone tries to evaluate this dataset with the existing metrics implementation (which interprets regions in the morphological sense), or by disallowing it entirely via an assert.

I don't recall exactly what I did, but if you do the sum indeed then I believe you should need to encode different sub-masks (?) with different integers.

Ah, now I understand what you meant. Based on your code, since the masks were binarized after being read, simply performing a sum would result in everything becoming 0 or 1. That's why _sum_masks is necessary to encode the values with different integers. Got it, I will also address this in the implementation.

[image]

... but you also have to differentiate different masks that have the same value but are loaded from different files.

Oh, because after encoding the masks to integers, we lose the information about the original pixel values if there is no mapping back to them (?). If, say, one image has two different masks with pixel values 241 and 253, and another image also has two different masks, 231 and 251, the masks of both images will be encoded as 1 and 2 (?). So connecting them to the defect config later would be difficult. Oh no, maybe it is handled somewhere, but I'm not sure, haha. I will also address it~. A sample of the config:

    {
        "defect_name": "2_additional_pushpins",
        "pixel_value": 254,
        "saturation_threshold": 12600,
        "relative_saturation": false
    },
    {
        "defect_name": "missing_pushpin",
        "pixel_value": 253,
        "saturation_threshold": 6300,
        "relative_saturation": false
    },
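Indexing such a config by pixel value would make the per-defect saturation threshold easy to look up when decoding a merged mask (a sketch using the field names from the sample above; not the PR's code):

```python
import json

def index_saturation_config(config_json):
    """Map each defect's pixel value to its config entry."""
    return {entry["pixel_value"]: entry for entry in json.loads(config_json)}

config = index_saturation_config(
    '[{"defect_name": "missing_pushpin", "pixel_value": 253,'
    ' "saturation_threshold": 6300, "relative_saturation": false}]'
)
```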

@willyfh willyfh requested a review from paularamo as a code owner January 21, 2024 16:03

@willyfh
Contributor Author

willyfh commented Jan 21, 2024

Thank you for your comments! @jpcbertoldo @phcarval ,

Based on your comments, I have made the following changes:

  • Used the sum operation to merge the masks based on the original pixel values.
  • Overridden __getitem__() to return the additional masks and to handle the binarization of the mask.
  • Loaded the saturation config in the datamodule and passed the config to the metric via the setup() of _MetricsCallback.
  • Passed the masks to the metric via pixel_metric.update on the validation batch end by adding an additional (keyword) argument.
  • Added a warning if the dataset is evaluated using the existing pixel-level metric implementation, where the 'region' is interpreted in a morphological sense.
  • Added the sPRO metric implementation.

Remaining things to do:

  • Verify the correctness of the spro_score() function.
  • Add unit tests.
  • Update documentation.
  • Consider adding AU-sPRO (?).

@willyfh willyfh changed the title 🚀 v1 - Add support for MVTec LOCO dataset 🚀 v1 - Add support for MVTec LOCO dataset and sPRO metric Jan 21, 2024
@phcarval
Contributor

Great job, thanks! I haven't tested it all, but AU-sPRO should definitely be added, as the AU- metrics are widely used in the library (AUPRO, AUROC, ...). I don't think it's very difficult once you've added sPRO; I believe there is an AUC class somewhere in torchmetrics that will do the computation by itself.
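Independently of which torchmetrics helper is used, the area computation itself reduces to a trapezoidal integral over the curve up to an FPR limit (a generic sketch, normalized by the limit as is conventional for AUPRO-style metrics; not tied to any specific API):

```python
import numpy as np

def au_spro(fpr, spro, fpr_limit=0.05):
    """Trapezoidal area under the sPRO-vs-FPR curve up to fpr_limit,
    normalized by the limit."""
    fpr = np.asarray(fpr, dtype=float)
    spro = np.asarray(spro, dtype=float)
    keep = fpr <= fpr_limit
    return float(np.trapz(spro[keep], fpr[keep]) / fpr_limit)
```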

@jpcbertoldo
Contributor

Great @willyfh !

  • Loaded the saturation config in the datamodule and passed the config to the metric via the setup() of _MetricsCallback.

I just didn't quite get this one.

The saturation parameter of sPRO is passed to the metric object in setup()? Can you point me to that part in the code, please?

@willyfh
Contributor Author

willyfh commented Jan 23, 2024

I don't think it's very difficult if you've added sPRO, I believe there is an auc class somewhere in torchmetrics that will do the computation by itself.

@phcarval I checked the implementation of aupro.py; it seems quite complicated, and the approach doesn't seem similar to pro.py. But yeah, I need to understand that implementation first.

  • Loaded the saturation config in the datamodule and passed the config to the metric via the setup() of _MetricsCallback.

I just didn't quite get this one.

The saturation parameter of sPRO is passed to the metric object in setup()? Can you point me to that part in the code, please?

@jpcbertoldo The saturation_config that is loaded in the dataset is passed to the metric in this part:
https://github.com/openvinotoolkit/anomalib/blob/e0678c00dd8713c4cdae450696d352475a5d4a52/src/anomalib/callbacks/metrics.py#L93-L95

where finally this method is executed to update the metric that has the saturation_config attribute:

https://github.com/openvinotoolkit/anomalib/blob/e0678c00dd8713c4cdae450696d352475a5d4a52/src/anomalib/metrics/collection.py#L30-L40

Please let me know if you have any concerns.

@samet-akcay samet-akcay deleted the branch open-edge-platform:v1 January 24, 2024 12:19
@willyfh
Contributor Author

willyfh commented Jan 26, 2024

Hi @samet-akcay, should I rebase these changes onto the main branch? Or do I need to wait until the v1.1 branch is created later?

@samet-akcay
Contributor

Hi @willyfh, sorry, merging v1 into main closed all the PRs that were targeting v1 :(

If you create the PR targeting main, we could still merge it, I think.
