
Use real weight and image for classification model test and relaxing precision requirement for general model tests #7130


Open

wants to merge 20 commits into main

Conversation

@YosuaMichael (Contributor) commented Jan 25, 2023

From investigation (see #7114 (comment)), it seems that our model tests are sensitive to machine type. After testing on the AWS cluster, the tests appear to have started failing since PR #6380 on the AWS cluster machines. To fix this, we should revert that PR and relax the precision criteria.

Update:
In order to make the model tests green, this PR does the following:

  • Use real weights and a real image for test_classification_model.
  • Relax the precision requirement for passing the test when comparing against the expected file (see the sketch after this list).
  • Relax the precision requirement for passing the fx check (it was 1e-5 before, and vit_h failed with 1.2e-5 on the AWS cluster, so I updated it to 5e-5).
  • Use float64 for the flaky detection models (I think this is a good approach since we can still run the test and detect problems other than precision issues; we might consider using this approach for flaky tests beyond the detection models).
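
A minimal sketch of the two precision-related ideas above, assuming hypothetical helper names and an illustrative prec value; the actual test code in this PR may differ:

import torch

def check_out_against_expected(out, expected, prec=1e-1):
    # Compare the model output against the stored expected tensor with a
    # relaxed tolerance, since exact values vary across machine types.
    torch.testing.assert_close(out, expected, rtol=prec, atol=prec)

def run_detection_model_in_float64(model, x):
    # For the flaky detection models, run the whole forward pass in float64 so
    # the check still catches real regressions that are not precision noise.
    model = model.to(torch.float64).eval()
    x = x.to(torch.float64)
    with torch.no_grad():
        return model(x)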

cc @pmeier

@pmeier (Collaborator) commented Jan 25, 2023

Would it make sense to go with a middle ground here? I see we had 1e-3 and are reverting to 1e-1. I agree with the premise of #6380 that 1e-1 is a pretty loose tolerance. Maybe we can use 1e-2 or the like? If the internal workflow, GHA, and the AWS cluster are all happy, we have a little more confidence in the CI signal, hopefully without getting flakiness.

@YosuaMichael (Contributor, Author) commented Jan 26, 2023

> Would it make sense to go with a middle ground here? I see we had 1e-3 and are reverting to 1e-1. I agree with the premise of #6380 that 1e-1 is a pretty loose tolerance. Maybe we can use 1e-2 or the like? If the internal workflow, GHA, and the AWS cluster are all happy, we have a little more confidence in the CI signal, hopefully without getting flakiness.

Looking at the errors, it seems like 1e-2 would still leave a lot of models failing. Also, even at 0.1, resnet101 still fails (and I am currently not sure what to do about that).

@YosuaMichael changed the title from "Relaxing test_models precision (Reverting #6380)" to "Use real weight and image for classification model test and relaxing precision requirement for general model tests" on Jan 26, 2023
@YosuaMichael marked this pull request as ready for review on January 27, 2023 02:14
@YosuaMichael (Contributor, Author) commented Jan 27, 2023

@NicolasHug @pmeier This PR resolves most of the problems in the model tests. From what I see, the remaining problem is vit_h_14, which strangely produces the following traceback:

___________________ test_classification_model[cuda-vit_h_14] ___________________
Traceback (most recent call last):
  File "/work/test/test_models.py", line 732, in test_classification_model
    _check_input_backprop(model, x)
  File "/work/test/test_models.py", line 226, in _check_input_backprop
    out[0].sum().backward()
  File "/work/ci_env/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/work/ci_env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I think this is a core issue and I have created an issue for it (I can't reproduce this error on the AWS cluster, and setting CUDA_LAUNCH_BLOCKING=1 didn't give a better error trace either).

Also note that the test error on Python 3.7 is not relevant; it seems PyTorch core plans to deprecate Python 3.7 (this should be fixed after #7110).

@NicolasHug (Member) left a comment

Thanks Yosua, I left a few comments, LMK what you think

else:
    H, W = input_shape[-2:]
    min_side = min(H, W)
    preprocess = weights.transforms(resize_size=min_side, crop_size=min_side)
Member:

We don't need to pass parameters to weights.transforms(); they will handle the size properly.

@YosuaMichael (Contributor, Author) Jan 27, 2023:

We need this if we want to control the size used when the test runs; otherwise we rely on the default size of the weight transforms (for some big models, we want to use a smaller image size in the test to speed up the runtime).

Note: for test purposes, I think it is okay not to use the preferred image size that yields the best accuracy for the model.
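
A minimal sketch of the idea, assuming resnet50 with its IMAGENET1K_V1 weights and a random uint8 tensor as a stand-in for the real image (both are illustrative, not the exact test code):

import torch
from torchvision.models import resnet50, ResNet50_Weights

# Illustrative: a test input size smaller than the weights' default 224x224.
input_shape = (1, 3, 128, 128)
min_side = min(input_shape[-2:])

weights = ResNet50_Weights.IMAGENET1K_V1
# Overriding resize_size/crop_size lets the test run on a smaller image than
# the preset default, trading accuracy for speed.
preprocess = weights.transforms(resize_size=min_side, crop_size=min_side)

model = resnet50(weights=weights).eval()
img = torch.randint(0, 256, (3, 400, 600), dtype=torch.uint8)  # stand-in for a real image
x = preprocess(img).unsqueeze(0)
with torch.no_grad():
    out = model(x)  # shape (1, 1000)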

@@ -51,19 +51,26 @@ def _get_image(input_shape, real_image, device):

    img = Image.open(GRACE_HOPPER)

    original_width, original_height = img.size
    if weights is None:
Member:

should we just pass the weights all the time? What's the reason for having them in only some cases but not all?

Contributor Author:

In some cases the weights are really restrictive. For instance, vit_h_14 will only accept an image size equal to the min_size of the weights: https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py#L321, so in that case we can't run the test at a lower resolution with the weights.

Also, as of now we don't use real weights for the detection model tests.

Member:

> vit_h_14 will only accept an image size equal to the min_size of the weights: https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py#L321, so in that case we can't run the test at a lower resolution with the weights

But isn't that a good thing? i.e. if we go below the min_size limit, wouldn't we expect the model to output garbage? And if not, why is the limit not lower?

Contributor Author:

For test purposes, we might want to use a smaller image even if the output is garbage, since we can still check for consistency (which is what we have done so far with a random image and random weights). In this case, if we set weights=None it will basically behave like before: _get_image will assume the test doesn't use real weights but rather a randomly initialized model.

@@ -364,7 +376,8 @@ def _check_input_backprop(model, inputs):
    "s3d": {
        "input_shape": (1, 3, 16, 224, 224),
    },
    "googlenet": {"init_weights": True},
    "regnet_y_128gf": {"weight_name": "IMAGENET1K_SWAG_LINEAR_V1"},
Member:

Could we just get the actual weights from the model name, using the helpers from https://pytorch.org/vision/main/models.html#model-registration-mechanism?

Contributor Author:

We can, and I actually use the helper to get the actual weights here.

I think I prefer this design where we don't need to specify the weight_enum along with the weight_name (since it can be retrieved from the model_name). Also, it is easier to say that the default value we use for the test is IMAGENET1K_V1.
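
As an illustration of that design, here is a minimal sketch of resolving the weights from the model name with the registration helpers; the resolve_weights helper and the IMAGENET1K_V1 default are illustrative, not the exact code in this PR:

from torchvision.models import get_model, get_model_weights

def resolve_weights(model_name, weight_name="IMAGENET1K_V1"):
    # Look up the weight enum registered for this model name, then pick the
    # requested member (defaulting to IMAGENET1K_V1).
    weights_enum = get_model_weights(model_name)
    return getattr(weights_enum, weight_name)

# Models without an IMAGENET1K_V1 entry override the name, e.g. regnet_y_128gf.
weights = resolve_weights("regnet_y_128gf", weight_name="IMAGENET1K_SWAG_LINEAR_V1")
model = get_model("regnet_y_128gf", weights=weights)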
