
Use real weight and image for classification model test and relaxing precision requirement for general model tests #7130


Open

wants to merge 20 commits into main

Conversation

@YosuaMichael (Contributor) commented Jan 25, 2023

From investigation (see #7114 (comment)), it seems that our model tests are sensitive to machine type. After testing on the AWS cluster, the tests appear to have started failing since PR #6380 on the AWS cluster machines. To fix this, we should revert that PR and relax the precision criteria.

Update:
In order to make the model tests green, this PR does the following:

  • Use real weights and a real image for test_classification_model.
  • Relax the precision requirement for passing the test when comparing against the expected file (see the sketch after this list).
  • Relax the precision requirement for passing the fx check (it was 1e-5 before, and vit_h failed with 1.2e-5 on the AWS cluster, so I updated it to 5e-5).
  • Use float64 for the flaky detection models (I think this is a good approach since we can still run the test and detect problems other than precision issues; we might consider using this approach for flaky tests beyond the detection models).
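
A minimal sketch of the two precision-related ideas above, assuming hypothetical helper names and an illustrative prec value; the actual test code in this PR may differ:

import torch

def check_out_against_expected(out, expected, prec=1e-1):
    # Compare the model output against the stored expected tensor with a
    # relaxed tolerance, since exact values vary across machine types.
    torch.testing.assert_close(out, expected, rtol=prec, atol=prec)

def run_detection_model_in_float64(model, x):
    # For the flaky detection models, run the whole forward pass in float64 so
    # the check still catches real regressions that are not precision noise.
    model = model.to(torch.float64).eval()
    x = x.to(torch.float64)
    with torch.no_grad():
        return model(x)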

cc @pmeier

@pmeier (Collaborator) commented Jan 25, 2023

Would it make sense to go with a middle ground here? I see we had 1e-3 and are reverting to 1e-1. I agree with the premise of #6380 that 1e-1 is a pretty loose tolerance. Maybe we can use 1e-2 or the like? If the internal workflow, GHA, and the AWS cluster are all happy, we have a little more confidence in the CI signal, hopefully without getting flakiness.

@YosuaMichael (Contributor, Author) commented Jan 26, 2023

> Would it make sense to go with a middle ground here? I see we had 1e-3 and are reverting to 1e-1. I agree with the premise of #6380 that 1e-1 is a pretty loose tolerance. Maybe we can use 1e-2 or the like? If the internal workflow, GHA, and the AWS cluster are all happy, we have a little more confidence in the CI signal, hopefully without getting flakiness.

Looking at the errors, it seems like 1e-2 would still leave a lot of models failing. Also, even at 0.1, resnet101 still fails (and I am currently not sure what to do about that).

@YosuaMichael changed the title from "Relaxing test_models precision (Reverting #6380)" to "Use real weight and image for classification model test and relaxing precision requirement for general model tests" on Jan 26, 2023
@YosuaMichael marked this pull request as ready for review on January 27, 2023 02:14
@YosuaMichael (Contributor, Author) commented Jan 27, 2023

@NicolasHug @pmeier This PR resolves most of the problems in the model tests. From what I see, the remaining problem is vit_h_14, which strangely produces the following traceback:

___________________ test_classification_model[cuda-vit_h_14] ___________________
Traceback (most recent call last):
  File "/work/test/test_models.py", line 732, in test_classification_model
    _check_input_backprop(model, x)
  File "/work/test/test_models.py", line 226, in _check_input_backprop
    out[0].sum().backward()
  File "/work/ci_env/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/work/ci_env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I think this is a core issue and I have created an issue for it (I can't reproduce this error on the AWS cluster, and setting CUDA_LAUNCH_BLOCKING=1 didn't give a better error trace either).

Also note that the test error on Python 3.7 is not relevant; it seems PyTorch core plans to deprecate Python 3.7 (this should be fixed after #7110).

@NicolasHug (Member) left a comment

Thanks Yosua, I left a few comments, LMK what you think

else:
    H, W = input_shape[-2:]
    min_side = min(H, W)
    preprocess = weights.transforms(resize_size=min_side, crop_size=min_side)
Member:

We don't need to pass parameters to weights.transforms(); they will handle the size properly.

@YosuaMichael (Contributor, Author) Jan 27, 2023:

We need this if we want to control the size used when the test runs; otherwise we rely on the default size of the weight transforms (for some big models, we want to use a smaller image size in the test to speed up the runtime).

Note: for test purposes, I think it is okay not to use the preferred image size that yields the best accuracy for the model.
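
A minimal sketch of the idea, assuming resnet50 with its IMAGENET1K_V1 weights and a random uint8 tensor as a stand-in for the real image (both are illustrative, not the exact test code):

import torch
from torchvision.models import resnet50, ResNet50_Weights

# Illustrative: a test input size smaller than the weights' default 224x224.
input_shape = (1, 3, 128, 128)
min_side = min(input_shape[-2:])

weights = ResNet50_Weights.IMAGENET1K_V1
# Overriding resize_size/crop_size lets the test run on a smaller image than
# the preset default, trading accuracy for speed.
preprocess = weights.transforms(resize_size=min_side, crop_size=min_side)

model = resnet50(weights=weights).eval()
img = torch.randint(0, 256, (3, 400, 600), dtype=torch.uint8)  # stand-in for a real image
x = preprocess(img).unsqueeze(0)
with torch.no_grad():
    out = model(x)  # shape (1, 1000)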

@@ -51,19 +51,26 @@ def _get_image(input_shape, real_image, device):

    img = Image.open(GRACE_HOPPER)

    original_width, original_height = img.size
    if weights is None:
Member:

should we just pass the weights all the time? What's the reason for having them in only some cases but not all?

Contributor Author:

In some cases the weights are really restrictive. For instance, vit_h_14 will only accept an image size equal to the min_size of the weights: https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py#L321, so in that case we can't run the test at a lower resolution with the weights.

Also, as of now we don't use real weights for the detection model tests.

Member:

> vit_h_14 will only accept an image size equal to the min_size of the weights: https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py#L321, so in that case we can't run the test at a lower resolution with the weights

But isn't that a good thing? i.e. if we go below the min_size limit, wouldn't we expect the model to output garbage? And if not, why is the limit not lower?

Contributor Author:

For test purposes, we might want to use a smaller image even if the output is garbage, since we can still check for consistency (which is what we have done so far with a random image and random weights). In this case, if we set weights=None it will basically behave like before: _get_image will assume the test doesn't use real weights but rather a randomly initialized model.

@@ -364,7 +376,8 @@ def _check_input_backprop(model, inputs):
    "s3d": {
        "input_shape": (1, 3, 16, 224, 224),
    },
    "googlenet": {"init_weights": True},
    "regnet_y_128gf": {"weight_name": "IMAGENET1K_SWAG_LINEAR_V1"},
Member:

Could we just get the actual weights from the model name, using the helpers from https://pytorch.org/vision/main/models.html#model-registration-mechanism?

Contributor Author:

We can, and I actually use the helper to get the actual weights here.

I think I prefer this design where we don't need to specify the weight_enum along with the weight_name (since it can be retrieved from the model_name). Also, it is easier to say that the default value we use for the test is IMAGENET1K_V1.
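
As an illustration of that design, here is a minimal sketch of resolving the weights from the model name with the registration helpers; the resolve_weights helper and the IMAGENET1K_V1 default are illustrative, not the exact code in this PR:

from torchvision.models import get_model, get_model_weights

def resolve_weights(model_name, weight_name="IMAGENET1K_V1"):
    # Look up the weight enum registered for this model name, then pick the
    # requested member (defaulting to IMAGENET1K_V1).
    weights_enum = get_model_weights(model_name)
    return getattr(weights_enum, weight_name)

# Models without an IMAGENET1K_V1 entry override the name, e.g. regnet_y_128gf.
weights = resolve_weights("regnet_y_128gf", weight_name="IMAGENET1K_SWAG_LINEAR_V1")
model = get_model("regnet_y_128gf", weights=weights)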
