Embedding_bag mismatch #1056

Open
justinchuby opened this issue Sep 11, 2023 · 12 comments
Labels: bug (Something isn't working); contribution welcome (We welcome code contributions for this); module: torchlib (Related to the torch/aten function lib in development)

Comments

@justinchuby
Collaborator

_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[ 3.4629,  4.7383, -5.2305, -7.2500,  3.7188],\n        [ 0.5361, -6.0039, -1.3975,  0.3867, -1.9512],\n        [ 8.1719,  0.7734, -8.7109, -3.3320, -6.9609],\n        [-2.2578, -8.6875,  6.7500,  4.0352, -6.4531],\n        [-7.2695, -1.1426,  8.3438,  2.2324,  0.7031],\n        [ 3.9199, -7.8047, -7.3398, -7.8672, -7.1641],\n        [ 3.2344, -6.4414,  3.3047,  3.0508, -7.5234],\n        [ 7.7695, -1.9336, -7.4531,  4.7188, -6.1602],\n        [ 7.6562, -4.5352, -1.9424,  1.0020,  7.0156],\n        [ 0.2549,  6.6172,  2.5391, -7.7188,  2.3984]], dtype=torch.float16), tensor([[9, 3, 8, 4, 4],\n        [9, 4, 4, 9, 9],\n        [5, 7, 0, 3, 2],\n        [0, 8, 6, 1, 9],\n        [1, 6, 9, 8, 6]]))', kwargs="{'mode': 'sum', 'per_sample_weights': tensor([[ 4.7383,  8.6016,  1.9072, -3.1719, -1.7842],\n        [-6.3008,  1.5293,  1.3008, -0.6592,  4.7539],\n        [ 5.0898, -4.1914, -1.9072,  3.4531, -8.3594],\n        [-6.3906, -0.9229,  7.9023,  7.1562, -2.5840],\n        [ 8.3125, -1.3008,  6.7773,  7.0234, -2.1719]], dtype=torch.float16)}", opset=18, sample_num=4) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 20 / 25 (80.0%)
Greatest absolute difference: 52.3125 at index (4, 3) (up to 1e-05 allowed)
Greatest relative difference: 152.75 at index (4, 3) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[ 1.9951, -1.1777, -3.7695, -3.3125,  8.5078],\n        [-3.9648, -3.2617,  4.5430, -6.7500,  1.1953],\n        [ 1.8193, -4.9297,  8.3438,  1.2217,  0.0352],\n        [-5.2812, -5.9414, -0.7295,  2.4785, -3.8496],\n        [ 7.2070, -0.1582,  3.8047,  1.9248, -1.8018]], dtype=torch.float16), tensor([2, 3, 1, 4, 3, 0]))', kwargs="{'mode': 'sum', 'offsets': tensor([0, 3, 6]), 'include_last_offset': True}", opset=18, sample_num=7) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 5 / 10 (50.0%)
Greatest absolute difference: 7.20703125 at index (1, 0) (up to 1e-05 allowed)
Greatest relative difference: 2.30859375 at index (1, 3) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[ 4.1836,  3.4453,  6.0039,  4.9062, -9.0000],\n        [ 3.1641, -2.9258,  4.8359,  0.7822,  0.6504],\n        [-4.6133, -2.5137, -3.4180, -5.9844,  0.8438],\n        [-1.5205, -6.5234, -2.6895, -8.7188,  1.0107],\n        [-5.0078,  7.6016, -5.2383, -1.7314,  7.2500],\n        [-2.9355, -7.4805,  6.5469,  1.1602,  3.8750],\n        [-8.1797,  3.0508,  3.5234, -5.3164,  6.2031],\n        [-5.4492, -0.7295,  1.6260, -0.4307, -5.7031],\n        [ 5.9414,  7.6719, -2.8301,  4.5352, -4.6328],\n        [ 4.3750,  1.8018,  3.6387,  3.2070,  7.5156]], dtype=torch.float16), tensor([9, 5, 3, 9, 5]))', kwargs="{'offsets': tensor([0, 3]), 'mode': 'sum', 'per_sample_weights': None}", opset=18, sample_num=13) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 10 / 10 (100.0%)
Greatest absolute difference: 7.515625 at index (1, 4) (up to 1e-05 allowed)
Greatest relative difference: 2.763671875 at index (1, 3) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[-1.5469,  0.5801,  4.0078, -0.1143, -7.9023],\n        [ 1.2129, -3.5156, -2.5840,  1.3623, -2.8828],\n        [ 1.0547, -6.1797, -5.0625, -2.7695,  7.5234],\n        [ 2.5488,  8.5703, -8.9766, -2.8477,  2.0918],\n        [ 5.9492,  1.5029,  6.3555, -1.4766,  5.1172],\n        [ 5.8008, -1.0723,  6.9961, -3.4883, -1.0723],\n        [-1.6699, -3.6914,  8.6875,  5.3984, -1.3623],\n        [ 3.3320,  6.2305,  0.9580,  2.8906, -8.1250],\n        [-2.9004, -1.5732,  8.0625,  4.4570, -3.1016],\n        [ 5.2109,  0.3076, -6.9180,  2.1367, -6.8047]], dtype=torch.float16), tensor([2, 9, 6, 0, 1]))', kwargs="{'offsets': tensor([0, 0, 3]), 'mode': 'sum', 'per_sample_weights': None}", opset=18, sample_num=15) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 10 / 15 (66.7%)
Greatest absolute difference: 7.5234375 at index (0, 4) (up to 1e-05 allowed)
Greatest relative difference: 8.46875 at index (1, 0) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[ 7.1445,  1.1689,  4.9141, -2.8555,  1.4941],\n        [-5.0352, -7.7891, -2.3730,  2.7500,  4.1758],\n        [ 3.0312, -5.5977,  1.0986, -2.6895,  7.4688],\n        [ 1.7314,  6.1328,  7.9102,  0.1494, -2.7695],\n        [-5.0703, -5.9219, -3.1641,  5.6523, -6.1445],\n        [-5.2734, -7.5586,  5.0078,  1.5557,  3.8594],\n        [-3.1641, -3.0859,  8.0312,  7.6016, -4.9375],\n        [-2.3828,  1.1865,  8.3438, -2.4336, -8.5938],\n        [ 4.3594, -0.1055, -3.3828, -0.8350, -6.0391],\n        [ 1.3799,  4.6055,  5.5391,  7.1562, -1.2129]], dtype=torch.float16), tensor([[0, 0, 1, 3, 9],\n        [7, 0, 0, 3, 6],\n        [5, 3, 5, 5, 1],\n        [0, 4, 8, 2, 9],\n        [4, 0, 8, 4, 5]]))', kwargs="{'mode': 'sum', 'per_sample_weights': None}", opset=18, sample_num=16) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 10 / 25 (40.0%)
Greatest absolute difference: 7.15625 at index (0, 3) (up to 1e-05 allowed)
Greatest relative difference: 10.3671875 at index (3, 2) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[ 4.9219, -8.0391,  8.5938, -7.4258, -4.0703],\n        [ 0.6240,  7.5391,  3.8047,  8.8438,  2.5039],\n        [-4.4727, -8.3516,  5.2383, -4.6953,  2.8203],\n        [ 7.7969,  4.6758,  1.8281,  0.3691,  4.3750],\n        [-8.7500,  3.6035, -1.4854,  1.5820, -6.3281],\n        [ 8.4531,  5.3984, -2.1719,  2.2676,  3.2695],\n        [-1.3008,  6.5469, -1.4678,  3.3398, -1.2217],\n        [-3.6211, -8.2109,  0.5977, -3.5781,  4.5859],\n        [-8.8906,  5.4141, -7.0234, -5.3008, -7.5312],\n        [-0.8438,  3.3223,  2.5391, -0.8965, -8.1328]], dtype=torch.float16), tensor([[4, 6, 9, 3, 4],\n        [7, 6, 2, 1, 4],\n        [5, 0, 6, 6, 6],\n        [1, 2, 2, 5, 3],\n        [0, 6, 9, 4, 7]]))', kwargs="{'mode': 'sum', 'per_sample_weights': None}", opset=18, sample_num=17) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 10 / 25 (40.0%)
Greatest absolute difference: 8.140625 at index (0, 4) (up to 1e-05 allowed)
Greatest relative difference: 1.1552734375 at index (4, 4) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[-0.4922, -5.5820, -3.4805, -7.4102, -6.9336],\n        [-7.3047,  7.1289, -3.8047,  0.5449, -6.0820],\n        [-4.0859, -3.8320,  5.8086, -7.9453,  6.9062],\n        [-5.9688, -4.3867, -7.0469,  5.3086, -6.9258],\n        [-8.0781,  1.8721,  4.5547, -3.6211,  4.9219]], dtype=torch.float16), tensor([4, 4, 0, 3, 1, 2]))', kwargs="{'mode': 'sum', 'offsets': tensor([0, 3, 6]), 'include_last_offset': True}", opset=18, sample_num=20) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 5 / 10 (50.0%)
Greatest absolute difference: 16.15625 at index (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 32.8125 at index (0, 0) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[ 4.7188,  0.8525,  3.0859, -6.3555, -2.1973],\n        [-3.7188,  7.5234, -4.6680, -4.1641, -8.5547],\n        [ 0.1406,  8.0781,  6.4141, -2.2227,  7.5391],\n        [-5.2188,  5.5703,  3.6289, -2.2324, -3.3320],\n        [ 3.3125,  8.5000, -2.1172, -1.5645,  3.6836],\n        [-7.4180,  3.1562,  7.7891, -6.6875, -2.9258],\n        [ 3.1465, -2.3125, -0.6768,  8.1328, -2.8906],\n        [-3.5078, -5.5273, -5.0352,  6.3008,  7.8203],\n        [ 2.7344,  5.2734, -4.5430,  2.2148, -4.5703],\n        [ 7.3477,  1.0283, -5.7383,  4.1328,  1.6260]], dtype=torch.float16), tensor([0, 1, 2, 9, 7]))', kwargs="{'offsets': tensor([0, 0, 3]), 'mode': 'mean', 'per_sample_weights': None}", opset=18, sample_num=28) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 10 / 15 (66.7%)
Greatest absolute difference: 6.35546875 at index (0, 3) (up to 1e-05 allowed)
Greatest relative difference: 1.5478515625 at index (2, 0) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[-5.0625, -1.5029,  6.3984,  5.1406, -1.0547],\n        [ 8.5938,  3.4727, -4.8164, -5.7227,  8.3203],\n        [-1.5381, -5.1406,  0.1318, -7.8477,  0.9316],\n        [-4.8516, -7.1289,  2.4961,  7.2578,  6.3711],\n        [ 5.1875, -1.2744, -1.6787, -1.5820,  2.3477],\n        [ 2.4688,  3.1562,  6.8477, -1.4590, -2.4180],\n        [ 3.6738,  7.8828, -0.0176, -3.2344,  2.2578],\n        [-4.3594, -4.2109,  0.2900,  1.5117,  4.2734],\n        [-1.8984, -6.1172, -7.1992,  0.9316, -3.7695],\n        [-2.3730,  3.6562, -0.9229, -6.1523,  3.6484]], dtype=torch.float16), tensor([[5, 6, 9, 8, 8],\n        [1, 0, 9, 7, 7],\n        [7, 6, 5, 9, 9],\n        [8, 0, 3, 1, 7],\n        [3, 1, 2, 1, 2]]))', kwargs="{'mode': 'mean', 'per_sample_weights': None}", opset=18, sample_num=29) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 15 / 25 (60.0%)
Greatest absolute difference: 2.037109375 at index (2, 3) (up to 1e-05 allowed)
Greatest relative difference: 2.646484375 at index (0, 1) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[-0.0439, -4.9062, -6.0312, -2.8750,  3.5684],\n        [ 6.8828, -0.9932,  4.7461,  2.8750,  2.4883],\n        [-7.1016, -6.2500, -1.7402, -7.1016,  8.5469],\n        [-6.6367,  8.0312, -8.3125,  8.5625,  6.9258],\n        [ 5.1250,  6.1094,  2.6367, -6.9336,  2.1523],\n        [ 0.7998, -8.6094,  6.8828,  8.9766,  5.2656],\n        [ 8.5312,  5.1758, -4.5156,  3.6289,  5.3359],\n        [-8.0312,  6.3438,  5.6602, -4.2539,  6.1953],\n        [-8.5703, -0.6943, -5.9844,  5.0000,  6.3281],\n        [ 0.3604,  4.1641,  6.6094,  1.7051, -0.7119]], dtype=torch.float16), tensor([[4, 8, 7, 3, 9],\n        [3, 1, 1, 9, 8],\n        [7, 6, 6, 3, 0],\n        [4, 9, 6, 7, 6],\n        [2, 3, 6, 0, 9]]))', kwargs="{'mode': 'mean', 'per_sample_weights': None}", opset=18, sample_num=30) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 20 / 25 (80.0%)
Greatest absolute difference: 2.349609375 at index (4, 2) (up to 1e-05 allowed)
Greatest relative difference: 7.3984375 at index (3, 2) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[ 7.5391,  0.8789, -7.8203, -8.7188,  0.7119],\n        [ 5.5469, -5.7109,  6.6523, -6.4258, -3.4531],\n        [ 2.4883,  0.4482,  5.1875,  8.3906,  5.2578],\n        [ 5.3359, -8.9297,  2.7422, -1.9951, -3.1289],\n        [ 8.7109,  3.6738, -7.3672,  7.8672,  7.8594]], dtype=torch.float16), tensor([3, 2, 3, 4, 2, 4]))', kwargs="{'mode': 'mean', 'offsets': tensor([0, 3, 6]), 'include_last_offset': True}", opset=18, sample_num=33) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 5 / 10 (50.0%)
Greatest absolute difference: 8.3671875 at index (1, 2) (up to 1e-05 allowed)
Greatest relative difference: 4.80078125 at index (1, 1) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[-8.1250, -6.0469, -8.0156,  2.0566, -1.1865],\n        [-0.7471,  3.9648, -4.4297, -7.1445, -0.7207],\n        [-3.9375, -5.7305,  8.0469,  0.8438,  4.2344],\n        [-1.3096, -5.5820, -6.0039, -6.5391,  0.0176],\n        [-4.5273, -0.0264,  4.0000,  0.9492, -7.2148],\n        [-6.8633,  6.9961, -5.7109, -1.4150, -8.4297],\n        [-4.9219, -5.7500, -7.1562,  3.4980,  4.6836],\n        [ 0.9756,  7.7266, -7.2695, -1.9600,  5.4297],\n        [ 1.1426, -4.2617, -2.4609, -5.7031,  1.2920],\n        [ 8.0078, -1.6260,  8.8281, -1.9951,  3.6836]], dtype=torch.float16), tensor([0, 7, 7, 2, 5]))', kwargs="{'offsets': tensor([0, 0, 3]), 'mode': 'max', 'per_sample_weights': None}", opset=18, sample_num=41) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 5 / 15 (33.3%)
Greatest absolute difference: 8.125 at index (0, 0) (up to 1e-05 allowed)
Greatest relative difference: 1.0 at index (0, 0) (up to 0.001 allowed)
_ TestOnnxModelOutputConsistency_opset18CPU.test_output_match_nn_functional_embedding_bag_cpu_float16 (inputs='(tensor([[-7.5078, -2.3281, -5.4844,  2.7695, -6.6016],\n        [ 1.7227,  2.7773,  5.8438,  1.2480, -1.2920],\n        [ 7.2852, -7.4102, -8.2891,  4.7539, -7.2578],\n        [-8.7500,  6.2734,  5.6680, -4.7461, -4.7461],\n        [-6.0117,  8.3594, -4.6250, -7.4375,  5.2578],\n        [-7.4102,  6.8359,  4.8789, -0.6064,  3.6387],\n        [ 7.2695,  1.2480,  8.8828,  3.7969, -3.4453],\n        [ 3.3477,  0.4922, -5.0547,  7.6641,  7.9375],\n        [-0.5186, -7.7422, -3.4531, -0.2021, -6.2930],\n        [-6.4688,  0.7646,  8.5625, -0.8262,  4.7539]], dtype=torch.float16), tensor([[5, 4, 1, 4, 8],\n        [9, 8, 4, 3, 7],\n        [8, 1, 6, 7, 1],\n        [1, 0, 0, 6, 5],\n        [8, 9, 1, 0, 3]]))', kwargs="{'mode': 'max', 'per_sample_weights': None}", opset=18, sample_num=42) _
Traceback (most recent call last):
  File "/home/justinchu/dev/pytorch/test/onnx/test_fx_op_consistency.py", line 679, in _run_test_output_match
    test_suite.run_test_with_fx_to_onnx_exporter_and_onnx_runtime(
  File "<@beartype(onnx_test_common._TestONNXRuntime.run_test_with_fx_to_onnx_exporter_and_onnx_runtime) at 0x7f38c6425fc0>", line 254, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 302, in run_test_with_fx_to_onnx_exporter_and_onnx_runtime
    _compare_pytorch_onnx_with_ort(
  File "<@beartype(onnx_test_common._compare_pytorch_onnx_with_ort) at 0x7f38c6426320>", line 178, in _compare_pytorch_onnx_with_ort
  File "/home/justinchu/dev/pytorch/test/onnx/onnx_test_common.py", line 431, in _compare_pytorch_onnx_with_ort
    torch.testing.assert_close(
  File "/home/justinchu/dev/pytorch/torch/testing/_comparison.py", line 1520, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 3 / 25 (12.0%)
Greatest absolute difference: 6.046875 at index (4, 4) (up to 1e-05 allowed)
Greatest relative difference: 4.6796875 at index (4, 4) (up to 0.001 allowed)
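
The failing samples above use three index layouts: 1D indices with offsets, 1D indices with offsets and include_last_offset=True, and 2D indices without offsets. As a reading aid, here is a minimal pure-Python sketch of how offsets split 1D indices into bags, following the behavior described in the PyTorch documentation for embedding_bag. The helper name split_into_bags is hypothetical and is not part of PyTorch or onnxscript.

```python
def split_into_bags(indices, offsets, include_last_offset=False):
    """Return the list of index slices ("bags") described by offsets."""
    if include_last_offset:
        # The final entry of offsets only marks the end of the last bag.
        starts, ends = list(offsets[:-1]), list(offsets[1:])
    else:
        # The last bag runs to the end of indices.
        starts, ends = list(offsets), list(offsets[1:]) + [len(indices)]
    return [list(indices[s:e]) for s, e in zip(starts, ends)]


# Layouts taken from the failing samples in the logs above.
bags_a = split_into_bags([2, 3, 1, 4, 3, 0], [0, 3, 6], include_last_offset=True)
bags_b = split_into_bags([9, 5, 3, 9, 5], [0, 3])
bags_c = split_into_bags([2, 9, 6, 0, 1], [0, 0, 3])

print(bags_a)  # [[2, 3, 1], [4, 3, 0]]
print(bags_b)  # [[9, 5, 3], [9, 5]]
print(bags_c)  # [[], [2, 9, 6], [0, 1]]
```

With 5 embedding columns, these layouts give 2, 2 and 3 output rows, which matches the 10, 10 and 15 compared elements reported for samples 7, 13 and 15. The empty first bag in the last layout is the case where PyTorch documents an all-zero output row.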
@justinchuby added the bug and module: torchlib labels on Sep 11, 2023
@xiaowuhu
Contributor

Which environment is this? Nightly?

@justinchuby
Collaborator Author

justinchuby commented Sep 18, 2023

Edit: this is happening in the PyTorch tests.

It can be run with:

pytest pytorch/test/onnx/test_fx_op_consistency.py -k embedding

@xiaowuhu
Contributor

xiaowuhu commented Sep 20, 2023

Is this test targeting (a) nn.functional.embedding_bag() or (b) ops.aten.embedding_bag()?
(a) returns 1 tensor; (b) returns 4 tensors.
Also:

  • For (a), the mode argument is 'sum', 'mean' or 'max', and the 'offsets' argument can be None when the 'indices' argument is a 2D tensor.
  • For (b), the mode argument is 0, 1 or 2.
    Using input_wrangler cannot resolve this issue.
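
To make the difference between the two calling conventions concrete, here is a minimal pure-Python sketch of the preprocessing that separates them: string modes become integer codes, and 2D indices are flattened with evenly spaced offsets. The helper to_aten_style and the table MODE_TO_INT are hypothetical names for illustration. The mapping sum=0 comes from the repro in this thread, while mean=1 and max=2 reflect my understanding of PyTorch's convention.

```python
# Assumed mode encoding: 'sum' -> 0 is shown in this thread;
# 'mean' -> 1 and 'max' -> 2 are assumed from PyTorch's convention.
MODE_TO_INT = {"sum": 0, "mean": 1, "max": 2}


def to_aten_style(indices, offsets=None, mode="mean"):
    """Hypothetical helper: flatten 2D indices and encode mode as an int."""
    if indices and isinstance(indices[0], (list, tuple)):
        # 2D indices: every row is one fixed-size bag, so offsets is implied.
        if offsets is not None:
            raise ValueError("offsets must be None when indices is 2D")
        bag_size = len(indices[0])
        flat = [i for row in indices for i in row]
        offsets = list(range(0, len(flat), bag_size))
    else:
        # 1D indices: the caller must say where each bag starts.
        if offsets is None:
            raise ValueError("offsets is required when indices is 1D")
        flat = list(indices)
    return flat, list(offsets), MODE_TO_INT[mode]


flat, offsets, mode = to_aten_style([[0, 0, 1, 3, 9], [7, 0, 0, 3, 6]], mode="sum")
print(flat)     # [0, 0, 1, 3, 9, 7, 0, 0, 3, 6]
print(offsets)  # [0, 5]
print(mode)     # 0
```

This only covers argument shapes. It does not model the four return values of the aten op.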

@justinchuby
Collaborator Author

We can leverage the op-by-op verification feature that @titaiwangms created in the exporter.

@titaiwangms
Contributor

I will follow up on this thread and try to debug it today.

@titaiwangms titaiwangms self-assigned this Oct 26, 2023
@titaiwangms
Contributor

I just realized that op-level debugging doesn't work in this case because the op needs an index input, which cannot be randomized.

@titaiwangms
Contributor

I am checking the SARIF report to see whether anything unusual happened during conversion.

@titaiwangms
Contributor

titaiwangms commented Oct 26, 2023

Edit: this is happening in the PyTorch tests.

It can be run with:

pytest pytorch/test/onnx/test_fx_op_consistency.py -k embedding

The op with the issue is actually nn.functional.embedding_bag. The mismatch on nn.functional.embedding is not reproducible with nightly onnxscript and ORT.

@titaiwangms
Contributor

titaiwangms commented Oct 26, 2023

@xiaowuhu @justinchuby

Is this test targeting (a) nn.functional.embedding_bag() or (b) ops.aten.embedding_bag()? (a) returns 1 tensor; (b) returns 4 tensors. Also:

  • For (a), the mode argument is 'sum', 'mean' or 'max', and the 'offsets' argument can be None when the 'indices' argument is a 2D tensor.
  • For (b), the mode argument is 0, 1 or 2.
    Using input_wrangler cannot resolve this issue.

Based on the SARIF report (in test_fx_op_consistency.py, set verbose=True and set TORCH_LOGS="onnx_diagnostics" when you run the test), the model uses ops.aten._embedding_bag_forward_only, and the ONNX dispatcher selects aten_embedding_bag_padding_idx.

In this case, I can reproduce it with:

import numpy as np
from core import aten_embedding_bag_padding_idx

def test_embedding_bag_onnx():
    # Repro for https://github.com/microsoft/onnxscript/issues/1056
    # using float16 weights, mode=0 (sum) and per_sample_weights.
    weight = np.array(
        [[-2.7199, -1.7691, -8.5981, -5.9605, -3.7100],
        [ 0.3334,  3.5580,  5.4002, -6.1015, -3.9192],
        [ 3.2690,  7.4735, -1.8522,  6.7348, -1.4507],
        [ 0.9523,  8.1493, -8.3490, -5.6658, -2.2785],
        [-3.5082,  7.7760, -5.8336, -4.1430, -6.2878],
        [-8.4290, -5.2537,  7.7364,  4.0160,  4.3621],
        [ 0.4733, -4.6142,  1.5227, -8.4033, -6.5031],
        [-4.6398,  5.6784,  5.2769, -3.9915, -0.3247],
        [ 5.7560,  8.9472,  3.5719,  1.2158,  6.0344],
        [-5.2992,  1.6771, -6.9777, -6.2378, -4.6493]],
         dtype=np.float16)
    indices = np.array([4, 9, 3, 0, 3], dtype=np.int64)
    offsets = np.array([0, 3], dtype=np.int64)
    mode = 0 # sum
    per_sample_weights = np.array([2.4134, -0.1783,  7.1360, -0.7987,  2.3815], dtype=np.float16)
    result = aten_embedding_bag_padding_idx(weight, indices, mode=mode, offsets=offsets, per_sample_weights=per_sample_weights)
    print("result from onnx-script:")
    print(result)

def test_embedding_bag_nn_function():
    import torch as t
    weight = t.tensor(
        [[-2.7199, -1.7691, -8.5981, -5.9605, -3.7100],
        [ 0.3334,  3.5580,  5.4002, -6.1015, -3.9192],
        [ 3.2690,  7.4735, -1.8522,  6.7348, -1.4507],
        [ 0.9523,  8.1493, -8.3490, -5.6658, -2.2785],
        [-3.5082,  7.7760, -5.8336, -4.1430, -6.2878],
        [-8.4290, -5.2537,  7.7364,  4.0160,  4.3621],
        [ 0.4733, -4.6142,  1.5227, -8.4033, -6.5031],
        [-4.6398,  5.6784,  5.2769, -3.9915, -0.3247],
        [ 5.7560,  8.9472,  3.5719,  1.2158,  6.0344],
        [-5.2992,  1.6771, -6.9777, -6.2378, -4.6493]],
         dtype=t.float16)

    indices = t.tensor([4, 9, 3, 0, 3],
         dtype=t.int64)
    offsets = t.tensor([0, 3], dtype=t.int64)
    mode = 0 # sum
    per_sample_weights = t.tensor([2.4134, -0.1783,  7.1360, -0.7987,  2.3815], dtype=t.float16)
    # Note: this calls the aten op directly, not nn.functional.embedding_bag.
    result = t.ops.aten._embedding_bag_forward_only(weight, indices, offsets=offsets, mode=mode, per_sample_weights=per_sample_weights)
    print("result from nn.functional:")
    print(result)


test_embedding_bag_onnx()
'''
result from onnx-script:
(array([[ -1.672,  76.94 , -73.7  , -50.44 , -31.44 ],
       [  4.44 ,  20.81 , -13.016,  -8.72 ,  -2.46 ]], dtype=float16), array([0, 0, 0, 0, 0], dtype=int64), array([0, 0], dtype=int64), array([0, 0], dtype=int64))
'''
test_embedding_bag_nn_function()
'''
result from nn.functional:
(tensor([[ -0.7275,  76.6250, -72.4375, -49.3125, -30.6250],
        [  4.4414,  20.8125, -13.0156,  -8.7266,  -2.4629]],
       dtype=torch.float16), tensor([], dtype=torch.int64), tensor([0, 0]), tensor([0, 0]))
'''

@justinchuby
Collaborator Author

Thanks for doing the investigation! So it looks like we do need to implement them differently? Previously I thought they could be the same thing.

@xiaowuhu
Contributor

xiaowuhu commented Nov 14, 2023

I found the root cause:

In the test, we called result = aten_embedding_bag_padding_idx() instead of result = aten_embedding_bag(), so the result is different.
The default value padding_idx: int = -1 in aten_embedding_bag_padding_idx() does have an impact on the result. So calling
result = aten_embedding_bag_padding_idx(weight, indices, offsets, mode=mode, per_sample_weights=per_sample_weights)
means
result = aten_embedding_bag_padding_idx(weight, indices, offsets, mode=mode, per_sample_weights=per_sample_weights, padding_idx=-1)
which is not equal to
result = aten_embedding_bag(weight, indices, offsets, mode=mode, per_sample_weights=per_sample_weights)
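
One way to sanity-check the padding_idx explanation is to redo the arithmetic for the first bag of the repro earlier in this thread (indices 4, 9, 3) in plain Python. The sketch below copies the relevant weight rows and per-sample weights from that repro and computes the weighted sum for columns 0 and 1, once over the whole bag and once with index 9 skipped. Treating index 9 as the row that padding_idx=-1 would address in a 10-row table is my assumption; the thread does not confirm how the value is resolved.

```python
# Columns 0 and 1 of the weight rows used by the first bag (indices 4, 9, 3),
# copied from the repro in this thread.
rows = {
    4: (-3.5082, 7.7760),
    9: (-5.2992, 1.6771),
    3: (0.9523, 8.1493),
}
# per_sample_weights for the same three positions.
per_sample = {4: 2.4134, 9: -0.1783, 3: 7.1360}


def weighted_sum(skip=None):
    """Weighted sum over the bag in float64, optionally skipping one index."""
    totals = [0.0, 0.0]
    for idx, row in rows.items():
        if idx == skip:
            continue
        for col, value in enumerate(row):
            totals[col] += value * per_sample[idx]
    return totals


full = weighted_sum()
without_last_row = weighted_sum(skip=9)
print(full)
print(without_last_row)
```

The full sum lands close to the PyTorch values printed in the repro (-0.7275 and 76.6250), and the sum without index 9 lands close to the onnx-script values (-1.672 and 76.94), up to float16 rounding. That is consistent with the last embedding row being dropped as padding, although it is an inference from the printed numbers rather than a confirmed diagnosis.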

@justinchuby added and removed the contribution welcome label on Feb 8, 2024
@titaiwangms titaiwangms removed their assignment Oct 17, 2024

3 participants