Embedding_bag mismatch #1056
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Which environment? Nightly?
Edit: this is happening in the PyTorch tests; it can be run with `pytest pytorch/test/onnx/test_fx_op_consistency.py -k embedding`
Is this test targeting (a) `nn.functional.embedding_bag()` or (b) `ops.aten.embedding_bag()`?
We can leverage the op-by-op verification feature that @titaiwangms created in the exporter.
I will follow up on the thread and try to debug today.
Just realized that op-level debugging doesn't work on this case because it needs indices, which are not randomizable.
Checking the SARIF report to see if there is anything odd during conversion.
The op with the issue is actually `nn.functional.embedding_bag`. The mismatch on `nn.functional.embedding` is not reproducible with nightly onnxscript and ORT.
Based on the SARIF report, in this case I repro it with:

```python
from core import aten_embedding_bag_padding_idx
import numpy as np
import torch as t


def test_embedding_bag_onnx():
    # https://github.com/microsoft/onnxscript/issues/1056
    weight = np.array(
        [[-2.7199, -1.7691, -8.5981, -5.9605, -3.7100],
         [ 0.3334,  3.5580,  5.4002, -6.1015, -3.9192],
         [ 3.2690,  7.4735, -1.8522,  6.7348, -1.4507],
         [ 0.9523,  8.1493, -8.3490, -5.6658, -2.2785],
         [-3.5082,  7.7760, -5.8336, -4.1430, -6.2878],
         [-8.4290, -5.2537,  7.7364,  4.0160,  4.3621],
         [ 0.4733, -4.6142,  1.5227, -8.4033, -6.5031],
         [-4.6398,  5.6784,  5.2769, -3.9915, -0.3247],
         [ 5.7560,  8.9472,  3.5719,  1.2158,  6.0344],
         [-5.2992,  1.6771, -6.9777, -6.2378, -4.6493]],
        dtype=np.float16)
    indices = np.array([4, 9, 3, 0, 3], dtype=np.int64)
    offsets = np.array([0, 3], dtype=np.int64)
    mode = 0  # sum
    per_sample_weights = np.array(
        [2.4134, -0.1783, 7.1360, -0.7987, 2.3815], dtype=np.float16)
    result = aten_embedding_bag_padding_idx(
        weight, indices, mode=mode, offsets=offsets,
        per_sample_weights=per_sample_weights)
    print("result from onnx-script:")
    print(result)


def test_embedding_bag_nn_function():
    weight = t.tensor(
        [[-2.7199, -1.7691, -8.5981, -5.9605, -3.7100],
         [ 0.3334,  3.5580,  5.4002, -6.1015, -3.9192],
         [ 3.2690,  7.4735, -1.8522,  6.7348, -1.4507],
         [ 0.9523,  8.1493, -8.3490, -5.6658, -2.2785],
         [-3.5082,  7.7760, -5.8336, -4.1430, -6.2878],
         [-8.4290, -5.2537,  7.7364,  4.0160,  4.3621],
         [ 0.4733, -4.6142,  1.5227, -8.4033, -6.5031],
         [-4.6398,  5.6784,  5.2769, -3.9915, -0.3247],
         [ 5.7560,  8.9472,  3.5719,  1.2158,  6.0344],
         [-5.2992,  1.6771, -6.9777, -6.2378, -4.6493]],
        dtype=t.float16)
    indices = t.tensor([4, 9, 3, 0, 3], dtype=t.int64)
    offsets = t.tensor([0, 3], dtype=t.int64)
    mode = 0  # sum
    per_sample_weights = t.tensor(
        [2.4134, -0.1783, 7.1360, -0.7987, 2.3815], dtype=t.float16)
    result = t.ops.aten._embedding_bag_forward_only(
        weight, indices, offsets=offsets, mode=mode,
        per_sample_weights=per_sample_weights)
    print("result from nn.functional:")
    print(result)


test_embedding_bag_onnx()
'''
result from onnx-script:
(array([[ -1.672,  76.94 , -73.7  , -50.44 , -31.44 ],
        [  4.44 ,  20.81 , -13.016,  -8.72 ,  -2.46 ]], dtype=float16),
 array([0, 0, 0, 0, 0], dtype=int64),
 array([0, 0], dtype=int64),
 array([0, 0], dtype=int64))
'''

test_embedding_bag_nn_function()
'''
result from nn.functional:
(tensor([[ -0.7275,  76.6250, -72.4375, -49.3125, -30.6250],
         [  4.4414,  20.8125, -13.0156,  -8.7266,  -2.4629]],
        dtype=torch.float16), tensor([], dtype=torch.int64), tensor([0, 0]), tensor([0, 0]))
'''
```
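For reference, what `mode=0` (sum) is supposed to compute can be cross-checked with a plain NumPy sketch. The function name below and the float32 accumulation are my own, not part of the repro above; note that accumulating in float32 lands close to PyTorch's values, which suggests the mismatch in the first bag may be a reduction/precision issue in the onnx-script path:

```python
import numpy as np

def embedding_bag_sum_reference(weight, indices, offsets, per_sample_weights):
    """Hypothetical reference for embedding_bag with mode=0 (sum): each
    output row is the per-sample-weighted sum of the embedding rows
    gathered for that bag. Accumulates in float32 for accuracy."""
    w = weight.astype(np.float32)
    psw = per_sample_weights.astype(np.float32)
    # Bag b spans indices[offsets[b] : offsets[b + 1]]; the last bag
    # runs to the end of `indices`.
    bounds = np.append(offsets, len(indices))
    out = []
    for b in range(len(offsets)):
        start, end = bounds[b], bounds[b + 1]
        rows = w[indices[start:end]]                       # (bag_size, dim)
        out.append((psw[start:end, None] * rows).sum(axis=0))
    return np.stack(out)

# Same inputs as the repro above.
weight = np.array(
    [[-2.7199, -1.7691, -8.5981, -5.9605, -3.7100],
     [ 0.3334,  3.5580,  5.4002, -6.1015, -3.9192],
     [ 3.2690,  7.4735, -1.8522,  6.7348, -1.4507],
     [ 0.9523,  8.1493, -8.3490, -5.6658, -2.2785],
     [-3.5082,  7.7760, -5.8336, -4.1430, -6.2878],
     [-8.4290, -5.2537,  7.7364,  4.0160,  4.3621],
     [ 0.4733, -4.6142,  1.5227, -8.4033, -6.5031],
     [-4.6398,  5.6784,  5.2769, -3.9915, -0.3247],
     [ 5.7560,  8.9472,  3.5719,  1.2158,  6.0344],
     [-5.2992,  1.6771, -6.9777, -6.2378, -4.6493]],
    dtype=np.float16)
indices = np.array([4, 9, 3, 0, 3], dtype=np.int64)
offsets = np.array([0, 3], dtype=np.int64)
per_sample_weights = np.array(
    [2.4134, -0.1783, 7.1360, -0.7987, 2.3815], dtype=np.float16)

ref = embedding_bag_sum_reference(weight, indices, offsets, per_sample_weights)
print(ref)  # first row is close to PyTorch's (-0.7275, 76.625, ...)
```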
Thanks for doing the investigation! So it looks like we do need to implement them differently? Previously I thought they could be the same thing.
I found the root cause: in the testing, we called