Add more tests for sizeAdd node. #4258
Conversation
vanbasten23 commented on Dec 1, 2022:
- Test SizeAdd::Lower function.
- Fix an issue with another existing test "test_simple_expand".
test/test_dynamic_shapes.py
# Exercise SizeAdd::Lower.
t4 = t3.expand(dyn_size)
self.assertEqual(t4.size(0), 3)
print(torch_xla._XLAC._get_xla_tensors_text([t4]))
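For context, a minimal self-contained version of the test around this snippet might look like the sketch below. The setup (t1/t2/t3, the 5x2 zeros with a single nonzero element, and the +2) is inferred from the IR dump later in this thread rather than copied from the PR, and it assumes a build with dynamic-shape support enabled:

import torch
import torch_xla
import torch_xla.core.xla_model as xm

def test_size_add_sketch():
  dev = xm.xla_device()
  t1 = torch.zeros([5, 2], device=dev)
  t1[3][0] = 1               # exactly one nonzero element
  t2 = torch.nonzero(t1)     # dynamic shape: s32[<=10, 2]
  dyn_size = t2.size(0) + 2  # creates a SizeAdd node: 1 + 2
  t3 = torch.ones(1, device=dev)
  t4 = t3.expand(dyn_size)   # SizeAdd::Lower runs when t4 is materialized
  assert t4.size(0) == 3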
I wasn't able to test the SizeAdd::ToString() method. Let me look into it further.
What's the output of torch_xla._XLAC._get_xla_tensors_text([t4]) here? We want to do a self.assertIn for the IR here.
The output of torch_xla._XLAC._get_xla_tensors_text([t4]) here is:
IR {
%0 = s32[] prim::Constant(), value=2
%1 = f32[] prim::Constant(), value=1
%2 = f32[1]{0} aten::view(%1), output_size=(1)
%3 = f32[] prim::Constant(), value=0
%4 = f32[5,2]{1,0} aten::expand(%3), size=(5, 2)
%5 = f32[1,2]{1,0} xla::generic_slice(%4), base_indices=(3, 0), sizes=(1, 2)
%6 = f32[2]{0} aten::view(%5), output_size=(2)
%7 = f32[2]{0} xla::update_slice(%6, %2), base_indices=(0)
%8 = f32[1,2]{1,0} aten::view(%7), output_size=(1, 2)
%9 = f32[5,2]{1,0} xla::update_slice(%4, %8), base_indices=(3, 0)
%10 = (s32[<=10,2]{1,0}, s32[]) aten::nonzero(%9), num_outputs=2
%11 = s32[] aten::size(%10.0)
%12 = s64[] aten::add(%11, %0)
%13 = f32[] prim::Constant(), value=1
%14 = f32[1]{0} aten::expand(%13), size=(1)
%15 = f32[<=12]{0} aten::expand(%14, %12), size=(12), dynamic_dims=(1), ROOT=0
}
It seems hard to test this in the Python code. But how about I test it in test/cpp/test_ir.cpp instead (line 165 in 1685a28)?

TEST_F(IrTest, TestSizeAddNode) {
You can just assert that f32[<=12]{0} aten::expand is in the output of torch_xla._XLAC._get_xla_tensors_text([t4]).
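In test form, that suggestion could be (the expected substring is copied verbatim from the IR dump above):

self.assertIn('f32[<=12]{0} aten::expand',
              torch_xla._XLAC._get_xla_tensors_text([t4]))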
What I am trying to test is SizeAdd::ToString(), not expand. I set SizeAdd::ToString() to return "SizeAdd for op aten::add", but as you can see that string does not appear in the output of torch_xla._XLAC._get_xla_tensors_text([t4]). So I don't think torch_xla._XLAC._get_xla_tensors_text([t4]) will exercise SizeAdd::ToString(). That's why I suggest testing it in test/cpp/test_ir.cpp in this PR.
OK, now the test that I added fails. Here are the HLO and the IR graph: … The full error message also doesn't show any more useful info. Via gdb, I was able to break at xla/torch_xla/csrc/ops/dynamic_ir.cpp (line 90 in 1685a28): input1 has type s32 and input2 has type s64, hence the type mismatch. However, I couldn't find where input1 and input2 come from, because p input1 doesn't reveal any useful info. So my question is: where do input1 and input2 originate?
Let's dig a bit into the error:
======================================================================
ERROR: test_sizeAdd (__main__.TestDynamicShapes)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/pytorch/xla/test/test_dynamic_shapes.py", line 48, in test_sizeAdd
self.assertEqual(t4.size(0), 3)
File "/opt/conda/lib/python3.7/unittest/case.py", line 852, in assertEqual
assertion_func(first, second, msg=msg)
File "/opt/conda/lib/python3.7/unittest/case.py", line 842, in _baseAssertEqual
if not first == second:
File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 212, in __bool__
return self.node.bool_()
RuntimeError: Error while lowering: SizeAdd for op aten::add
XLA builder error: INVALID_ARGUMENT: Binary op add with different element types: s32[] and s64[].:
We need to identify where the s32 is coming from. Dumping the IR of t4 should help. Maybe we should make SizeAdd handle the casting.
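Concretely, dumping both graphs for t4 could look like the two lines below; _get_xla_tensors_text appears earlier in this thread, while _get_xla_tensors_hlo is assumed to be the analogous HLO binding:

print(torch_xla._XLAC._get_xla_tensors_text([t4]))  # lazy tensor IR
print(torch_xla._XLAC._get_xla_tensors_hlo([t4]))   # lowered HLO (assumed binding)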
I already did. Here is the IR of t4 (see the dump above). Specifically, both operands of aten::add are type s32. Via gdb, I was able to find #4258 (comment).
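For illustration, the mismatch is visible directly in the IR text, so a quick check while debugging might be (both substrings appear verbatim in the dump above):

ir = torch_xla._XLAC._get_xla_tensors_text([t4])
assert 's32[] aten::size' in ir  # the size node is s32
assert 's64[] aten::add' in ir   # but the add node was created as s64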
@JackCaoG I followed your suggestion to create a shorter repro.
I will try to repro sometime today.
I am able to repro the error; I will post my debugging steps tonight.
After verifying that expand is the source of the error, I dumped the IR of the …
Weirdly, both inputs seem to be s64, and this is the only …
One possibility is that …
and I saw …
and if I do …
So I was wrong; it is actually coming from …
Looks suspicious. If I have to guess, GetDimensionSize also returns s32, but we incorrectly flag it to be s64.
OK, confirmed: if I do … (which avoids …), I am able to see …, which suggests size is actually an s32 even on CPU, but we force it to be s64 in xla/torch_xla/csrc/tensor_util.cpp (lines 1253 to 1259 in d7d0479).
With …, the issue is gone. That being said, …
Here is the full diff: …
Thanks for looking into it. So to confirm, the fix forces all sizes to be s32?
My patch will force all sizes to be s32, as long as we make sure …
But regarding your comment "looks suspicious, if I have to guess, GetDimensionSize also returns s32, but we incorrectly flag it to be s64", we don't have to change the part that flags it to be s64, right?
No, we need to make sure dimensionType is always s32. However, for the case of SizeAdd, if we have a …
Recap: what we concluded is "size is actually an s32 even on CPU, but we force it to be s64" here. To confirm: … just so that it'll always return s32 for the size operation?
Compare 89475b0 to 70a9a79.
mostly lgtm, minor nits