Do not use BNNS copy when dtypes differ in CoreML #13018
cc @cymbalrush, can I get a sanity check here? Is BNNS expected to crash when the dtypes differ?
It's not expected to crash. Do you have an example model for which it crashes? We can merge the PR, but it would be good to know which data type and layout trigger the crash.
Yes, you can consistently reproduce the crash with the floor_divide toy model in #11714. The output dtype mismatch occurs because CoreML converts the dtype of floor_divide to float32 internally, but the output in the exported program has dtype int64.
Force-pushed from d7d74f4 to 9dda55a.
…13018)"
Summary: the diff D79416945 makes model inference slow.
1. In the old 08/01 build runner on Mac (P1905141721): Prefilled 18 tokens @ 250 tokens/second. Generated 23 tokens @ 18.4 tokens/second.
2. In today's 08/14 build runner on Mac (P1905142300): Prefilled 18 tokens @ 36.5112 token/s in 493ms. Generated 23 tokens @ 2.25734 token/s in 10189ms.
Differential Revision: D80362730
BNNS copy crashes the process when the dtypes differ (pytorch#11714).

With the example in this PR (pytorch#11714), we crash the process on main. Here is the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```

With this PR, the process succeeds.
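
As a rough illustration of the guard the PR title describes, the sketch below takes a bulk-copy fast path only when the source and destination dtypes match, and otherwise converts element by element. Everything here is hypothetical: `Buffer`, `DataType`, `CopyPath`, and `copy_buffer` are illustration-only names, and `std::memcpy` stands in for the BNNS fast path. This is not the ExecuTorch code, and the fallback the actual PR uses when dtypes differ is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for illustration; not the executorchcoreml types.
enum class DataType { Float32, Int64 };
enum class CopyPath { Bulk, ElementWise };

struct Buffer {
    DataType dtype;
    std::size_t count;                // number of elements
    std::vector<std::uint8_t> bytes;  // raw storage
};

inline std::size_t element_size(DataType t) {
    return t == DataType::Float32 ? sizeof(float) : sizeof(std::int64_t);
}

inline Buffer make_buffer(DataType dtype, std::size_t count) {
    return Buffer{dtype, count,
                  std::vector<std::uint8_t>(count * element_size(dtype))};
}

// Read element i as type T (memcpy avoids aliasing and alignment issues).
template <typename T>
T load(const Buffer& b, std::size_t i) {
    T value;
    std::memcpy(&value, b.bytes.data() + i * sizeof(T), sizeof(T));
    return value;
}

// Write element i as type T.
template <typename T>
void store(Buffer& b, std::size_t i, T value) {
    std::memcpy(b.bytes.data() + i * sizeof(T), &value, sizeof(T));
}

// Copies src into dst and reports which path was taken.
CopyPath copy_buffer(const Buffer& src, Buffer& dst) {
    if (src.count != dst.count) {
        throw std::invalid_argument("element counts differ");
    }
    if (src.dtype == dst.dtype) {
        // Fast path: only safe because both sides share one element type.
        // In this sketch, memcpy stands in for a bulk routine such as BNNS.
        if (!src.bytes.empty()) {
            std::memcpy(dst.bytes.data(), src.bytes.data(), src.bytes.size());
        }
        return CopyPath::Bulk;
    }
    // Assumed fallback: convert one element at a time. The float -> int64
    // cast truncates toward zero and requires values representable in int64.
    for (std::size_t i = 0; i < src.count; ++i) {
        if (src.dtype == DataType::Float32) {
            store<std::int64_t>(dst, i,
                                static_cast<std::int64_t>(load<float>(src, i)));
        } else {
            store<float>(dst, i, static_cast<float>(load<std::int64_t>(src, i)));
        }
    }
    return CopyPath::ElementWise;
}
```

With a float32 source and an int64 destination (the combination reported for floor_divide), `copy_buffer` returns `CopyPath::ElementWise` and never reaches the bulk routine.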