
Unsupported Scalar Type 5? -- Portable/optimized ops don't consistently support half/bfloat16 #7748


Closed
bluejack opened this issue Jan 17, 2025 · 35 comments
Assignees
swolchok
Labels
module: kernels (Issues related to kernel libraries and utilities, and code under kernels/)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@bluejack

bluejack commented Jan 17, 2025

🐛 Describe the bug

After exporting a model to .pte form and running it through executor_runner, I get:

E 00:00:02.220756 executorch:inputs_portable.cpp:45] Unsupported scalar type 5

I believe this is the "Half" type, i.e. float16.

Does that simply mean executor_runner does not support float16? Or does the whole framework not support float16?

Note that when I investigate the file using a Python script, I get as far as sending it my float16 tensors, but it still fails to execute with a similar error:

[op_native_layer_norm.cpp:169] In function operator()(), assert failed (false): Unhandled dtype Half for native_layer_norm.out

I'm including the versions below, but note that this is using executorch built from head rather than the last release. Should I expect the framework to support float16, and look to my own code for the error?
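
For reference, here is a minimal sketch of the kind of Python-side check mentioned above. It assumes the standard ExecuTorch pybindings are built and importable as executorch.extension.pybindings.portable_lib; the model path and input shape are placeholders, not taken from this report.

import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

# Load the exported program (hypothetical path).
module = _load_for_executorch("model.pte")

# Feeding a float16 tensor reproduces the "Unhandled dtype Half" failure when a
# portable op (here native_layer_norm) lacks half support.
x = torch.randn(1, 16, 128, dtype=torch.float16)
outputs = module.forward([x])
print(outputs)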

Versions

PyTorch version: 2.6.0.dev20250104
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 14.6.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.4)
CMake version: version 3.31.4
Libc version: N/A

Python version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct 4 2024, 08:22:19) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-14.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M3 Pro

Versions of relevant libraries:
[pip3] executorch==0.6.0a0+cd0e584
[pip3] flake8==7.0.0
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==2.0.0
[pip3] numpydoc==1.7.0
[pip3] torch==2.6.0.dev20250104
[pip3] torchao==0.8.0+git2e032c6b
[pip3] torchaudio==2.6.0.dev20250104
[pip3] torchsr==1.0.4
[pip3] torchvision==0.22.0.dev20250104
[conda] executorch 0.6.0a0+cd0e584 pypi_0 pypi
[conda] numpy 2.0.0 pypi_0 pypi
[conda] numpydoc 1.7.0 py312hca03da5_0
[conda] torch 2.6.0.dev20250104 pypi_0 pypi
[conda] torchao 0.8.0+git2e032c6b pypi_0 pypi
[conda] torchaudio 2.6.0.dev20250104 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.22.0.dev20250104 pypi_0 pypi

cc @larryliu0820 @manuelcandales

swolchok added a commit that referenced this issue Jan 17, 2025
Partial fix for #7748.

ghstack-source-id: 0c7e0a5
ghstack-comment-id: 2599375147
Pull Request resolved: #7750
@swolchok
Contributor

Does that simply mean executor_runner does not support float16?

It looks like this particular function does not support float16. I've just sent #7750 to fix it.

Or does the whole framework not support float16?

We are capable of supporting it, but it looks like portable ops coverage is spotty. I'll send a fix for native_layer_norm and as many other places as I can find.

swolchok added a commit that referenced this issue Jan 18, 2025
Partial fix for #7748.

ghstack-source-id: 9f183dd
ghstack-comment-id: 2599398274
Pull Request resolved: #7758
@swolchok swolchok self-assigned this Jan 18, 2025
@swolchok swolchok changed the title Unsupported Scalar Type 5? Unsupported Scalar Type 5? -- Portable/optimized ops don't consistently support half/bfloat16 Jan 18, 2025
swolchok added a commit that referenced this issue Jan 18, 2025
Partial fix for #7748.

ghstack-source-id: a72e5e3
ghstack-comment-id: 2599413770
Pull Request resolved: #7760
@swolchok
Contributor

swolchok commented Jan 18, 2025

By the way, if you're running on your Mac, you might want to enable the XNNPACK delegate when exporting; there's a good chance you will get both better performance and a workaround for the remaining instance of this issue I haven't got PRs out for yet (though I don't know whether XNNPACK has layer norm off the top of my head).

swolchok added a commit that referenced this issue Jan 18, 2025
Partial fix for #7748.

ghstack-source-id: 02bfc58
ghstack-comment-id: 2599413770
Pull Request resolved: #7760
swolchok added a commit that referenced this issue Jan 18, 2025
Partial fix for #7748.

ghstack-source-id: b7b3380
ghstack-comment-id: 2599481711
Pull Request resolved: #7767
swolchok added a commit that referenced this issue Jan 18, 2025
Partial fix for #7748.

ghstack-source-id: 02a1dc7
ghstack-comment-id: 2599483099
Pull Request resolved: #7769
@bluejack
Author

By the way, if you're running on your Mac, you might want to enable the XNNPACK delegate when exporting; there's a good chance you will get both better performance and a workaround for the remaining instance of this issue I haven't got PRs out for yet (though I don't know whether XNNPACK has layer norm off the top of my head).

Ok, I will look at this option, thanks for the tip.

@bluejack
Author

To enable the XNNPACK delegate on export, is it anything more than this?

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import EdgeCompileConfig, to_edge_transform_and_lower

edge_program = to_edge_transform_and_lower(
    exported_text,
    partitioner=[XnnpackPartitioner()],
    compile_config=EdgeCompileConfig(
        _check_ir_validity=True,
        _skip_dim_order=True,
    ),
)

I replaced my basic to_edge call with this one, which I pulled from the xnnpack examples... but it does not seem to make any difference. I'm not sure whether that means I'm not actually doing the delegation properly, or whether it genuinely doesn't make a difference.
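
One way to check whether the partitioner actually claimed anything (a minimal sketch; the executorch_call_delegate node-name check is an assumption about how lowered graphs are printed, not something confirmed in this thread) is to inspect the lowered graph for delegate calls:

# If this prints 0, the XnnpackPartitioner did not delegate any part of the graph.
lowered = edge_program.exported_program().graph_module
delegate_calls = [
    node
    for node in lowered.graph.nodes
    if node.op == "call_function" and "executorch_call_delegate" in str(node.target)
]
print(f"delegate calls: {len(delegate_calls)}")
print(lowered.graph)  # full graph dump for closer inspection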

@kimishpatel
Contributor

I think you will probably want to apply a recipe similar to the llama stuff here.
For quantization it first has to do this: https://github.com/pytorch/executorch/blob/main/examples/models/llama/export_llama_lib.py#L1041 (only the 4-bit quant option, 8da4w, works right now).
And then do the XNNPACK "lowering" following code similar to https://github.com/pytorch/executorch/blob/main/examples/models/llama/export_llama_lib.py#L685.

So in the above there are really two steps (a rough sketch of both follows below):

  1. 4-bit quantization, which requires some code from https://github.com/pytorch/executorch/blob/main/examples/models/llama/source_transformation/quantize.py

  2. "Lowering", which identifies appropriate portions of the graph and leverages XNNPACK to execute them. That is the second link.

If you run into issues, which I expect you may, please post here, and if you can paste the appropriate graph snippets/model from each stage, that would help. @mcr229 and @digantdesai know a ton about this.
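
For concreteness, here is a rough sketch of those two steps. The torchao quantize_/int8_dynamic_activation_int4_weight calls, the group size, and the placeholder model/example_inputs are assumptions for illustration, not lifted from the llama scripts; the linked export code does the same thing with more configuration.

import torch
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Step 1: 8-bit dynamic-activation / 4-bit weight (8da4w) quantization on the eager model.
quantize_(model, int8_dynamic_activation_int4_weight(group_size=128))

# Step 2: export, then lower, letting XNNPACK claim the parts of the graph it supports.
exported = torch.export.export(model, example_inputs)
edge_program = to_edge_transform_and_lower(
    exported,
    partitioner=[XnnpackPartitioner()],
)
executorch_program = edge_program.to_executorch()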

@kimishpatel
Contributor

Also note that the code pointers above have largely been validated with llama3+ models, so if your model is similar, you should be able to use those utils.

Another thing worth mentioning: for language models, 4-bit quantization works best for now and is well supported, as linked above. But if that's not the case for your model and some variant of 8-bit quantization works better, then I'm going to ask @mcr229 to point you to some code snippets to enable quantization and lowering for you.

@bluejack
Author

Thanks for these tips. Ours is a vision model, so eventual quality might call for more than 4 bits, but at the moment we are just trying to get a proof of concept going. I'll dig into these resources, thanks.

swolchok added a commit that referenced this issue Jan 21, 2025
Partial fix for #7748.

ghstack-source-id: c7d2a59
ghstack-comment-id: 2605368953
Pull Request resolved: #7791
swolchok added a commit that referenced this issue Jan 21, 2025
Partial fix for #7748.

ghstack-source-id: e25fec3
ghstack-comment-id: 2605391184
Pull Request resolved: #7792
@swolchok
Contributor

For anyone else who wants to jump in on op support, here is how I'm identifying ops to look at:

$ cd kernels/portable/cpu
$ rg --files-without-match 'Half|HALF|ALL|HBF16|HBBF16|ufunc' -g '*.cpp' -g !test | sort

Still have to spot-check, but this gives an initial list.

swolchok added a commit that referenced this issue Jan 21, 2025
Partial fix for #7748.

ghstack-source-id: 9c6f758
ghstack-comment-id: 2605521459
Pull Request resolved: #7794
zonglinpeng pushed a commit to zonglinpeng/executorch that referenced this issue Jan 30, 2025
* Coerce to true_ctype in tensor_factory (pytorch#7856)

This should fix the problem where attempts to test bool are often wonky in OSS and fail UBSAN internally; it is undefined behavior to store a value other than 0 or 1 for type bool.

* Support Half/BFloat16 in prod operator (pytorch#7857)

Partial fix for pytorch#7748.
zonglinpeng pushed additional commits to zonglinpeng/executorch that referenced this issue Jan 30, 2025
@digantdesai digantdesai added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Feb 4, 2025
@swolchok
Contributor

swolchok commented Feb 7, 2025

Closing because most things should support half/bfloat16 now. (Norm ops are the exception per #7846; hoping to get to code sharing with PyTorch and then solve accuracy issues that way.)

@swolchok swolchok closed this as completed Feb 7, 2025