[Executorch][quant] Optimize per channel dequantize #5622

kimishpatel · 2024-09-25T04:42:24Z

Stack from ghstack (oldest at bottom):

With a quantized KV cache, the dequantization routine takes a significant amount of time.
This diff vectorizes per-channel dequantization for the common case.

Differential Revision: D63338858
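
For context, per-channel dequantization applies the usual affine mapping `out = (q - zero_point) * scale`, with one `scale` and `zero_point` per channel along the quantized axis. The sketch below is a scalar reference for that computation, not the kernel from this diff. The name `dequantize_per_channel_ref`, the `int8` input type, and the contiguous `[channels, inner]` layout with the channel axis first are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference, NOT the ExecuTorch kernel.
// Assumes a contiguous [channels, inner] int8 buffer quantized along axis 0:
//   out[c][i] = (in[c][i] - zero_points[c]) * scales[c]
std::vector<float> dequantize_per_channel_ref(
    const std::vector<int8_t>& in,
    const std::vector<float>& scales,
    const std::vector<int32_t>& zero_points,
    std::size_t channels,
    std::size_t inner) {
  std::vector<float> out(channels * inner);
  for (std::size_t c = 0; c < channels; ++c) {
    // Scale and zero point are constant within a channel, so they are
    // hoisted out of the inner loop.
    const float scale = scales[c];
    const int32_t zp = zero_points[c];
    for (std::size_t i = 0; i < inner; ++i) {
      const std::size_t idx = c * inner + i;
      // Widen to int32 before subtracting so the difference cannot overflow.
      out[idx] = static_cast<float>(static_cast<int32_t>(in[idx]) - zp) * scale;
    }
  }
  return out;
}
```

Because the parameters are fixed within a channel, the inner loop is a plain multiply over contiguous memory, which is the kind of loop that lends itself to vectorization.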

pytorch-bot · 2024-09-25T04:42:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5622

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 28 New Failures

As of commit bed6c96 with merge base b2517d6:

NEW FAILURES - The following jobs have failed:

Build documentation / build (buck2) / Build doc (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-custom-ops-linux (buck2) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-custom-ops-linux (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux (bf16, buck2, portable) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux (bf16, cmake, portable) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux (fp32, buck2, portable) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux (fp32, buck2, xnnpack+custom) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux (fp32, buck2, xnnpack+custom+qe) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux (fp32, cmake, portable) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux (fp32, cmake, xnnpack+custom) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux (fp32, cmake, xnnpack+custom+qe) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llama-runner-linux-android (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:133:5: error: unknown type name 'int32x4_t'; did you mean 'int32_t'?
pull / test-llama-runner-qnn-linux (fp32, cmake, qnn) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-llava-runner-linux / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-models-linux (buck2, mv3, portable, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-models-linux (buck2, mv3, xnnpack-quantization-delegation, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-models-linux (cmake, mv3, portable, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-models-linux (cmake, mv3, xnnpack-quantization-delegation, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-models-linux (cmake, vit, portable, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-models-linux (cmake, vit, xnnpack-delegation, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-pybind-build-linux (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-quantized-aot-lib-linux (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-selective-build-linux (buck2) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-selective-build-linux (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / test-setup-linux-gcc (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: ‘const Tensor’ {aka ‘const class at::Tensor’} has no member named ‘dim_order’
pull / unittest / linux / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'
pull / unittest / macos / macos-job (gh)
/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:133:5: error: unknown type name 'int32x4_t'; did you mean 'int32_t'?
pull / unittest-arm (buck2) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: no member named 'dim_order' in 'at::Tensor'

This comment was automatically generated by Dr. CI and updates every 15 minutes.
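
The `unknown type name 'int32x4_t'` errors in the log are consistent with NEON types being used where `<arm_neon.h>` is not included or the target has no NEON. The cause is not confirmed in this thread. A minimal sketch of one common pattern follows: guard the intrinsics behind `__ARM_NEON` and keep a scalar loop as both fallback and tail handler. The helper name `dequantize_row` and its signature are hypothetical and do not come from `op_dequantize.cpp`.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>  // Declares int32x4_t, float32x4_t and the intrinsics below.
#endif

// Hypothetical helper, NOT the ExecuTorch kernel: dequantizes one channel's
// contiguous int8 row using that channel's scale and zero point.
void dequantize_row(
    const int8_t* in,
    float* out,
    std::size_t n,
    float scale,
    int32_t zero_point) {
  std::size_t i = 0;
#if defined(__ARM_NEON)
  const int32x4_t vzp = vdupq_n_s32(zero_point);
  const float32x4_t vscale = vdupq_n_f32(scale);
  // Process 8 elements per iteration: widen int8 -> int16 -> int32,
  // subtract the zero point, convert to float, then multiply by the scale.
  for (; i + 8 <= n; i += 8) {
    const int16x8_t v16 = vmovl_s8(vld1_s8(in + i));
    const int32x4_t lo = vsubq_s32(vmovl_s16(vget_low_s16(v16)), vzp);
    const int32x4_t hi = vsubq_s32(vmovl_s16(vget_high_s16(v16)), vzp);
    vst1q_f32(out + i, vmulq_f32(vcvtq_f32_s32(lo), vscale));
    vst1q_f32(out + i + 4, vmulq_f32(vcvtq_f32_s32(hi), vscale));
  }
#endif
  // Scalar path: the whole row on non-NEON targets, the remainder otherwise.
  for (; i < n; ++i) {
    out[i] = static_cast<float>(static_cast<int32_t>(in[i]) - zero_point) * scale;
  }
}
```

With this shape the same source file compiles on x86 hosts, where only the scalar loop is built, and on ARM targets, where the NEON loop handles full groups of 8 and the scalar loop finishes the tail.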

facebook-github-bot · 2024-09-25T04:43:03Z

This pull request was exported from Phabricator. Differential Revision: D63338858

With a quantized KV cache, the dequantization routine takes a significant amount of time. This diff vectorizes per-channel dequantization for the common case.

Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/)

ghstack-source-id: 244549440

Pull Request resolved: #5622

facebook-github-bot added the CLA Signed label Sep 25, 2024

This was referenced Sep 25, 2024

[ExecuTorch] Some updated to kv cache #5615

Closed

Fix dequantize per channel to handle double scale type #5524

Closed

facebook-github-bot added the fb-exported label Sep 25, 2024

kimishpatel closed this Oct 1, 2024
