[NVPTX] Lower 16xi8 and 8xi8 stores efficiently by bondhugula · Pull Request #73646 · llvm/llvm-project

bondhugula · 2023-11-28T13:54:40Z

Lower 16xi8 vector stores in NVPTX ISel efficiently using
st.v4.b32 instead of multiple st.v4.u8, along the lines of vector loads
and 8xf16. Similarly, lower 8xi8 stores using st.v2.u32.
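
To illustrate the idea behind the wider store, here is a minimal host-side sketch (not the ISel code from this PR) of how sixteen i8 lanes can be viewed as four 32-bit words, which is the shape a st.v4.b32 operates on. The helper name `packLanes` is hypothetical, and placing lane 0 in the least significant byte is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack 16 byte lanes into four 32-bit words.
// Assumption: lane 0 lands in the least significant byte of word 0.
std::array<uint32_t, 4> packLanes(const std::array<uint8_t, 16> &lanes) {
  std::array<uint32_t, 4> words{};
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    // Four consecutive lanes share one word; shift each into its byte slot.
    words[i / 4] |= static_cast<uint32_t>(lanes[i]) << (8 * (i % 4));
  }
  return words;
}
```

Four words can then be written with one vector store rather than several byte-vector stores, which is the saving the description refers to.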

ldrumm

Minor nits. LGTM

llvm/test/CodeGen/NVPTX/vector-stores.ll

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

Artem-B · 2023-11-29T18:23:05Z

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

Nice. Legalizer assuming that stack loads/stores are cheap is indeed a rather bad misoptimization for NVPTX.

Note that this comment might be out of date: it looks copied from PerformLOADCombine, which was written before stack optimizations were done.

steven-johnson · 2023-12-05T19:07:30Z

This seems to have injected failures into Halide codegen; we are now getting runtime errors of the form CUDA_ERROR_MISALIGNED_ADDRESS for cuMemcpyDtoH() where we didn't before. It appears we are now emitting an aligned store instruction where we previously emitted an unaligned one. Can we get a revert of this pending further investigation, please?

steven-johnson · 2023-12-05T19:09:55Z

llvm/test/CodeGen/NVPTX/vector-stores.ll

+; CHECK-LABEL: .visible .func v8i8_store
+define void @v8i8_store(ptr %a, <8 x i8> %v) {
+  ; CHECK: st.v2.u32
+  store <8 x i8> %v, ptr %a


This is only correct if the pointer is aligned to a 4-byte boundary (IIUC), but AFAIK nothing in the IR to this point promises that alignment.

You're right. Loads/stores that use larger types must be aligned appropriately.

We do use allowsMemoryAccessForAlignment in other places.
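
The constraint discussed here can be sketched as a simple decision: only widen the store when the known alignment covers the wider access. The following is a standalone model, not code from NVPTXISelLowering.cpp. `StoreKind` and `chooseStoreKind` are hypothetical names, and the rule that the address must be aligned to the full size of the access (8 bytes for st.v2.u32, 16 bytes for st.v4.b32) is an assumption of this sketch rather than something stated in the thread. In-tree, the check would presumably go through allowsMemoryAccessForAlignment instead.

```cpp
#include <cstdint>

// Hypothetical model of the lowering decision; not LLVM API.
enum class StoreKind { Narrow, V2U32, V4B32 };

// numBytes:   size of the <N x i8> vector being stored (8 or 16).
// alignBytes: alignment the IR guarantees for the pointer (a power of two).
// Assumption: a wide vector store is only safe when the address is aligned
// to the full size of the access.
StoreKind chooseStoreKind(uint64_t numBytes, uint64_t alignBytes) {
  if (numBytes == 16 && alignBytes >= 16)
    return StoreKind::V4B32;
  if (numBytes == 8 && alignBytes >= 8)
    return StoreKind::V2U32;
  // Under-aligned or unsupported size: keep the narrower stores.
  return StoreKind::Narrow;
}
```

Under this model, an 8-byte vector store that is only known to be 1-byte aligned stays on the narrow path, which appears to be the situation that surfaced as CUDA_ERROR_MISALIGNED_ADDRESS in Halide.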

In that case, we should revert it if a fix-forward is not imminent (this is breaking all of Halide's CUDA tests).

This reverts commit 173fcf7. Needs to constrain the optimization to properly aligned loads/stores only. llvm#73646 (comment)

pasaulais

LGTM once the alignment issue is addressed

bondhugula requested review from Artem-B and pasaulais November 28, 2023 13:54

bondhugula mentioned this pull request Nov 28, 2023

[NVPTX] Preserve v16i8 vector loads when legalizing #67322

Closed

bondhugula force-pushed the uday/nvptx_v16i8_vector_store branch 2 times, most recently from 180ee21 to 9d747dd, on November 28, 2023 14:14

ldrumm approved these changes Nov 28, 2023

llvm/test/CodeGen/NVPTX/vector-stores.ll (outdated)

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (outdated)

bondhugula force-pushed the uday/nvptx_v16i8_vector_store branch 2 times, most recently from 6ab9db7 to 298e563, on November 29, 2023 06:14

Artem-B approved these changes Nov 29, 2023

[NVPTX] Lower 16xi8 and 8xi8 stores efficiently

c197301

bondhugula force-pushed the uday/nvptx_v16i8_vector_store branch from 298e563 to c197301 on November 30, 2023 02:26

bondhugula merged commit 173fcf7 into llvm:main Dec 1, 2023

steven-johnson reviewed Dec 5, 2023

Artem-B mentioned this pull request Dec 5, 2023

Revert "[NVPTX] Lower 16xi8 and 8xi8 stores efficiently (#73646)" #74518

Merged

Artem-B added a commit that referenced this pull request Dec 6, 2023

Revert "[NVPTX] Lower 16xi8 and 8xi8 stores efficiently (#73646)" (#7…

a2d3bb1

…4518) This reverts commit 173fcf7. We need to constrain the optimization to properly aligned loads/stores only. #73646 (comment)

pasaulais reviewed Dec 13, 2023

dakersnar mentioned this pull request Dec 5, 2024

[NVPTX] Incomplete and inconsistent mechanisms for lowering vectors loads/stores with sub-32-bit values #118851

Closed
