Fix gemm example on CUDA 13.0.
#317
Conversation
Force-pushed from d535696 to 066df38
@FractalFir can you review this?

EDIT: I was wrong about initialization of shared memory. The PR is fine as is, but I think we need to rethink the abstraction over the shared address space in general, to fix the

I agree that the shared memory handling is iffy, but this PR is a clear improvement over the status quo.
`#[address_space(shared)] static mut` is used to model GPU shared
memory. It's a bit weird. In particular, GPU shared memory is
uninitialized, but `static mut` requires an initializer in Rust. `gemm`
uses a zero initializer, but this initializer is ignored by NVVM. At
least, it was in CUDA 12.x, but in CUDA 13.0 the `gemm` example fails
with this error:
```
thread 'rustc' panicked at crates/rustc_codegen_nvvm/src/nvvm.rs:120:9:
Malformed NVVM IR program rejected by libnvvm, dumping verifier log:
error: Error: : Global Variable `_ZN12gemm_kernels10gemm_tiled10gemm_tiled6TILE_A17hc9c66e758c373a7eE':
context: @_ZN12gemm_kernels10gemm_tiled10gemm_tiled6TILE_A17hc9c66e758c373a7eE = internal unnamed_addr addrspace(3) global <{ [1024 x i8] }> zeroinitializer, align 4
Shared variables can't be initialized
```
This memory looks like it's initialized to zero but isn't, and then is
written and read normally. This is incredibly dodgy and very likely UB.
The proper way to deal with uninitialized memory in Rust is with
`MaybeUninit`, and there are strict rules around its use, e.g. writes
must be done with `write`, and `assume_init` must only be called on
values after they are written.
This commit changes `gemm` to use `MaybeUninit` for the shared memory.
This fixes the error on CUDA 13.0 and the example runs correctly.
(This is the only executed use of GPU shared memory in rust-cuda. There
is a `shared_array!` macro defined but it's only used in a compiletest
where it is compiled but not run. That macro is extremely dubious but I
will deal with it in a separate PR because it's not necessary to get
CUDA 13.0 working.)
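The write-then-`assume_init` discipline described above can be sketched in plain host-side Rust. The `fill_tile` helper and the tile size here are hypothetical stand-ins for illustration; on the GPU the backing store would be an `#[address_space(shared)] static mut` of `MaybeUninit` type rather than a local.

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for filling a shared-memory tile. On the GPU the
// tile would live in `#[address_space(shared)] static mut` storage and
// start out genuinely uninitialized; a local models that here.
fn fill_tile() -> [f32; 4] {
    let mut tile: MaybeUninit<[f32; 4]> = MaybeUninit::uninit();
    // Writes to uninitialized memory must go through `write`, never a
    // plain assignment that reads or drops the uninitialized value.
    tile.write([1.0, 2.0, 3.0, 4.0]);
    // `assume_init` is only sound once the whole value has been written.
    unsafe { tile.assume_init() }
}

fn main() {
    let tile = fill_tile();
    println!("sum = {}", tile.iter().sum::<f32>());
}
```

The same two rules (`write` before any read, `assume_init` only after every element is written) are what the `gemm` kernels have to uphold, with the extra wrinkle that on the GPU the writes come from many threads and must be separated from the reads by a barrier.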
Force-pushed from 066df38 to 2a28b70
One of the nice things about using Rust for both CPU and GPU code is the ability to share things between them. So let's do that in `gemm`.
I added a second commit to share
Sending a `&T` that points to shared memory in one thread block to another thread block will make the pointer refer to different, potentially unrelated data. A pointer (or reference) to shared memory is only valid within the thread block it lives in. This is kind of like how sending a `&` that points to thread-local memory would be unsound (which is why Rust has some special handling around TLS). Your PR is an improvement, but I believe we should also do things like forbid taking references to shared memory altogether. That's just design work I did not have time to fully finish (I mention this whole mess in the Address spaces writeup).
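The thread-local analogy can be made concrete with a runnable sketch (the `COUNTER` name is illustrative). Rust's `thread_local!` only hands out access through the `with` closure, so a reference can never outlive the thread that owns the storage, and each thread observes its own independent value at the "same" static; that is the special handling the analogy is pointing at:

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own copy of this static, much as each thread
    // block gets its own instance of a shared-memory variable.
    static COUNTER: Cell<u32> = Cell::new(0);
}

fn main() {
    // Set the value in the main thread's copy.
    COUNTER.with(|c| c.set(5));
    let main_val = COUNTER.with(|c| c.get());

    // A spawned thread sees a fresh, independent value at the same static,
    // just as another thread block would see different shared memory.
    let other_val = thread::spawn(|| COUNTER.with(|c| c.get()))
        .join()
        .unwrap();

    println!("main = {main_val}, other = {other_val}");
}
```

A reference that escaped `with` would dangle once its owning thread exits, which is exactly the hazard a `&T` to shared memory poses once it leaves its thread block.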