@nnethercote
Collaborator

@nnethercote nnethercote commented Nov 23, 2025

#[address_space(shared)] static mut is used to model GPU shared memory. It's a bit weird. In particular, GPU shared memory is uninitialized, but static mut requires an initializer in Rust. gemm uses a zero initializer, but this initializer is ignored by NVVM. At least, it was in CUDA 12.x, but in CUDA 13.0 the gemm example fails with this error:

thread 'rustc' panicked at crates/rustc_codegen_nvvm/src/nvvm.rs:120:9:
Malformed NVVM IR program rejected by libnvvm, dumping verifier log:

error: Error: : Global Variable `_ZN12gemm_kernels10gemm_tiled10gemm_tiled6TILE_A17hc9c66e758c373a7eE':
  context: @_ZN12gemm_kernels10gemm_tiled10gemm_tiled6TILE_A17hc9c66e758c373a7eE = internal unnamed_addr addrspace(3) global <{ [1024 x i8] }> zeroinitializer, align 4
  Shared variables can't be initialized

This memory looks like it's initialized to zero but isn't, and then is written and read normally. This is incredibly dodgy and very likely UB. The proper way to deal with uninitialized memory in Rust is with MaybeUninit, and there are strict rules around its use, e.g. writes must be done with write, and assume_init must be used on values after they are written.

This commit changes gemm to use MaybeUninit for the shared memory. This fixes the error on CUDA 13.0 and the example runs correctly.
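
To illustrate the MaybeUninit rules described above, here is a minimal host-side sketch. It is not the code from this PR. It assumes a 16x16 tile of f32 (which matches the 1024-byte global in the verifier log), and the names TILE_SIZE, TILE_ELEMS, TILE_A, tile_write, tile_read and fill_and_sum are hypothetical. The #[address_space(shared)] attribute is left out so the snippet builds with a stock compiler.

```rust
use core::mem::MaybeUninit;
use core::ptr::{addr_of, addr_of_mut};

// Assumed tile shape: 16 x 16 f32 values = 1024 bytes.
const TILE_SIZE: usize = 16;
const TILE_ELEMS: usize = TILE_SIZE * TILE_SIZE;

// On the GPU this static would also carry `#[address_space(shared)]`.
// `MaybeUninit::uninit()` states honestly that the memory has no initial value.
static mut TILE_A: [MaybeUninit<f32>; TILE_ELEMS] = [MaybeUninit::uninit(); TILE_ELEMS];

/// Stores `value` in slot `idx` without reading the old (uninitialized) contents.
///
/// # Safety
/// `idx` must be in bounds and no other thread may access the slot concurrently.
unsafe fn tile_write(idx: usize, value: f32) {
    debug_assert!(idx < TILE_ELEMS);
    unsafe {
        let base = addr_of_mut!(TILE_A).cast::<MaybeUninit<f32>>();
        // `MaybeUninit::write` initializes the slot.
        (&mut *base.add(idx)).write(value);
    }
}

/// Loads slot `idx`.
///
/// # Safety
/// The slot must have been written with `tile_write` beforehand.
unsafe fn tile_read(idx: usize) -> f32 {
    debug_assert!(idx < TILE_ELEMS);
    unsafe {
        let base = addr_of!(TILE_A).cast::<MaybeUninit<f32>>();
        // Copy the slot out, then assert that it holds an initialized value.
        base.add(idx).read().assume_init()
    }
}

/// Single-threaded demo: write every slot, then read every slot back.
pub fn fill_and_sum() -> f32 {
    // SAFETY: one thread only, and each slot is written before it is read.
    unsafe {
        for i in 0..TILE_ELEMS {
            tile_write(i, i as f32);
        }
        (0..TILE_ELEMS).map(|i| tile_read(i)).sum()
    }
}
```

The accesses go through raw pointers rather than references to the static mut, so no `&mut` to the whole array is ever created.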

(This is the only executed use of GPU shared memory in rust-cuda. There is a shared_array! macro defined but it's only used in a compiletest where it is compiled but not run. That macro is extremely dubious but I will deal with it in a separate PR because it's not necessary to get CUDA 13.0 working.)

@nnethercote nnethercote requested a review from LegNeato November 23, 2025 23:39
@nnethercote nnethercote force-pushed the fix-gemm-in-cuda-13 branch 2 times, most recently from d535696 to 066df38 Compare November 24, 2025 02:28
@nnethercote nnethercote mentioned this pull request Nov 24, 2025
@LegNeato
Contributor

@FractalFir can you review this?

@FractalFir
Collaborator

FractalFir commented Nov 24, 2025

EDIT: was wrong about initialization of shared memory.

The PR is fine as is, but I think we need to rethink the abstraction over the shared address space in general, to fix the Send soundness hole.

@nnethercote nnethercote mentioned this pull request Nov 25, 2025
@nnethercote
Collaborator Author

What is the Send soundness hole?

I agree that the shared memory handling is iffy, but this PR is a clear improvement over the status quo.

One of the nice things about using Rust for both CPU and GPU code is the
ability to share things between them. So let's do that in `gemm`.
@nnethercote
Collaborator Author

I added a second commit to share TILE_SIZE in GPU and CPU code for gemm.
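
For readers who want a picture of what sharing the constant can look like, here is a rough sketch. It is not the code from the commit: the value 16 and the helpers grid_dim and tile_index are hypothetical. The idea is that one definition, placed somewhere both the host crate and the kernel crate can import, feeds both the launch configuration and the in-kernel indexing.

```rust
/// Single source of truth for the tile edge length (value assumed here).
pub const TILE_SIZE: usize = 16;

/// Number of elements in one square tile.
pub const TILE_ELEMS: usize = TILE_SIZE * TILE_SIZE;

/// Host side: number of thread blocks needed to cover an `m` x `n` output,
/// returned as (blocks along x, blocks along y). Rounds up so that partial
/// tiles at the edges are still covered.
pub fn grid_dim(m: usize, n: usize) -> (usize, usize) {
    (n.div_ceil(TILE_SIZE), m.div_ceil(TILE_SIZE))
}

/// Kernel side: flat index of element (`row`, `col`) inside a row-major tile.
pub fn tile_index(row: usize, col: usize) -> usize {
    row * TILE_SIZE + col
}
```

Because both sides read the same constant, changing the tile size cannot leave the launch dimensions and the kernel's tile layout out of sync.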

@LegNeato LegNeato merged commit 2891f7d into Rust-GPU:main Nov 25, 2025
11 checks passed
@nnethercote nnethercote deleted the fix-gemm-in-cuda-13 branch November 25, 2025 21:12
@FractalFir
Collaborator

> What is the Send soundness hole?

Sending a &T that points to shared memory in one thread block to another thread block makes the pointer refer to different, potentially unrelated data. A pointer (or reference) to shared memory is only valid within the thread block it lives in. References are Send, so you can transfer them outside of their thread block, which is UB.

It is kind of like how sending a & that points to thread-local memory would be unsound (which is why Rust has some special handling around TLS).

Your PR is an improvement, but I believe we should also do things like forbidding taking references to shared memory altogether. That is design work I did not have time to fully finish (I mention this whole mess in the Address spaces writeup).
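
One possible shape for such an API, as a host-side sketch only: BlockTile, set, get and assert_send are hypothetical names, nothing here exists in rust-cuda, and whether Send is the right boundary for thread blocks is exactly the open design question. The sketch hands values in and out by copy, never exposes a reference into the storage, and uses a raw-pointer marker so the handle itself is neither Send nor Sync.

```rust
use core::cell::UnsafeCell;
use core::marker::PhantomData;
use core::mem::MaybeUninit;

/// Hypothetical handle to a block-local tile of `N` f32 slots.
pub struct BlockTile<const N: usize> {
    slots: UnsafeCell<[MaybeUninit<f32>; N]>,
    // A raw-pointer marker opts the type out of `Send` and `Sync`.
    _not_send: PhantomData<*mut ()>,
}

impl<const N: usize> BlockTile<N> {
    /// Creates a tile whose slots are all uninitialized.
    pub fn new() -> Self {
        Self {
            slots: UnsafeCell::new([MaybeUninit::uninit(); N]),
            _not_send: PhantomData,
        }
    }

    /// Stores `value` in slot `idx`. No reference into the tile is created.
    pub fn set(&self, idx: usize, value: f32) {
        assert!(idx < N);
        // SAFETY: `idx` is in bounds, and the type is `!Sync`, so no other
        // thread can touch the storage while this write happens.
        unsafe {
            self.slots
                .get()
                .cast::<MaybeUninit<f32>>()
                .add(idx)
                .write(MaybeUninit::new(value));
        }
    }

    /// Returns a copy of slot `idx`.
    ///
    /// # Safety
    /// Slot `idx` must have been written with `set` beforehand.
    pub unsafe fn get(&self, idx: usize) -> f32 {
        assert!(idx < N);
        unsafe {
            self.slots
                .get()
                .cast::<MaybeUninit<f32>>()
                .add(idx)
                .read()
                .assume_init()
        }
    }
}

/// Compiles only for types that are `Send`.
pub fn assert_send<T: Send>() {}
```

With this in place, `assert_send::<&f32>()` compiles, which is the hole described above, while `assert_send::<BlockTile<4>>()` is rejected by the compiler.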
