Atomic or implementation is buggy

When my code uses an atomic "or" operation, the emitted PTX is invalid, and the module fails to load with only this error: "Failed to convert PTX to module: InvalidPtx".

When debugging this, I ran the emitted PTX file through `ptxas` and got this output:

```
ptxas ./target/release/build/.../kernels.ptx, line 13501; error   : Operation .or requires .b32 or .b64 type for instruction 'atom'
```

The offending few lines of PTX are:

```
	// begin inline asm
	atom.relaxed.gpu.or.u32 %r32, [%rd13], %r33;
	// end inline asm
```

Note that `atom.relaxed.gpu.or.u32` is not actually a valid instruction. The correct instruction is `atom.relaxed.gpu.or.b32`.

The root cause is in `crates/cuda_std/src/atomic/intrinsics.rs`, line 366:

```rust
macro_rules! atomic_fetch_op_3_reg {
    ($($ordering:ident, $op:ident, $width:literal, $type:ty, $scope:ident, $scope_asm:ident),* $(,)*) => {
        $(
            paste! {
                #[$crate::gpu_only]
                #[allow(clippy::missing_safety_doc)]
                #[doc = concat!(
                    "Fetches the value in ptr, performs a ",
                    stringify!($op),
                    ", and returns the original value"
                )]
                pub unsafe fn [<atomic_fetch_ $op _ $ordering _ $type _ $scope>](ptr: *mut $type, val: $type) -> $type {
                    let mut out;
                    asm!(
                        concat!(
                            "atom.",
                            ordering!($ordering),
                            stringify!($scope_asm),
                            ".",
                            stringify!($op),
                            ".",
                            ptx_type!($type),
                            " {}, [{}], {};"
                        ),
                        out([<reg $width>]) out,
                        in(reg64) ptr,
                        in([<reg $width>]) val,
                    );
                    out
                }
            }
        )*
    };
}
```

The input `$type` is `u32`, which is correct on the Rust side, but it is concatenated unchanged onto the end of the `asm!` instruction even when `$op` is `or`, where PTX expects `.b32`. (The `ptx_type!` macro simply returns `$type` unmodified.)

I would assume all logical operations (`and`, `or`, `xor`) behave the same, so maybe there should be an `atomic_fetch_op_3_reg_logical!` macro?
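To make the suggestion concrete, here is a rough sketch of how the type suffix could be mapped for logical ops. Everything in it is hypothetical and not part of `cuda_std`: the names `ptx_bitwise_type!` and `atomic_logical_template!` are made up, and it only builds the instruction string on the host instead of calling `asm!`. It also takes the type as an `ident` rather than a `ty`, since a `ty` fragment cannot be matched against literal tokens like `u32` once it has been forwarded to another macro.

```rust
// Hypothetical helper (not in cuda_std): maps a Rust integer type to the
// untyped-bits suffix that PTX requires for atom.and / atom.or / atom.xor.
macro_rules! ptx_bitwise_type {
    (u32) => { "b32" };
    (i32) => { "b32" };
    (u64) => { "b64" };
    (i64) => { "b64" };
}

// Hypothetical stand-in for the string passed to `asm!` in a logical variant
// of `atomic_fetch_op_3_reg!`. It only assembles the template text.
macro_rules! atomic_logical_template {
    ($ordering:ident, $scope_asm:ident, $op:ident, $type:ident) => {
        concat!(
            "atom.",
            stringify!($ordering),
            ".",
            stringify!($scope_asm),
            ".",
            stringify!($op),
            ".",
            ptx_bitwise_type!($type),
            " {}, [{}], {};"
        )
    };
}

// Templates for the case from this issue and a 64-bit signed variant.
pub fn example_templates() -> [&'static str; 2] {
    [
        atomic_logical_template!(relaxed, gpu, or, u32),
        atomic_logical_template!(relaxed, gpu, xor, i64),
    ]
}
```

With this mapping, the `u32` case from above would produce `atom.relaxed.gpu.or.b32` rather than `atom.relaxed.gpu.or.u32`. The real fix might instead adjust `ptx_type!` or add a separate arm, whichever fits the existing macros better.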

Btw just wanted to say I've been having a blast with this library, my experience has been fantastic so far (aside from this one hiccup), I've successfully translated tens of thousands of lines of CUDA/C++ now, thank you so much for this!!

Atomic or implementation is buggy #347
