Atomic or implementation is buggy #347

@niklebedenko

When my code uses an atomic "or" operation, the emitted PTX is invalid, and the module fails to load with only the error "Failed to convert PTX to module: InvalidPtx".

When debugging this, I ran the emitted PTX file through ptxas and got this output:

ptxas ./target/release/build/.../kernels.ptx, line 13501; error   : Operation .or requires .b32 or .b64 type for instruction 'atom'

The offending few lines of PTX are:

	// begin inline asm
	atom.relaxed.gpu.or.u32 %r32, [%rd13], %r33;
	// end inline asm

Note that atom.relaxed.gpu.or.u32 is not actually a valid instruction. The correct instruction is atom.relaxed.gpu.or.b32.

The root cause is in crates/cuda_std/src/atomic/intrinsics.rs line 366:

macro_rules! atomic_fetch_op_3_reg {
    ($($ordering:ident, $op:ident, $width:literal, $type:ty, $scope:ident, $scope_asm:ident),* $(,)*) => {
        $(
            paste! {
                #[$crate::gpu_only]
                #[allow(clippy::missing_safety_doc)]
                #[doc = concat!(
                    "Fetches the value in ptr, performs a ",
                    stringify!($op),
                    ", and returns the original value"
                )]
                pub unsafe fn [<atomic_fetch_ $op _ $ordering _ $type _ $scope>](ptr: *mut $type, val: $type) -> $type {
                    let mut out;
                    asm!(
                        concat!(
                            "atom.",
                            ordering!($ordering),
                            stringify!($scope_asm),
                            ".",
                            stringify!($op),
                            ".",
                            ptx_type!($type),
                            " {}, [{}], {};"
                        ),
                        out([<reg $width>]) out,
                        in(reg64) ptr,
                        in([<reg $width>]) val,
                    );
                    out
                }
            }
        )*
    };
}

The input $type is u32 (correct), but it is concatenated verbatim onto the end of the asm! instruction even when $op is or, and PTX only accepts .b32 or .b64 for that operation. (The ptx_type! macro simply returns $type unmodified.)

I would assume all logical operations (and, or, xor) behave the same, so maybe there should be an atomic_fetch_op_3_reg_logical! macro?
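To make the idea concrete, here is a rough host-side sketch of how the type suffix could be chosen per operation. The macro names ptx_atom_suffix! and atom_instr! are hypothetical and not part of cuda_std, and the "atom.relaxed.gpu." prefix is hardcoded from the emitted PTX above rather than built from ordering! and $scope_asm. It only builds the instruction string, so it runs without a GPU:

```rust
// Hypothetical helper (not in cuda_std): picks the PTX type suffix for `atom`.
// Bitwise ops take the untyped .bN suffix; every other op keeps the type name,
// which mirrors what ptx_type! is described as doing in this issue.
macro_rules! ptx_atom_suffix {
    (and, $width:literal, $ty:ty) => { concat!("b", $width) };
    (or, $width:literal, $ty:ty) => { concat!("b", $width) };
    (xor, $width:literal, $ty:ty) => { concat!("b", $width) };
    ($op:ident, $width:literal, $ty:ty) => { stringify!($ty) };
}

// Hypothetical stand-in for the concat!(...) inside atomic_fetch_op_3_reg!.
// The ordering and scope are fixed here to keep the sketch short.
macro_rules! atom_instr {
    ($op:ident, $width:literal, $ty:ty) => {
        concat!(
            "atom.relaxed.gpu.",
            stringify!($op),
            ".",
            ptx_atom_suffix!($op, $width, $ty),
            " {}, [{}], {};"
        )
    };
}

fn main() {
    // Bitwise op: the suffix becomes b32 instead of u32.
    println!("{}", atom_instr!(or, 32, u32));
    // Arithmetic op: the typed suffix is kept.
    println!("{}", atom_instr!(add, 32, u32));
}
```

With something like this, the existing macro could stay as one macro and swap ptx_type!($type) for the op-aware suffix, instead of duplicating it into a separate logical variant. I haven't checked how this interacts with the rest of intrinsics.rs, so treat it as a starting point only.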

Btw just wanted to say I've been having a blast with this library, my experience has been fantastic so far (aside from this one hiccup), I've successfully translated tens of thousands of lines of CUDA/C++ now, thank you so much for this!!
