Description
When my code uses an atomic "or" operation, the emitted PTX is invalid, and the module fails to load with only the error "Failed to convert PTX to module: InvalidPtx".
When debugging this, I ran the emitted PTX file through ptxas and got this output:
ptxas ./target/release/build/.../kernels.ptx, line 13501; error : Operation .or requires .b32 or .b64 type for instruction 'atom'
The offending few lines of PTX are:
// begin inline asm
atom.relaxed.gpu.or.u32 %r32, [%rd13], %r33;
// end inline asm
Note that atom.relaxed.gpu.or.u32 is not actually a valid instruction. The correct instruction is atom.relaxed.gpu.or.b32.
The root cause is in crates/cuda_std/src/atomic/intrinsics.rs line 366:
macro_rules! atomic_fetch_op_3_reg {
    ($($ordering:ident, $op:ident, $width:literal, $type:ty, $scope:ident, $scope_asm:ident),* $(,)*) => {
        $(
            paste! {
                #[$crate::gpu_only]
                #[allow(clippy::missing_safety_doc)]
                #[doc = concat!(
                    "Fetches the value in ptr, performs a ",
                    stringify!($op),
                    ", and returns the original value"
                )]
                pub unsafe fn [<atomic_fetch_ $op _ $ordering _ $type _ $scope>](ptr: *mut $type, val: $type) -> $type {
                    let mut out;
                    asm!(
                        concat!(
                            "atom.",
                            ordering!($ordering),
                            stringify!($scope_asm),
                            ".",
                            stringify!($op),
                            ".",
                            ptx_type!($type),
                            " {}, [{}], {};"
                        ),
                        out([<reg $width>]) out,
                        in(reg64) ptr,
                        in([<reg $width>]) val,
                    );
                    out
                }
            }
        )*
    };
}
The input $type is u32 (correct), but it is incorrectly concatenated onto the end of the asm! instruction as the PTX type suffix even when $op is or. (The ptx_type! macro simply returns $type unmodified.)
I would assume all logical operations behave the same, so maybe there should be an atomic_fetch_op_3_reg_logical! macro?
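As an alternative to a separate macro, here is a rough host-side sketch of how the type suffix could be chosen per operation. The names atom_type_suffix! and atom_instruction! are hypothetical and not part of cuda_std, and I am assuming that and and xor need the same .b32/.b64 suffix that ptxas demands for or. The sketch only builds the instruction string so it can be checked without a GPU; it also spells out the ordering and scope directly instead of going through ordering!.

```rust
// Hypothetical helper (not in cuda_std): picks the PTX type suffix for `atom`.
// Bitwise ops get the untyped-bits suffix built from the register width;
// every other op falls back to the Rust type name, as described in this issue.
macro_rules! atom_type_suffix {
    (and, $width:literal, $type:ty) => { concat!("b", stringify!($width)) };
    (or, $width:literal, $type:ty) => { concat!("b", stringify!($width)) };
    (xor, $width:literal, $type:ty) => { concat!("b", stringify!($width)) };
    ($op:ident, $width:literal, $type:ty) => { stringify!($type) };
}

// Hypothetical stand-in for the concat! call inside atomic_fetch_op_3_reg!.
// It assembles the instruction template with the suffix chosen above.
macro_rules! atom_instruction {
    ($ordering:ident, $scope_asm:ident, $op:ident, $width:literal, $type:ty) => {
        concat!(
            "atom.",
            stringify!($ordering),
            ".",
            stringify!($scope_asm),
            ".",
            stringify!($op),
            ".",
            atom_type_suffix!($op, $width, $type),
            " {}, [{}], {};"
        )
    };
}

// The failing case from this issue, now with a .b32 suffix.
pub const ATOM_OR_U32: &str = atom_instruction!(relaxed, gpu, or, 32, u32);
// A 64-bit bitwise op picks up .b64 from the width.
pub const ATOM_XOR_U64: &str = atom_instruction!(relaxed, gpu, xor, 64, u64);
// A non-bitwise op keeps the suffix it has today.
pub const ATOM_ADD_U32: &str = atom_instruction!(relaxed, gpu, add, 32, u32);
```

Matching on $op works here because ident fragments can still be matched literally after being forwarded to another macro, whereas a $type:ty fragment cannot, which is why the sketch derives the suffix from $width rather than from $type.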
Btw just wanted to say I've been having a blast with this library, my experience has been fantastic so far (aside from this one hiccup), I've successfully translated tens of thousands of lines of CUDA/C++ now, thank you so much for this!!