
Fix cross-execution-space error: remove CUTLASS_HOST_DEVICE from CudaHostAdapter::memsetDevice #3286

Merged
hwu36 merged 1 commit into NVIDIA:main from alexngUNC:fix/memsetdevice-host-only
Jun 11, 2026

@alexngUNC

Contributor

Summary:

  • Remove CUTLASS_HOST_DEVICE from CudaHostAdapter::memsetDevice in include/cutlass/cuda_host_adapter.hpp.
  • memsetDevice calls the pure virtual host method memsetDeviceImpl, which implementations use to dispatch to cudaMemsetAsync / cuMemsetD*Async. It must remain host-only.
  • Fixes compile failures under --Werror cross-execution-space-call when memsetDevice is instantiated from __host__ __device__ code paths.
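The pattern described in the summary can be sketched in plain C++. This is a simplified illustration, not the actual CUTLASS code: `SKETCH_HOST_DEVICE`, `AdapterSketch`, `HostBufferAdapter`, and the non-templated `memsetDevice` signature are hypothetical names, and `std::memset` on a host buffer stands in for `cudaMemsetAsync`. The point is only that the public wrapper carries no execution-space macro because it forwards to a virtual host method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for CUTLASS_HOST_DEVICE: expands to the CUDA
// execution-space qualifiers only when the file is compiled by nvcc.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

enum class Status { kSuccess, kErrorInternal };

// Simplified adapter (assumption: not the real CudaHostAdapter interface).
class AdapterSketch {
public:
  virtual ~AdapterSketch() = default;

  // Reads only member data, so a host/device annotation is harmless here.
  SKETCH_HOST_DEVICE bool empty() const { return count_ == 0; }

  // Deliberately NOT annotated: it forwards to a virtual host method, so
  // marking it __host__ __device__ would trigger the cross-execution-space
  // diagnostic under nvcc.
  Status memsetDevice(void* dst, unsigned char value, std::size_t bytes) const {
    return memsetDeviceImpl(dst, value, bytes);
  }

protected:
  // Pure virtual host hook; a real implementation would dispatch to a
  // CUDA runtime or driver memset call.
  virtual Status memsetDeviceImpl(void* dst, unsigned char value,
                                  std::size_t bytes) const = 0;

  int count_ = 0;
};

// Host-side implementation; std::memset replaces the CUDA call in this sketch.
class HostBufferAdapter final : public AdapterSketch {
protected:
  Status memsetDeviceImpl(void* dst, unsigned char value,
                          std::size_t bytes) const override {
    if (dst == nullptr && bytes != 0) return Status::kErrorInternal;
    if (bytes != 0) std::memset(dst, value, bytes);
    return Status::kSuccess;
  }
};

// Small helper that exercises the wrapper on a host buffer.
inline bool sketch_zeroes_buffer() {
  unsigned char buf[4] = {9, 9, 9, 9};
  HostBufferAdapter adapter;
  Status s = adapter.memsetDevice(buf, 0, sizeof(buf));
  assert(s == Status::kSuccess);
  return s == Status::kSuccess && buf[0] == 0 && buf[3] == 0;
}
```

Accessors such as `empty()` can keep the macro because they touch no host-only API; only the forwarding wrapper needs to stay host-only.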

Context:

  • memsetDevice was host-only in 3.5.0 (#1411). In 3.5.1 (#1623), CUTLASS_HOST_DEVICE was added to copy/move/empty/size so CudaHostAdapter could be used in __host__ __device__ code. memsetDevice received the same macro by mistake even though it dispatches to virtual host APIs (memsetDeviceImpl → cudaMemsetAsync / cuMemsetD*Async).
  • Callers (zero_workspace, fill_workspace in workspace.h) are host-only.
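Because the callers are host-only, dropping the macro does not restrict them. The following is a minimal sketch of such a caller, loosely modeled on the role of `zero_workspace`; it does not reproduce the code in workspace.h. `zero_workspace_sketch`, `MemsetAdapterSketch`, and `CountingAdapter` are hypothetical names, and `std::memset` again stands in for the CUDA memset call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

enum class Status { kSuccess, kErrorInternal };

// Simplified adapter (assumption: not the real CudaHostAdapter interface).
struct MemsetAdapterSketch {
  virtual ~MemsetAdapterSketch() = default;

  // Host-only wrapper around the virtual host hook.
  Status memsetDevice(void* dst, unsigned char value, std::size_t bytes) const {
    return memsetDeviceImpl(dst, value, bytes);
  }

protected:
  virtual Status memsetDeviceImpl(void* dst, unsigned char value,
                                  std::size_t bytes) const = 0;
};

// Test double that records how often the hook is reached.
struct CountingAdapter final : MemsetAdapterSketch {
  mutable int calls = 0;

protected:
  Status memsetDeviceImpl(void* dst, unsigned char value,
                          std::size_t bytes) const override {
    ++calls;
    std::memset(dst, value, bytes);  // stand-in for a CUDA memset call
    return Status::kSuccess;
  }
};

// Ordinary host function: it can call the host-only wrapper freely.
inline Status zero_workspace_sketch(void* workspace, std::size_t bytes,
                                    const MemsetAdapterSketch* adapter = nullptr) {
  if (bytes == 0) return Status::kSuccess;            // nothing to clear
  if (workspace == nullptr) return Status::kErrorInternal;
  if (adapter != nullptr) {
    return adapter->memsetDevice(workspace, 0, bytes);
  }
  std::memset(workspace, 0, bytes);                   // fallback without adapter
  return Status::kSuccess;
}
```

A call such as `zero_workspace_sketch(ptr, n, &adapter)` goes through the adapter, while omitting the adapter takes the fallback path.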

Error observed when compiling with --Werror cross-execution-space-call:

error: calling a `__host__` function("memsetDeviceImpl")
from a `__host__ __device__` function("memsetDevice") is not allowed

Fixes downstream build failures (e.g. PyTorch rowwise FP8 CUTLASS on Windows with --Werror cross-execution-space-call).

cc @nkhasbag-nv

@hwu36 hwu36 merged commit 93774d3 into NVIDIA:main Jun 11, 2026
lramesh-2409 added a commit to lramesh-2409/pytorch that referenced this pull request Jun 12, 2026
…utlass#3286)

The bump reorganized cutlass's CuTeDSL examples, so also update the
grouped_gemm.py path in cmake/FileMirroring.cmake to match.