API redesign with async support #18

juntyr · 2024-05-19T06:20:35Z

No description provided.

Experiments with Rust Futures
Implemented derive for RustToCudaAsync
Implemented async kernel launch
Fixed RustToCudaAsync derive
LaunchPackage with non-mut Stream
Moved stream to be an explicit kernel argument
Updated ExchangeWrapperOn[Device|Host]Async::move_to_stream
Upgraded to fixed RustaCuda
Added scratch-space methods for uni-directional CudaExchangeItem
Added unsafe-aliasing API to SplitSliceOverCudaThreads[Const|Dynamic]Stride
Extended the CudaExchangeItem API with scratch and uMaybeUninit
Rename SplitSliceOverCudaThreads[Const|Dynamic]Stride::alias_[mut_]unchecked
Implemented #[cuda(crate)] and #[kernel(crate)] attributes
Added simple thread-block shared memory support
Fixed device utils doc tests
Convert cuda thread-block-shared memory address to generic
First steps towards better shared memory, including dynamic
Revert derive changes + R2C-based approach start
Some progress on shared slices
Backup of progress on compile-time PTX checking
Clean up the PTX JIT implementation
Add convenience functions for ThreadBlockShared arrays
Improve and fix CI
Remove broken ThreadBlockShared RustToCuda impl
Refactor kernel trait generation to push more safety constraints to the kernel definition
Fixed SomeCudaAlloc import
Added error handling to the compile-time PTX checking
Add PTX lint parsing, no actual support yet
Added lint checking support to monomorphised kernel impls
Improve kernel checking + added cubin dump lint
Fix kernel macro config parsing
Explicitly fitting Device[Const|Mut]Ref into device registers
Switched one std:: to core::
Remove register-sized CUDA kernel args check, unnecessary since rust-lang/rust#94703
Simplified the kernel parameter layout extraction from PTX
Fix up rebase issues
Install CUDA in all CI steps
Use CStr literals
Simplify and document the safety traits
Fix move_to_cuda bound
Fix clippy for 1.76
Cleaned up the rust-cuda device macros with better print
    The implementation still uses String for dynamic formatting, which currently pulls in loads of formatting and panic machinery. While a custom String type that pre-allocated the exact format String length can avoid some of that, the formatting machinery even for e.g. usize is still large. If `format_args!` is ever optimised for better inlining, the more verbose and lower-level implementation could be reconsidered.
Switch to using more vprintf in embedded CUDA kernel
Make print example fully executable
Clean up the print example
ptr_from_ref is stable from 1.76
Exit on CUDA panic instead of abort to allow the host to handle the error
Backup of early progress for switching from kernel traits to functions
More work into kernel functions instead of traits
Eliminate almost all ArgsTrait usages
Some refactoring of the async kernel func type + wrap code
Early sketch of extracting type wrapping from macro into types and traits
Early work towards using trait for kernel type wrap, ptx jit workaround missing
Lift complete CPU kernel wrapper from proc macro into public functions
Add async launch helper
Further cleanup of the new kernel param API
Start cleaning up the public API
Allow passing ThreadBlockShared to kernels again
Remove unsound mutable lending to CUDA for now
Allow passing ThreadBlockSharedSlice to kernel for dynamic shared memory
Begin refactoring the public API with device feature
Refactoring to prepare for better module structure
Extract kernel module just for parameters
Add RustToCuda impls for &T, &mut T, &[T], and &mut [T] where T: RustToCuda
Large restructuring of the module layout for rust-cuda
Split rust-cuda-kernel off from rust-cuda-derive
Update codecov action to handle rust-cuda-kernel
Fix clippy lint
Far too much time spent getting rid of DeviceCopy
More refactoring and auditing kernel param bounds
First exploration towards a stricter async CUDA API
More experiments with async API
Further API experimentation
Further async API experimentation
Further async API design work
Add RustToCudaAsync impls for &T and &[T], but not &mut T or &mut [T]
Add back mostly unchanged exchange wrapper + buffer with RustToCudaAsync impls
Add back mostly unchanged anti-aliasing types with RustToCudaAsync impls
Progress on replacing ...Async with Async<...>
Seal more implementation details
Further small API improvements
Add AsyncProj helper API struct for async projections
Disable async derive in examples for now
Implement RustToCudaAsync derive impls
Further async API improvements to add drop behaviour
First sketch of the safety constraints of a new NoSafeAliasing trait
First steps towards reintroducing LendToCudaMut
Fix no-std Box import for LendRustToCuda derive
Re-add RustToCuda implementation for Final
Remove redundant RustToCudaAsyncProxy
More progress on less 'static bounds on kernel params
Further investigation of less 'static bounds
Remove 'static bounds from LendToCuda ref kernel params
Make CudaExchangeBuffer Sync
Make CudaExchangeBuffer Sync v2
Add AsyncProj proj_ref and proj_mut convenience methods
Add RustToCudaWithPortableBitCloneSemantics adapter
Fix invalid const fn bounds
Add Deref[Mut] to the adapters
Fix pointer type inference error
Try removing __rust_cuda_ffi_safe_assert module
Ensure async launch mutable borrow safety with barriers on use and stream move
Fix uniqueness guarantee for Stream using branded types
Try without ref proj
Try add extract ref
Fix doc link
clean up kernel signature check
Some cleanup before merging
Fix some clippy lints, add FIXMEs for others
Add docs for rust-cuda-derive
Small refactoring + added docs for rust-cuda-kernel
Bump MSRV to 1.77-nightly
Try trait-based kernel signature check
Try naming host kernel layout const
Try match against byte literal for faster comparison
Try with memcmp intrinsic
Try out experimental const-type-layout with compression
Try check
Try check again
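
The note on the print-macro commit mentions a custom String type that pre-allocates the exact formatted length. As a rough illustration of that idea only, here is a minimal host-side sketch using the standard library. The names `LenCounter` and `format_exact` are hypothetical and do not come from rust-cuda, and the actual device code may look quite different (for example, it would likely need `alloc` rather than `std`).

```rust
use std::fmt::{self, Write};

/// Hypothetical helper: counts the bytes a formatting call would emit
/// without storing any of them.
struct LenCounter(usize);

impl Write for LenCounter {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0 += s.len();
        Ok(())
    }
}

/// Hypothetical helper: formats `args` into a String whose capacity is
/// reserved up front, so the second pass never has to grow the buffer.
fn format_exact(args: fmt::Arguments<'_>) -> String {
    // First pass: measure the formatted length.
    let mut counter = LenCounter(0);
    counter
        .write_fmt(args)
        .expect("a Display impl returned an error while measuring");

    // Second pass: write into the pre-sized buffer.
    // `fmt::Arguments` is `Copy`, so it can be formatted twice.
    let mut out = String::with_capacity(counter.0);
    out.write_fmt(args)
        .expect("a Display impl returned an error while writing");
    out
}
```

A call such as `format_exact(format_args!("thread {} of {}", 3usize, 32usize))` formats the arguments twice, trading extra formatting work for a single allocation. As the commit note says, this does not remove the formatting machinery itself, which is still pulled in for types such as usize.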

codecov-commenter · 2024-05-20T07:44:55Z

Codecov Report

Attention: Patch coverage is 0%, with 2609 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (f395253) to head (9b7a875).

Files	Patch %	Lines
rust-cuda-kernel/src/kernel/link/mod.rs	0.00%	707 Missing ⚠️
rust-cuda-kernel/src/kernel/wrapper/mod.rs	0.00%	271 Missing ⚠️
...st-cuda-kernel/src/kernel/specialise/param_type.rs	0.00%	196 Missing ⚠️
src/utils/adapter.rs	0.00%	162 Missing ⚠️
rust-cuda-derive/src/rust_to_cuda/impl.rs	0.00%	135 Missing ⚠️
...kernel/wrapper/generate/host_link_macro/get_ptx.rs	0.00%	134 Missing ⚠️
...kernel/src/kernel/wrapper/generate/cuda_wrapper.rs	0.00%	121 Missing ⚠️
rust-cuda-kernel/src/kernel/lints.rs	0.00%	113 Missing ⚠️
rust-cuda-derive/src/rust_to_cuda/field_copy.rs	0.00%	111 Missing ⚠️
...src/kernel/wrapper/generate/host_link_macro/mod.rs	0.00%	97 Missing ⚠️
... and 20 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #18       +/-   ##
==========================================
- Coverage   58.39%   0.00%   -58.40%     
==========================================
  Files          48      33       -15     
  Lines        3653    3290      -363     
==========================================
- Hits         2133       0     -2133     
- Misses       1520    3290     +1770

juntyr and others added 9 commits May 19, 2024 06:05

Fix CUDA install in CI

e5708b6

Switch from kernel type signature check to random hash

0513483

Fix CI-identified failures

5846c1e

Use pinned nightly in CI

7d736d5

Try splitting the kernel func signature type check

e3d5a3f

Try with llvm-bitcode-linker

010179f

Upgrade to latest ptx-builder

3330436

Fix codecov by excluding ptx tests (codecov weirdly overrides linker)

9b7a875

juntyr mentioned this pull request May 20, 2024

Support async CUDA operations #10

Closed

juntyr merged commit eba6c37 into main May 20, 2024
6 checks passed

juntyr deleted the async-new branch May 20, 2024 07:47
