Regenerate cust_raw for newer CUDA / support multiple versions #166

Now that we are adding support for newer CUDA (#100), we need a way to select the version for cust_raw's bindgen (and elsewhere?). See Rust-CUDA/crates/cust_raw/README.md (line 6 at 400f816).
Currently we have checked-in generated bindings that appear to be for Windows. In the usual bindgen flow, the user selects the version through whatever they have installed on their machine, and cust_raw (typically a -sys crate) would generate bindings for that version. The cust crate can then configure itself by reading the version used to generate the bindings. Say we support up to 12.8 and the user has 12.0 installed: cust should be able to support APIs up to 12.0 and disable newer ones through its configuration. If the user has a newer version, then at best cust could still compile without any support for the newer features/APIs, but the user should still be able to access them through the sys crate. At worst, we don't compile due to some backwards-incompatible change in the newer version.
Right, we need to decide if we generate and check in the bindings and hide the variants behind targets + features, or we generate on the fly at build time.
I gave generating bindings at build time a shot and have a WIP branch, focusing on CUDA 12 and up for now on Windows. I built with versions 12.0, 12.1, 12.3, and 12.8. The only breaking changes were in the graph API. 12.3 introduced conditional nodes, so cust checks support for that and sets a cfg so the wrapper enum can account for it (see the sketch below). With these changes, cust builds on every version I tested. 12.0 can't codegen though, as libnvvm doesn't like our data layout for some reason, but 12.1 and up were OK with it.
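Roughly, the gating in the wrapper enum looks like this (variant list abridged; the cfg name is hypothetical):

```rust
// cust's wrapper enum: the conditional-node variant only exists when the
// build script detected CUDA 12.3+ and set the cfg.
pub enum GraphNodeType {
    Kernel,
    Memcpy,
    // ... other variants ...
    #[cfg(conditional_node)]
    Conditional,
}
```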
Hey @jorge-ortega, I'm also working on this and have just seen your changes. I can help write the rest if something else is keeping your hands full, or I can help test on Linux distributions with CUDA 12.8 if it's ready.
Hey @adamcavendish, it would be great if you could give the changes a try on Linux.
Hi @jorge-ortega, I've looked into your branch and I'd like to discuss a few considerations to refine the implementation of the bindgen-generated bindings. I know things aren't finalized yet, so some of these may be ideas we already share that you just haven't finished; I'm raising them for discussion ahead of time.
Generating bindings directly into the Git workspace introduces unstaged files, which complicates version control. The bindgen documentation recommends placing generated files in the build's OUT_DIR instead.
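A minimal sketch of that flow, assuming the conventional `wrapper.h` header and `bindings.rs` file names:

```rust
// build.rs: generate bindings into OUT_DIR instead of the source tree,
// so nothing unstaged ever lands in the Git workspace.
fn main() {
    let out_path = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
    bindgen::Builder::default()
        .header("wrapper.h")
        .generate()
        .expect("failed to generate CUDA bindings")
        .write_to_file(out_path.join("bindings.rs"))
        .expect("failed to write bindings");
}
```

The crate then pulls the generated file in with:

```rust
include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
```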
To align with Rust's ecosystem conventions for system bindings (e.g., *-sys crates), the bindings crate should carry a -sys name. This improves discoverability and adheres to crate naming standards for FFI crates; the current cust_raw name doesn't follow that convention.
To ensure compatibility across CUDA versions, we should mark generated enums (e.g., `CUgraphNodeType`) as non-exhaustive. Example configuration for bindgen:

```rust
let bindings = bindgen::Builder::default()
    .header("wrapper.h")
    // ... allowlist_types ...
    .default_enum_style(bindgen::EnumVariation::Rust {
        non_exhaustive: true,
    });
```

The wrapper enum in cust can then be marked `#[non_exhaustive]` as well and convert from the raw type with a fallback:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(u8)]
#[non_exhaustive]
pub enum GraphNodeType {
    // ... existing variants ...
}

impl GraphNodeType {
    pub fn from_raw(raw: cuda::CUgraphNodeType) -> Self {
        match raw {
            // ... known variants ...
            _ => GraphNodeType::Empty, // Fallback for unrecognized values
        }
    }
}
```
Instead of parsing CUDA versions from a JSON file, leverage the `CUDA_VERSION` constant exposed by the generated bindings. Additionally, use version-based feature flags (e.g., a `cuda_12_3_0` cfg). Example:

```rust
use cust_raw::CUDA_VERSION;

fn main() {
    if CUDA_VERSION >= 12030 {
        println!("cargo:rustc-cfg=cuda_12_3_0");
    }
}
```
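Downstream code can then gate on that cfg, e.g. (the module name is hypothetical):

```rust
// Compiled only when the build script saw CUDA 12.3 or newer.
#[cfg(cuda_12_3_0)]
pub mod graph_conditional_nodes;
```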
Thanks @adamcavendish for providing this feedback, really appreciate it. I'm mostly in alignment with your suggestions.
Technically, we're not limited to a single version if we do pre-gen bindings. See cudarc (more on this below).
Since the SDK contains several different libraries and different abstractions (driver vs. runtime API), I was also considering consolidating bindings into a single crate (cuda-sdk-sys) and using feature flags to let users/downstream crates pick which libraries to generate and link against. There might be some benefits to this (consolidated build logic, maybe better build times?), and I'm sure there are downsides I just haven't thought of yet.
**cudarc**

@LegNeato has mentioned interest in collaborating with cudarc, which also wraps the CUDA SDK libraries. Their abstractions are more mature than cust's, and they support multiple versions of the SDK through multiple checked-in bindings. I'm still surprised that all the prior work I've found in this space seems to go that direction. Users have reported issues where the bindings fail layout tests on their platform, and the fix has been to disable layout tests, possibly hiding UB (see the sketch after this comment). For the moment, this is the aspect I'd like to see changed if we do collaborate.

**Multiple LLVMs**

Since bindgen requires libclang 9+, we need users to have a more recent LLVM install in addition to the LLVM 7 required by the codegen. This shouldn't be a major hurdle for most users, but it does increase complexity. We might also want to consider renaming environment variables to better distinguish which version should be set and for what.
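The sketch referenced above: layout tests are a single bindgen builder switch, and keeping them on turns size/alignment mismatches into failing generated tests instead of silent UB at the FFI boundary.

```rust
let bindings = bindgen::Builder::default()
    .header("wrapper.h")
    // Emit #[test] functions asserting size_of/align_of for every
    // generated type. This is on by default; disabling it to make a
    // platform "work" can mask real layout mismatches.
    .layout_tests(true)
    .generate()
    .expect("failed to generate bindings");
```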
@jorge-ortega Thanks for providing more info about the roadmap.
If we do plan to switch over to cudarc, I can help make some changes to see how much work is needed. cc @LegNeato. About the multiple checked-in bindings: yeah, I never considered that path previously, as many community crates use the standard bindgen-recommended approach.
I can further improve the current Dockerfiles for various Linux distributions once we have the exact environment variables settled. Hopefully the distribution-provided official LLVM versions are good enough for bindgen (most of the time, yes).
I just became aware of how to use cargo metadata to pass information to dependent packages. We could use this to pass the version information to higher-level crates. This approach could potentially supersede most of what the find_cuda_helper crate provides.
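A minimal sketch of that mechanism, assuming cust_raw's Cargo.toml declares `links = "cuda"` (the `version` key name is illustrative):

```rust
// cust_raw/build.rs: any `cargo:KEY=VALUE` line printed by a crate with
// a `links` key is exposed to direct dependents as DEP_<LINKS>_<KEY>.
fn main() {
    println!("cargo:version=12030"); // e.g. detected CUDA 12.3
}
```

```rust
// cust/build.rs: read the metadata cust_raw emitted and turn it into cfgs.
fn main() {
    let version: u32 = std::env::var("DEP_CUDA_VERSION")
        .expect("cust_raw did not emit a CUDA version")
        .parse()
        .expect("malformed CUDA version");
    if version >= 12030 {
        println!("cargo:rustc-cfg=cuda_12_3_0");
    }
}
```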
Wow, that's awesome. This is what we really need. I can work on a refactor of this to make it cleaner.
Sure, give it a try. I was looking at how openssl-sys passes version info as an example. They hex-encode it and then parse it out to set their version cfgs. I haven't figured out exactly how they get the version information out of the headers yet; it has something to do with expanding version macros. It might be easier to read the version.json file instead?
For the version.json file: it's not available in the Linux container. I guess it's available in your Windows setup, but it's not cross-platform?
I should have known that wouldn't be cross-platform... I guess you could just run bindgen and parse the version out of the bindings?
Yeah, that's OK. May I cherry-pick your changes (so they'll show your author information) and start with that? Before the merge, I'll note that we co-authored it.
Will be drafting in the branch feature/refactor-cust-raw-and-find-cuda and aim to be ready before next Wednesday.
It hit me this morning how version ranges would work with version feature flags. It's a lot simpler than I made it out to be. I'm on board with that approach now.
@jorge-ortega Sure, thanks for viewing it, I'll continue on that track.
Do we want to try to get ownership of the existing crate names?
Would be cool if we could get ownership of those crates transferred to us. If we do get them, our bindings should set a high bar of compatibility with what you would expect from the C/C++ code. Along those lines, I've been digging deeper into the headers to understand their versioning and per-thread stream options. I was interested in why there was a breaking change in some of the functions from 11 to 12. The function names didn't show up in the docs, causing even further confusion. It turns out there's a series of defines in the headers that renames the functions: some rename to a more descriptive name, some to a specific version, and, when CUDA_API_PER_THREAD_DEFAULT_STREAM is defined, to per-thread default/null stream variants. So in C/C++ code, when you call e.g. cuCtxCreate, you're actually calling cuCtxCreate_v2. The simplest fix would be handwritten aliases, e.g.

```rust
#[link_name = "\u{1}cuCtxCreate_v2"]
pub fn cuCtxCreate(...);
```

Note: with the per-thread settings, some functions also gain a _ptds/_ptsz suffix.
The story repeats itself, just like what NVidia did to cuBLAS: when the cuBLAS v2 API went to production, they renamed functions with _v2 suffixes behind macros in the headers.
If we go the maintain-our-own-bindings route, I assume we'll use https://github.com/dtolnay/cxx?
I personally don't see any use for cxx yet; we'll likely still use bindgen for most cases. cc @jorge-ortega in case he has anything experimental in his branches.
Haven't found a need for it just yet. Most of the host-side API is plain C, which bindgen can handle. The C++-specific host APIs I have found are templated wrappers over the C API. The cooperative groups and atomics are C++ but more device-centric; I need more information on how to handle those.
@jorge-ortega @LegNeato For OptiX, I find it hard to include in the cust_raw package because the OptiX SDK requires users to register. It's hard to redistribute a Docker image that includes the OptiX SDK, if I'm right. I'll try to make something that builds on OptiX 9 (or whichever version we originally built for) separately. Wait... I also wonder whether it's OK to commit the bindgen-generated file to the repository, since that's a kind of redistribution. If not, we should always ask users to have the SDK installed and run bindgen against it.
That should be the goal for all the bindings: use the user's installed SDK to generate them. I got a working solution for the function renames today. It's based on macro expansion: check if a function name is macro-defined, and if it is, expand a macro into an easy-to-parse line with the original function name and its macro-expanded name. Pass that mapping to a custom bindgen callback, and it takes care of the rest. With that, several functions across cust lost their _v2 suffix. At the moment, this unfortunately means keeping a list of hundreds of function names to expand, depending on the library. I do have another idea that would remove the need for that, but it would mean invoking the compiler to do another expansion for each define bindgen finds, instead of doing one expansion as in the current solution.
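For reference, a sketch of what such a callback could look like; this assumes bindgen's `ParseCallbacks::generated_name_override` hook (available in recent bindgen versions), which keeps the original C symbol as the link name when the Rust-visible name is overridden:

```rust
use std::collections::HashMap;

use bindgen::callbacks::{ItemInfo, ItemKind, ParseCallbacks};

// Maps expanded symbols (e.g. "cuCtxCreate_v2") back to the documented
// names they were #define'd from (e.g. "cuCtxCreate"), as recovered by
// the macro-expansion pass described above.
#[derive(Debug)]
struct FunctionRenames(HashMap<String, String>);

impl ParseCallbacks for FunctionRenames {
    fn generated_name_override(&self, item: ItemInfo<'_>) -> Option<String> {
        if !matches!(item.kind, ItemKind::Function) {
            return None;
        }
        // Rename the Rust item; bindgen still links against the real
        // symbol, so callers see cuCtxCreate without the _v2 suffix.
        self.0.get(item.name).cloned()
    }
}
```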
Roger. Will try to move optix-sys to use the same approach.
With #181, we can now generate bindings for almost any version of CUDA, and it provides the foundation to add more coverage of the CUDA libraries. #187 helps match the bindings to the C/C++ API experience, in addition to making the bindings more resilient across versions (and to NVidia's liberal use of macros). So what's left on this front, or can we close this out now? I think we still need to determine which versions cust will officially support, as we've now dropped v11 from CI altogether.
Yeah, I think the supported CUDA versions should align with major ML frameworks and render engines, e.g. PyTorch, TensorFlow, JAX, Blender Cycles, etc.
I'm fine dropping 11 as we likely have no users on it.
Let's close this out.