RFC for adding PTX and AMDGPU targets #1641
@TheAustinSeven Thanks for starting a discussion on this topic. I think the "Detailed Design" section is not detailed enough 😄. I have a few questions:

- How is this new target meant to be used? By calling …
- Motivation related: How would one use these targets to build GPU programs? Hypothetical workflow: Use a … The (partial) definition of the new …
- Are there going to be multiple PTX targets to cover different GPU architectures and/or compute capabilities? (This is related to the previous question.)
- How does one access stuff like …
- Is any part of Rust "fundamentally" not translatable to PTX?
- Annoying implementation details: Can I …

I know nothing about AMDGPU so I can't comment on it. I personally would like to see some form of PTX target land in the compiler just to let people play with it. Then, they can tell us what else they need to build useful programs, libraries, etc. FWIW, AFAIK, the precedent is that new targets have not required an RFC to land in the compiler.

cc @rust-lang/tools
You bring up some great points and I will update the RFC to address as many as I can. I think that the new target here does merit an RFC because it is a fundamentally different target from all of the targets that exist now. Real quick, to give you an idea of what I will add: …
In terms of implementation, LLVM provides intrinsics named llvm.cuda.syncthreads and llvm.nvvm.read.ptx.sreg.* (see http://llvm.org/docs/NVPTXUsage.html#target-intrinsics). How to expose them to Rust is a different matter, though.
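For concreteness, here is one plausible way those intrinsics could be surfaced, assuming rustc's existing trick of binding an extern declaration to an LLVM intrinsic via `#[link_name]` also works for the NVPTX backend (untested here; these declarations are a sketch, not a committed design):

```rust
extern "C" {
    // Block-wide barrier, the equivalent of CUDA's __syncthreads().
    #[link_name = "llvm.cuda.syncthreads"]
    fn syncthreads();
    // Special-register read: index of the current thread within its block.
    #[link_name = "llvm.nvvm.read.ptx.sreg.tid.x"]
    fn read_tid_x() -> i32;
}
```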
Big 👍 for getting Rust onto GPUs! This is something I've also wanted several times already. However, I have doubts whether we can just slap a new target on it and be done with it. For starters, AFAIK the backends have very few users besides the respective device manufacturers (I only know of Google using PTX and Mesa using AMDGPU), and they all compile the same C-like language as the manufacturers do. So I would be surprised if we didn't run into stupid, unnecessary, but very real limitations in those backends as soon as we throw the LLVM IR we currently generate at them. Furthermore, GPUs have historically been pretty restrictive targets (e.g., no indirect calls = no function pointers and no trait objects). While this has improved (PTX 2.1 supports indirect calls), I bet that many people who want to write some Rust for some GPU will run into a limitation of this kind. The RFC wisely presents itself as an incremental step on a long road, but I think we should think sooner rather than later about how to model the various subsets of functionality. Fortunately, this question is already being discussed: https://internals.rust-lang.org/t/pre-rfc-a-vision-for-platform-architecture-configuration-specific-apis/3502
Also, I just skimmed the GCN 3 ISA and the SPIR-V spec, and I can't find any evidence that indirect calls are supported (and I seem to recall the same was true of OpenCL 2.0 last time I checked). I guess it's possible that Nvidia has started supporting indirect calls but AMD hasn't. This would make the whole "account for device limitations" thing more urgent, since you probably can't even compile libcore without generating indirect calls, or at least vtables containing function pointers (though I suppose if you never use those code paths and LTO removes them, it might work).
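To make the restriction concrete, here is a toy example (not from the RFC) of ordinary Rust that lowers to an indirect call, next to the generic version that stays statically dispatched:

```rust
trait Op {
    fn apply(&self, x: f32) -> f32;
}

struct Scale(f32);
impl Op for Scale {
    fn apply(&self, x: f32) -> f32 {
        x * self.0
    }
}

// Trait-object dispatch: loads a function pointer from the vtable and
// calls it indirectly, exactly what these ISAs may not support.
fn run_dyn(op: &Op, x: f32) -> f32 {
    op.apply(x)
}

// Generic dispatch: monomorphized into a direct call; fine everywhere.
fn run_static<T: Op>(op: &T, x: f32) -> f32 {
    op.apply(x)
}
```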
What about Intel GPUs (two thirds of the GPU market share)? What about mobile GPUs, where the market is divided among three or four different vendors? Also, supporting GPUs means: …
At this point there's so much special casing that in my opinion the only sane conclusion is that the Rust language isn't made for running on a GPU. A much saner alternative, however, would be to write a plugin that can parse and compile an individual Rust function (or multiple individual functions). Yet another thing that is blocked on plugins being stable.
While this may not enable people to write software like Nvidia's VRWorks or other graphics libraries, it is hard to argue that there is no use in adding these GPU targets. GPUs are currently used in many fields that don't involve graphics (machine learning, simulations, other HPC). I don't think this RFC should die on the idea that if it can't run on every GPU ever produced, then it isn't worth anything at all. Adding the AMDGPU and PTX targets allows people (like myself) who are involved in HPC to write Rust instead of C++ variants. As for the plugin, this would be the best first step in that direction. I would love to add a SPIR-V backend; however, there is not currently a SPIR-V backend in LLVM, so that would be a Herculean task suited to a team of people intimately familiar with the SPIR-V language. As I said in my RFC, I attempted to adapt the LLVM-SPIRV Bi-directional Translator, but it is so tightly intertwined with the OpenCL C architecture that it was impossible to run Rust through it. I understand the concerns that you bring up, but I think we are thinking of different directions that people would take this in. I am thinking of HPC and you are thinking of graphics.
Just because I mentioned textures doesn't mean I'm thinking graphics.
Sorry for the misunderstanding. I think that while the PTX and AMDGPU targets may start off without much of the support that CUDA or OpenCL C currently have, over time these could be added with language patches and external libraries. I honestly think the place for most of this should lie outside the core language, but the compiler itself should support compiling to these targets.
This looks fine to me. I don't think it necessarily even requires an RFC since we add experimental architecture support all the time - as long as it doesn't have a significant maintenance impact. Since it's not obvious what form Rust-on-GPU will take there's going to be a lot of experimenting, and hopefully out of tree. The way I would expect this to proceed is for somebody to do the initial work in a fork to get a feel for what modifications are required, then upstream the basic target definitions and whatever compiler changes are necessary to get code gen working, then continue experimenting with the (presumably weird) library ecosystem out of tree.
Thank you @tomaka for spelling out the (worst-case) implications of a GPU target. I agree that more restricted (older/smaller) GPUs are very interesting and useful targets. I don't think any of these issues need to be dealbreakers; after all, Rust is also quite viable for CPU-based embedded platforms with peculiar restrictions. Yes, you can't use all the fancy things that "normal Rust" has, but you still have a very nice language and can use any library that restricts itself to the capabilities of the platform. Or, put differently: nobody in their right mind would try to compile a complete Rust application with I/O, dynamic memory management, etc. to PTX and run it on a Titan. GPU backends would be used to implement computations in Rust that would otherwise be written in any of the other GPU-targeting languages. There's still a whole host-side application submitting the kernel to the device. Whether the crate boundary is the right unit of compilation for the kernels is relatively unimportant at this stage. All the headaches we're discussing here remain when you compile individual functions; the only difference is whether the end user puts the GPU-side code into the same file or in a different directory. To come back to this RFC specifically, I think first targeting feature-rich GPGPU devices is a good start, and that weaker devices should probably be added later as different targets. These targets are easier to hammer out since they require less design work (how to deal with all the restrictions) and less implementation work (there are ready-made LLVM backends), and they allow people who only care about "compute capability X.Y" to use recursion and function pointers. Finally, here's my take on the restrictions and how to deal with them (a small code sketch follows the list):
Well, use …
This is already an issue for operating systems and some embedded programs written in Rust; how do they handle it? Anyway, depending on what instruction set you target, there may be ways to kill the thread/warp/wavefront/workgroup (I know this is true in GCN 3 and all versions of PTX). If nothing like that exists, the target is probably so primitive that recursion and indirect calls aren't allowed, meaning you can just inline the program into a single function with static memory allocation and terminate via an early return. It's not ideal, but we'll make do.
Yeah, not having such a fundamental language feature feels weird. Then again, people have long discussed the possibility of targets without floating point support, and disabling the primitive float types on those targets is a credible possibility. So I think that a target where you can't create trait objects is perfectly fine, and one can write a lot of very useful Rust code without any trait objects.
Similar issue to trait objects, see above. (And note that as with trait objects, everything using generics is still statically dispatched.)
Well, duh. See above for disabling language features.
Would they? The current model is that the compiler provides intrinsics and some attributes (i.e., those are available even without any crates, though you have to declare intrinsics to use them), and crates layered on top provide friendlier APIs.
This certainly requires design work, but isn't particularly hard. Some intrinsics and attributes and perhaps lang items, with a little wrapper library for the intrinsics, would be my guess.
I am not quite sure what this entails. Don't many natural targets already provide this in some form, especially if you compile to a reasonably high-level language like GLSL or OpenCL C or even SPIR-V? (Man, I should really go and properly learn SPIR-V instead of spot-checking the spec whenever I need to know something.) Regardless, this is just a little support library (right?) if the underlying primitives are exposed with intrinsics etc.; perhaps it has to be written, but it's just code.
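Pulling the strands of this list together (the sketch referenced above), here is a hedged picture of what the "little wrapper library" plus a kernel might look like. The intrinsic names come from LLVM's NVPTX documentation; everything else, including how the entry point would be marked and whether `#[link_name]` binding works on this backend, is invented for illustration:

```rust
#![no_std]

extern "C" {
    #[link_name = "llvm.nvvm.read.ptx.sreg.tid.x"]
    fn tid_x() -> i32; // threadIdx.x
    #[link_name = "llvm.nvvm.read.ptx.sreg.ctaid.x"]
    fn ctaid_x() -> i32; // blockIdx.x
    #[link_name = "llvm.nvvm.read.ptx.sreg.ntid.x"]
    fn ntid_x() -> i32; // blockDim.x
}

/// Global thread index: blockIdx.x * blockDim.x + threadIdx.x.
fn global_id() -> isize {
    unsafe { (ctaid_x() * ntid_x() + tid_x()) as isize }
}

/// Element-wise c[i] = a[i] + b[i], one element per thread.
#[no_mangle]
pub unsafe fn add(a: *const f32, b: *const f32, c: *mut f32, n: isize) {
    let i = global_id();
    if i < n {
        *c.offset(i) = *a.offset(i) + *b.offset(i);
    }
}
```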
But why? From my experience, the biggest safety issue when writing a GPU program in general is the interface between the CPU and GPU. In other words, the CPU and GPU have to agree about how the data is aligned in memory. This RFC doesn't tackle that at all. On the other hand, a plugin that is part of a library, and that doesn't require a new compilation target, would.
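To illustrate the layout-agreement problem @tomaka points at, a minimal sketch: both sides must see the same field order, sizes, and alignment, and `#[repr(C)]` opts out of Rust's unspecified default layout so a CUDA/OpenCL kernel reading the same buffer agrees with the host. (The struct itself is invented for illustration.)

```rust
#[repr(C)]
pub struct Particle {
    pub pos: [f32; 3], // offset 0, 12 bytes
    pub mass: f32,     // offset 12; no implicit padding so far
    pub vel: [f32; 3], // offset 16
    pub pad: f32,      // explicit tail padding keeps the size a round 32 bytes
}
```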
In order to build a plugin that would do this, you would need a Rust compiler capable of compiling to these targets anyway. You can't have a plugin that allows Rust on the GPU without a new compilation target. I would just like to clarify that this RFC is intended to add the ability to compile to GPU targets, but the expected use case would not extend beyond compiling several individual functions to PTX or AMDGPU. At that point there could be a library that adds everything extra you might want in such a target, but due to the limitations that such a target imposes, I don't think it is very realistic to think that such a task can be accomplished effectively without adding the target to the language.
I think that there are several reasons to use Rust. The first is that it is much easier to write both sides of a GPU-CPU program in the same language. There are plenty of reasons to want a Rust GPU backend, and of course there will be limitations, just as there are with OpenCL C and CUDA, but I don't think any of those limitations are deal-breakers.
I like Rust the language. Type inference, traits, macros, unboxed closures, and generics are all useful and require zero runtime support (and I must admit, I just plain find Rust more aesthetic, which can make the difference between me having enough motivation for a hobby project or not). I also like its standard library, but the core language plus selected bits and pieces from `std` already covers a lot.
If it takes in Rust code (in whatever chunk size, crate or function) and spits out code that runs on GPUs, then it's by definition a compiler targeting GPUs. Again, it doesn't really matter for the compiler where the Rust code comes from. Whether you collect the code to be compiled from a whole crate or from individual functions makes little difference.
The advantage of a plugin compared to a new compilation target is that a plugin could operate on a domain-specific and well-defined language that looks like Rust but is not exactly Rust, where things like panics and virtual function calls trigger compilation errors and where the additional required capabilities (like textures) would exist. Just like the people who designed CUDA and OpenCL, who chose not to use the C language but a domain-specific, well-defined language that looks like C but is not exactly C.
In addition to this, an important point is that PTX, SPIR, and SPIR-V are only intermediate representations. The consequence is that it's possible (and most importantly, sane) to translate the MIR output of the compiler directly to PTX/SPIR/SPIR-V. The only thing a plugin would need to do is run the Rust parser and trans, check that the code doesn't contain anything invalid, and then translate to PTX/SPIR/SPIR-V instead of translating to LLVM IR. This is exactly the kind of thing that a plugin could perform.
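A toy model of that check-then-translate pipeline, to make the idea concrete. Every type here is invented for illustration; a real plugin would consume rustc's actual MIR rather than this stand-in:

```rust
// Stand-in for a MIR statement; just enough shape for the example.
enum Stmt {
    Assign,
    DirectCall,
    IndirectCall, // function pointers / trait objects
    Panic,
}

// Reject constructs the device can't express; otherwise emit IR text.
fn translate(body: &[Stmt]) -> Result<String, &'static str> {
    let mut out = String::new();
    for stmt in body {
        match *stmt {
            Stmt::IndirectCall => return Err("target does not support indirect calls"),
            Stmt::Panic => return Err("target has no panic mechanism"),
            Stmt::Assign | Stmt::DirectCall => out.push_str("; ...emit PTX/SPIR-V here...\n"),
        }
    }
    Ok(out)
}
```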
Good point. If it does turn out that significant language features would have to be added, one should rather design a dedicated language. This seems far from certain to me, though. A laundry list of platform-specific intrinsics and attributes (perhaps with a convenience wrapper, analogous to the little wrapper library suggested above) might well suffice. I think part of the reason why OpenCL C and CUDA are separate languages is that Khronos/Nvidia don't have the power to add things to the C/C++ standards, so they create something that could either be called "a language inspired by C/C++" or "normal C/C++ with some parts ripped out and lots of compiler extensions". The latter perspective is particularly strong for CUDA, which is basically a beefed-up clang with some compiler extensions for both host- and client-side code. But the C and C++ standards don't even support common restrictions and extensions needed by CPU programs (e.g., embedded programs, operating systems), such as … with the consequence that compilers add some or all of these restrictions and extensions without standard support (occasionally a de facto standard emerges, but not always). Rust, on the other hand, already supports these restrictions and extensions, so why can't it also directly support another use case that was previously always shoved into third-party dialects? So much for the philosophical reasons why I think GPUs make fine compilation targets. I also have some technical points to nit-pick:
This is already the case when you compile a …

[0]: Killing the thread would be enough for soundness, but killing more would probably be fine and perhaps even helpful.
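Since the footnote turns on what "panic" can even mean on such a target, here is a hedged sketch (assuming the `panic_fmt` lang item as it existed at the time, and that `core::intrinsics::abort` lowers to a trap-style instruction such as PTX's `trap`) of panics bottoming out in thread death:

```rust
#![no_std]
#![feature(lang_items, core_intrinsics)]

use core::fmt;
use core::intrinsics;

// All panics funnel through this lang item on a no_std target; aborting
// here is the "kill the thread" behavior discussed in the footnote.
#[lang = "panic_fmt"]
extern "C" fn panic_fmt(_msg: fmt::Arguments, _file: &'static str, _line: u32) -> ! {
    unsafe { intrinsics::abort() }
}
```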
I don't think this is accurate. Yes, the code is also optimized by the driver, but in every toolchain I'm aware of, the PTX/SPIR-V/etc. is the output of a middle-end that performs extensive optimizations like LLVM does (in fact, all the toolchains I know of literally use LLVM; that's why the PTX and AMDGPU backends exist). Cutting out these optimizations is likely to degrade performance, because the optimizer in the driver is almost certainly written to take pretty good IR and turn it into slightly better machine code.
I should add that I'm not really opposed to the CUDA-style model; it may very well be the best interface. I just believe it's best to think of the contents of a … Neither do I think we need to go through LLVM. MIR optimizations are coming, and targeting OpenCL 1.0 or GLSL could be an option for reaching older devices. For experimenting with PTX, however, LLVM is quicker to get started with and may also be the best option in the long run (e.g., Google uses it).
See rust-lang/rust#34195 for a minimal implementation that generates PTX from Rust code. A few design questions arose while testing; see the PR for details.
That looks good! I think that you nailed almost exactly what I was talking about on the PTX side of this RFC. I have been thinking quite a bit about this lately, and the more I think about it, the more I think that @brson was right. We should do a significant amount of experimentation and testing (especially testing) in a fork and upstream some of the changes somewhere down the line. @japaric, if you are interested in working together to flesh this out, drop me an email at [email protected]. I don't know if we should close this RFC, but I do think that the changes should be done in the way that @brson suggested.
Please drop a link here once you have something to show. I won't be much help with testing for lack of access to Nvidia chips, but at the very least I'd like to follow the design work (I imagine there's significant overlap with other GPU targets, to which I could and would like to contribute).
I'm going to go ahead and close this. I think the immediate way forward is clear and doesn't require an approved RFC. Go ahead and start proving it out of tree and even upstream the basic definitions. When we get to a point where GPU support is going to impact the language definition and there's a clear design for how, then let's do another RFC.
PTX and AMDGPU targets can be added fairly easily.