Move aligned_size_t, get_device_address and discard_memory to cuda/__memory/ #5239
Conversation
fbusato left a comment:
As the PR addresses the <cuda/memory> documentation, could you please also add documentation for https://github.com/NVIDIA/cccl/blob/main/libcudacxx/include/cuda/__memory/address_space.h?
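For context, a hedged sketch of what the address_space utilities might look like in use. The enumerator names and the argument order of `is_address_from` are assumptions inferred from the header name, not verified against the shipped API:

```cuda
#include <cuda/memory>

__global__ void probe(int* gptr)
{
  __shared__ int smem;
  // Classify generic pointers by the state space they point into
  // (assumed API shape: is_address_from(ptr, address_space)).
  bool in_shared = cuda::device::is_address_from(&smem, cuda::device::address_space::shared);
  bool in_global = cuda::device::is_address_from(gptr, cuda::device::address_space::global);
  // A pointer belongs to exactly one state space, so both cannot hold at once.
  (void) in_shared;
  (void) in_global;
}
```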
/ok to test 6821384
🟨 CI finished in 2h 42m: Pass: 69%/205 | Total: 3d 15h | Avg: 25m 39s | Max: 1h 37m | Hits: 67%/156349
Modified project (per CI summary): libcu++.

Modifications in project or dependencies?

| Modified | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | CCCL Packaging |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | stdpar |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 205)
| # | Runner |
|---|---|
| 128 | linux-amd64-cpu16 |
| 23 | windows-amd64-cpu16 |
| 14 | linux-amd64-gpu-h100-latest-1 |
| 14 | linux-amd64-gpu-rtxa6000-latest-1 |
| 12 | linux-arm64-cpu16 |
| 11 | linux-amd64-gpu-rtx2080-latest-1 |
| 3 | linux-amd64-gpu-rtx4090-latest-1 |
/ok to test 06b9ee1

/ok to test 4688831
🟩 CI finished in 4h 07m: Pass: 100%/205 | Total: 4d 01h | Avg: 28m 24s | Max: 1h 21m | Hits: 86%/337661
/ok to test 626899a
🟩 CI finished in 1h 06m: Pass: 100%/205 | Total: 1d 16h | Avg: 11m 43s | Max: 55m 40s | Hits: 97%/337661
- libcu++ 1.2.0 / CCCL 2.0.0 (in ``<cuda/memory>`` since CCCL 3.1.0)
- CUDA 11.1
> * - :ref:`cuda::discard_memory <libcudacxx-extended-api-memory-discard-memory>`

Q: should we remove the cuda:: prefix here? The other entries also omit it.

Suggested change:
- * - :ref:`cuda::discard_memory <libcudacxx-extended-api-memory-discard-memory>`
+ * - :ref:`discard_memory <libcudacxx-extended-api-memory-discard-memory>`
I'm not sure; I would prefer to keep them, because is_address_from is in the cuda::device:: namespace. We should probably split the tables into cuda:: and cuda::device:: namespaces.
Ah, I see. Then I don't mind, and would prefer that someone with more architectural knowledge decide how this should be structured.
Let's open an issue about this and leave it for another PR.
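For reference, one way the namespace split discussed above might look in the docs. This is only a sketch: the section headings are placeholders, and the `is_address_from` reference label is hypothetical (only the `discard_memory` label is taken from the lines quoted in this thread):

```rst
``cuda::`` namespace
--------------------

.. list-table::

   * - :ref:`discard_memory <libcudacxx-extended-api-memory-discard-memory>`

``cuda::device::`` namespace
----------------------------

.. list-table::

   * - :ref:`is_address_from <libcudacxx-extended-api-memory-address-space>`
```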
/ok to test e2ad943
🟩 CI finished in 1h 12m: Pass: 100%/205 | Total: 1d 13h | Avg: 10m 51s | Max: 43m 27s | Hits: 97%/337661
@fbusato please unblock
/ok to test a4ac9aa
🟩 CI finished in 3h 44m: Pass: 100%/205 | Total: 1d 15h | Avg: 11m 38s | Max: 3h 26m | Hits: 97%/337661
…cuda/__memory/` (NVIDIA#5239)

Co-authored-by: Michael Schellenberger Costa <miscco@nvidia.com>
Closes #997
This PR moves `aligned_size_t`, `get_device_address` and `discard_memory` to `cuda/__memory` and makes them available in `<cuda/memory>`. I've also updated the documentation.
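As a quick illustration, a hedged sketch of the three moved utilities reachable through `<cuda/memory>`. The signatures shown are assumptions based on their pre-move counterparts, not a verbatim reproduction of the shipped API:

```cuda
#include <cuda/memory>

__global__ void kernel(float* data, float* staging, size_t n)
{
  // get_device_address: obtain a pointer usable from device code
  // (assumed signature: T* get_device_address(T&)).
  float* addr = cuda::get_device_address(data[0]);

  // aligned_size_t: a size carrying a compile-time alignment
  // guarantee, e.g. as a shape argument for cuda::memcpy_async.
  cuda::aligned_size_t<16> shape(n * sizeof(float));

  // discard_memory: hint that these bytes need not be written back
  // (assumed signature: discard_memory(volatile void*, size_t)).
  cuda::discard_memory(staging, n * sizeof(float));

  (void) addr;
  (void) shape;
}
```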