Merged
1 change: 0 additions & 1 deletion docs/libcudacxx/extended_api.rst
@@ -10,7 +10,6 @@ Extended API
extended_api/execution_model
extended_api/memory_model
extended_api/thread_groups
extended_api/shapes
extended_api/synchronization_primitives
extended_api/asynchronous_operations
extended_api/memory_access_properties
@@ -91,7 +91,7 @@ namely:
the behavior is undefined.
- If the objects are not of `TriviallyCopyable <https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable>`_
type the program is ill-formed, no diagnostic required.
- If *Shape* is :ref:`cuda::aligned_size_t <libcudacxx-extended-api-memory-access-shapes-aligned-size>`, ``source``
- If *Shape* is :ref:`cuda::aligned_size_t <libcudacxx-extended-api-memory-aligned-size>`, ``source``
and ``destination`` are both required to be aligned on ``cuda::aligned_size_t::align``, else the behavior is
undefined.
- If ``cuda::pipeline`` is in a *quitted state*
@@ -111,7 +111,7 @@ Template Parameters
- A type satisfying the [*Group*] concept.
* - ``Shape``
- Either `cuda::std::size_t <https://en.cppreference.com/w/c/types/size_t>`_
or :ref:`cuda::aligned_size_t <libcudacxx-extended-api-memory-access-shapes-aligned-size>`.
or :ref:`cuda::aligned_size_t <libcudacxx-extended-api-memory-aligned-size>`.

Parameters
----------
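The alignment precondition in the hunk above (when *Shape* is `cuda::aligned_size_t`, both `source` and `destination` must be aligned on `cuda::aligned_size_t::align`) can be illustrated with a small host-only C++ sketch. `aligned_size_sketch`, `is_aligned_on` and `meets_copy_alignment` are hypothetical names used only for this illustration. They are not part of libcu++ and are not how `cuda::aligned_size_t` is implemented. Only the static `align` member mirrors the documented `cuda::aligned_size_t::align`. The power-of-two check is an assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for cuda::aligned_size_t<Align>: a byte count whose
// alignment is known at compile time through the static member `align`.
template <std::size_t Align>
struct aligned_size_sketch {
  // Assumption of this sketch: only power-of-two alignments are meaningful.
  static_assert(Align != 0 && (Align & (Align - 1)) == 0,
                "Align must be a power of two");
  static constexpr std::size_t align = Align;
  std::size_t value;  // number of bytes in the extent
  constexpr explicit aligned_size_sketch(std::size_t v) : value(v) {}
};

// Returns true if `ptr` is aligned on `alignment` bytes.
inline bool is_aligned_on(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Checks the documented precondition: with an aligned-size shape, both the
// destination and the source must be aligned on the shape's alignment.
template <std::size_t Align>
bool meets_copy_alignment(void* destination, const void* source,
                          aligned_size_sketch<Align> /*shape*/) {
  return is_aligned_on(destination, Align) && is_aligned_on(source, Align);
}
```

With two `alignas(16)` buffers, `meets_copy_alignment(dst, src, aligned_size_sketch<16>(64))` holds, while offsetting `dst` by 4 bytes violates the precondition. In the real API, violating it is undefined behavior rather than a checked condition. Encoding the alignment in the type is what lets the library assume it without a runtime test.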
4 changes: 2 additions & 2 deletions docs/libcudacxx/extended_api/functional.rst
@@ -8,7 +8,7 @@ Functional
:maxdepth: 1

functional/proclaim_return_type
functional/get_device_address
memory/get_device_address
functional/maximum_minimum

.. list-table::
@@ -40,7 +40,7 @@ Functional
- CCCL 2.8.0
- CUDA 12.9

* - :ref:`cuda::get_device_address <libcudacxx-extended-api-functional-get-device-address>`
* - :ref:`cuda::get_device_address <libcudacxx-extended-api-memory-get-device-address>`
- Returns a valid address to a device object
- CCCL 2.8.0
- CUDA 12.9
12 changes: 12 additions & 0 deletions docs/libcudacxx/extended_api/memory.rst
@@ -7,6 +7,8 @@ Memory
:hidden:
:maxdepth: 1

memory/get_device_address
memory/aligned_size
memory/is_aligned
memory/align_up
memory/align_down
@@ -21,6 +23,16 @@ Memory
- **CCCL Availability**
- **CUDA Toolkit Availability**

* - :ref:`aligned_size_t <libcudacxx-extended-api-memory-aligned-size>`
- Defines an extent of bytes with a statically defined alignment.
- libcu++ 1.2.0 / CCCL 2.0.0
- CUDA 11.1

* - :ref:`get_device_address <libcudacxx-extended-api-memory-get-device-address>`
- Returns a valid address to a device object
- CCCL 2.8.0
- CUDA 12.9

* - :ref:`is_aligned <libcudacxx-extended-api-memory-is_aligned>`
- Check if a pointer is aligned
- CCCL 3.1.0
@@ -1,9 +1,9 @@
.. _libcudacxx-extended-api-memory-access-shapes-aligned-size:
.. _libcudacxx-extended-api-memory-aligned-size:

``cuda::aligned_size_t``
========================

Defined in headers ``<cuda/barrier>`` and ``<cuda/pipeline>``:
Defined in headers ``<cuda/memory>``, ``<cuda/barrier>`` and ``<cuda/pipeline>``:

.. code:: cuda

@@ -1,9 +1,9 @@
.. _libcudacxx-extended-api-functional-get-device-address:
.. _libcudacxx-extended-api-memory-get-device-address:

``cuda::get_device_address``
============================

Defined in the header ``<cuda/functional>``:
Defined in the headers ``<cuda/memory>`` and ``<cuda/functional>``:

``cuda::get_device_address`` returns a valid pointer to a device object.
It replaces uses of ``cudaGetSymbolAddress``, which requires an inout parameter.
@@ -13,7 +13,7 @@ Example

.. code:: cuda

#include <cuda/functional>
#include <cuda/memory>

__device__ int device_object[] = {42, 1337, -1, 0};

@@ -19,7 +19,7 @@ Prefetch memory in the L2 cache starting at ``ptr`` applying a residence control

**Constraints**

- :ref:`ShapeT <libcudacxx-extended-api-memory-access-shapes>` is either ``size_t`` or :ref:`cuda::aligned_size_t <libcudacxx-extended-api-memory-access-shapes-aligned-size>`.
- ``ShapeT`` is either ``size_t`` or :ref:`cuda::aligned_size_t <libcudacxx-extended-api-memory-aligned-size>`.
- Two properties are supported:

- :ref:`cuda::access_property::persisting <libcudacxx-extended-api-memory-access-properties-access-property-persisting>`
29 changes: 0 additions & 29 deletions docs/libcudacxx/extended_api/shapes.rst

This file was deleted.

@@ -21,7 +21,7 @@
# pragma system_header
#endif // no system header

#include <cuda/__barrier/aligned_size.h>
#include <cuda/__memory/aligned_size.h>
#include <cuda/std/__algorithm/max.h>
#include <cuda/std/__cstddef/types.h>
#include <cuda/std/cstdint>
2 changes: 1 addition & 1 deletion libcudacxx/include/cuda/__memcpy_async/memcpy_async.h
@@ -24,13 +24,13 @@

#if _CCCL_HAS_CUDA_COMPILER()

# include <cuda/__barrier/aligned_size.h>
# include <cuda/__barrier/async_contract_fulfillment.h>
# include <cuda/__barrier/barrier.h>
# include <cuda/__barrier/barrier_block_scope.h>
# include <cuda/__barrier/barrier_thread_scope.h>
# include <cuda/__memcpy_async/check_preconditions.h>
# include <cuda/__memcpy_async/memcpy_async_barrier.h>
# include <cuda/__memory/aligned_size.h>
# include <cuda/std/__atomic/scopes.h>
# include <cuda/std/__type_traits/void_t.h>
# include <cuda/std/cstddef>
2 changes: 1 addition & 1 deletion libcudacxx/include/cuda/__memcpy_async/memcpy_async_tx.h
@@ -25,11 +25,11 @@
#if _CCCL_CUDA_COMPILATION()
# if __cccl_ptx_isa >= 800

# include <cuda/__barrier/aligned_size.h>
# include <cuda/__barrier/async_contract_fulfillment.h>
# include <cuda/__barrier/barrier_block_scope.h>
# include <cuda/__barrier/barrier_native_handle.h>
# include <cuda/__memcpy_async/check_preconditions.h>
# include <cuda/__memory/aligned_size.h>
# include <cuda/__ptx/instructions/cp_async_bulk.h>
# include <cuda/__ptx/ptx_dot_variants.h>
# include <cuda/__ptx/ptx_helper_functions.h>
@@ -8,8 +8,8 @@
//
//===----------------------------------------------------------------------===//

#ifndef _CUDA___BARRIER_ALIGNED_SIZE_H
#define _CUDA___BARRIER_ALIGNED_SIZE_H
#ifndef _CUDA___MEMORY_ALIGNED_SIZE_H
#define _CUDA___MEMORY_ALIGNED_SIZE_H

#include <cuda/std/detail/__config>

@@ -58,4 +58,4 @@ _LIBCUDACXX_END_NAMESPACE_CUDA

#include <cuda/std/__cccl/epilogue.h>

#endif // _CUDA___BARRIER_ALIGNED_SIZE_H
#endif // _CUDA___MEMORY_ALIGNED_SIZE_H
@@ -8,8 +8,8 @@
//
//===----------------------------------------------------------------------===//

#ifndef _CUDA___GET_DEVICE_ADDRESS_H
#define _CUDA___GET_DEVICE_ADDRESS_H
#ifndef _CUDA___MEMORY_GET_DEVICE_ADDRESS_H
#define _CUDA___MEMORY_GET_DEVICE_ADDRESS_H

#include <cuda/std/detail/__config>

@@ -55,4 +55,4 @@ _LIBCUDACXX_END_NAMESPACE_CUDA

#endif // _CCCL_HAS_CTK()

#endif // _CUDA___GET_DEVICE_ADDRESS_H
#endif // _CUDA___MEMORY_GET_DEVICE_ADDRESS_H
2 changes: 1 addition & 1 deletion libcudacxx/include/cuda/barrier
@@ -25,7 +25,6 @@
# error "CUDA synchronization primitives are only supported for sm_70 and up."
#endif // _CCCL_DEVICE_COMPILATION() && _CCCL_PTX_ARCH() < 700 && !_CCCL_CUDA_COMPILER(NVHPC)

#include <cuda/__barrier/aligned_size.h>
#include <cuda/__barrier/barrier.h>
#include <cuda/__barrier/barrier_arrive_tx.h>
#include <cuda/__barrier/barrier_block_scope.h>
@@ -34,6 +33,7 @@
#include <cuda/__barrier/barrier_thread_scope.h>
#include <cuda/__memcpy_async/memcpy_async.h>
#include <cuda/__memcpy_async/memcpy_async_tx.h>
#include <cuda/__memory/aligned_size.h>
#include <cuda/ptx>
#include <cuda/std/barrier>

2 changes: 1 addition & 1 deletion libcudacxx/include/cuda/functional
@@ -22,10 +22,10 @@
#endif // no system header

#include <cuda/__functional/address_stability.h>
#include <cuda/__functional/get_device_address.h>
#include <cuda/__functional/maximum.h>
#include <cuda/__functional/minimum.h>
#include <cuda/__functional/proclaim_return_type.h>
#include <cuda/__memory/get_device_address.h>
#include <cuda/std/functional>

#endif // _CUDA_FUNCTIONAL_
3 changes: 2 additions & 1 deletion libcudacxx/include/cuda/memory
@@ -21,10 +21,11 @@
# pragma system_header
#endif // no system header

#include <cuda/__functional/get_device_address.h>
#include <cuda/__memory/address_space.h>
#include <cuda/__memory/align_down.h>
#include <cuda/__memory/align_up.h>
#include <cuda/__memory/aligned_size.h>
#include <cuda/__memory/get_device_address.h>
#include <cuda/__memory/is_aligned.h>
#include <cuda/__memory/ptr_rebind.h>
#include <cuda/std/memory>
2 changes: 1 addition & 1 deletion libcudacxx/include/cuda/pipeline
@@ -21,10 +21,10 @@
# pragma system_header
#endif // no system header

#include <cuda/__barrier/aligned_size.h>
#include <cuda/__memcpy_async/check_preconditions.h>
#include <cuda/__memcpy_async/completion_mechanism.h>
#include <cuda/__memcpy_async/memcpy_async_barrier.h>
#include <cuda/__memory/aligned_size.h>
#include <cuda/__ptx/instructions/get_sreg.h>
#include <cuda/atomic>
#include <cuda/barrier>
1 change: 0 additions & 1 deletion libcudacxx/include/cuda/std/barrier
@@ -30,7 +30,6 @@
#include <cuda/std/__barrier/poll_tester.h>

//! TODO: Drop cuda only features
#include <cuda/__barrier/aligned_size.h>
#include <cuda/__barrier/barrier.h>
#include <cuda/__barrier/barrier_arrive_tx.h>
#include <cuda/__barrier/barrier_block_scope.h>