Add event timing #481

Merged (5 commits) on Mar 6, 2025
Changes from 2 commits
33 changes: 31 additions & 2 deletions cuda_core/cuda/core/experimental/_event.py
@@ -47,8 +47,26 @@ class Event:
     the last recorded stream.

     Events can be used to monitor device's progress, query completion
-    of work up to event's record, and help establish dependencies
-    between GPU work submissions.
+    of work up to event's record, help establish dependencies
+    between GPU work submissions, and record the elapsed time (in milliseconds)
+    on GPU:
+
+    .. code-block:: python
+
+        # To create events and record the timing:
+        s = Device(0).create_stream()
+        e1 = s.record(options={"enable_timing": True})
+        # ... run some GPU work ...
+        e2 = s.record(options={"enable_timing": True})
+        e2.sync()
+        print(f"time = {e2 - e1} milliseconds")
+
+        # Or, if events are already created:
+        s.record(e1)
+        # ... run some more GPU work ...
+        s.record(e2)
+        e2.sync()
+        print(f"time = {e2 - e1} milliseconds")

     Directly creating an :obj:`~_event.Event` is not supported due to ambiguity,
     and they should instead be created through a :obj:`~_stream.Stream` object.
@@ -96,6 +114,17 @@ def close(self):
         """Destroy the event."""
         self._mnff.close()

+    def __isub__(self, other):
+        return NotImplemented
+
+    def __rsub__(self, other):
+        return NotImplemented
+
+    def __sub__(self, other):
+        # return self - other (in milliseconds)
+        timing = handle_return(driver.cuEventElapsedTime(other.handle, self.handle))
+        return timing
+
     @property
     def is_timing_disabled(self) -> bool:
         """Return True if the event does not record timing data, otherwise False."""
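Review note: the operator protocol used by this change can be tried without a GPU. The sketch below is a minimal illustration, not the library's implementation. `FakeEvent` and its `timestamp_ms` field are hypothetical stand-ins (the real `Event.__sub__` calls `driver.cuEventElapsedTime`), and the `isinstance` check is an addition that the PR does not have.

```python
class FakeEvent:
    """Hypothetical stand-in for Event; holds a host-side timestamp in ms."""

    def __init__(self, timestamp_ms: float):
        self.timestamp_ms = timestamp_ms

    def __sub__(self, other):
        # self - other: elapsed milliseconds from `other` to `self`.
        if not isinstance(other, FakeEvent):  # assumption: not in the PR
            return NotImplemented
        return self.timestamp_ms - other.timestamp_ms

    def __rsub__(self, other):
        # <non-event> - self is unsupported.
        return NotImplemented

    def __isub__(self, other):
        # Returning NotImplemented here makes Python fall back to __sub__.
        return NotImplemented


e1 = FakeEvent(10.0)
e2 = FakeEvent(12.5)
elapsed_ms = e2 - e1  # dispatches to e2.__sub__(e1)

# int.__sub__ and FakeEvent.__rsub__ both decline, so Python raises TypeError.
try:
    5 - e1
    rsub_raised = False
except TypeError:
    rsub_raised = True

# Augmented assignment falls back to __sub__, so the name is rebound to a float.
e3 = FakeEvent(20.0)
e3 -= e1
```

In plain Python, an `__isub__` that returns `NotImplemented` does not make `-=` an error: the interpreter falls back to `__sub__`, so in this sketch `e3` ends up as a float rather than an event.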
1 change: 1 addition & 0 deletions cuda_core/docs/source/release/0.2.0-notes.rst
@@ -25,6 +25,7 @@ New features
 - Expose :class:`ObjectCode` as a public API, which allows loading cubins from memory or disk. For loading other kinds of code types, please continue using :class:`Program`.
 - A C++ helper function ``get_cuda_native_handle()`` is provided in the new ``include/utility.cuh`` header to retrieve the underlying CUDA C objects (ex: ``CUstream``) from a Python object returned by the ``.handle`` attribute (ex: :attr:`Stream.handle`).
 - For objects such as :class:`Program` and :class:`Linker` that could dispatch to different backends, a new ``.backend`` attribute is provided to query this information.
+- Support CUDA event timing.

 Limitations
 -----------
20 changes: 18 additions & 2 deletions cuda_core/tests/test_event.py
@@ -6,17 +6,33 @@
 # this software and related documentation outside the terms of the EULA
 # is strictly prohibited.

+import time
+
 import pytest

 from cuda.core.experimental import Device, EventOptions
+from cuda.core.experimental._utils import CUDAError


 @pytest.mark.parametrize("enable_timing", [True, False, None])
 def test_timing(init_cuda, enable_timing):
     options = EventOptions(enable_timing=enable_timing)
     stream = Device().create_stream()
-    event = stream.record(options=options)
-    assert event.is_timing_disabled == (not enable_timing if enable_timing is not None else True)
+    delay_seconds = 0.5
+    e1 = stream.record(options=options)
+    time.sleep(delay_seconds)
+    e2 = stream.record(options=options)
+    e2.sync()
+    for e in (e1, e2):
+        assert e.is_timing_disabled == (True if enable_timing is None else not enable_timing)
+    if enable_timing:
+        elapsed_time_ms = e2 - e1
+        assert isinstance(elapsed_time_ms, float)
+        assert delay_seconds * 1000 <= elapsed_time_ms < delay_seconds * 1000 + 2  # tolerance 2 ms
+    else:
+        with pytest.raises(CUDAError) as e:
+            elapsed_time_ms = e2 - e1
+        assert "CUDA_ERROR_INVALID_HANDLE" in str(e)


 def test_is_sync_busy_waited(init_cuda):
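Review note: the timing assertion in `test_timing` accepts a half-open window, with the lower bound inclusive and the upper bound exclusive. The sketch below restates that check as a standalone function so the boundary behaviour is easy to see. `elapsed_in_window` is a hypothetical helper written for illustration and is not part of the test suite.

```python
def elapsed_in_window(elapsed_ms: float, expected_ms: float, tolerance_ms: float = 2.0) -> bool:
    """Return True if expected_ms <= elapsed_ms < expected_ms + tolerance_ms."""
    return expected_ms <= elapsed_ms < expected_ms + tolerance_ms


delay_seconds = 0.5
expected_ms = delay_seconds * 1000  # same conversion the test uses

on_time = elapsed_in_window(500.3, expected_ms)    # inside the window
too_early = elapsed_in_window(499.9, expected_ms)  # below the lower bound
too_late = elapsed_in_window(502.0, expected_ms)   # upper bound is exclusive
```

A measurement of exactly 502.0 ms is rejected because the upper bound is strict, while exactly 500.0 ms would be accepted.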