Make error messages more clear and actionable #198

leofang · 2024-10-30T16:27:54Z

In the current prototype we often just raise the exception without any error message. It was meant to be only as a placeholder that we need to improve/iterate quickly. Error messages should be short but clear, informing users what's wrong (and potential fixes if possible).

leofang · 2025-01-16T20:43:20Z

Ralf to oversee the clear error msg process, need to know bw constraint by beta 3

ksimpson-work · 2025-01-16T21:31:09Z

Context:

cuda.core obviously wraps the bindings which are generated via either the cuda-python binding generator, or using cybind. Currently nvjitlink is the only library which has bindings generated by cybind, the rest of the bindings (cuda driver, cuda runtime, nvrtc, culink etc.) are all from the cuda-python binding generator. I will refer to these as cuda-python-bindings

The relevance of this is that there is a difference in how each of these bindings handle errors:

nvjitlink + other cybind bindings which come in the future -> raise exceptions at the cython layer
all the rest (cuda-python-bindings) -> return a tuple (error_code, result), no exception raised on error

We have _utils.handle_return() which we use to parse the error_code from any call to the cuda-python-bindings. When it is not success we raise a generic exception at the cuda.core layer

In order to improve the error message from certain types of errors (specific error codes, or specific library calls) we implement a context manager to manage exceptions in _linker.py. In this case we know that it is likely helpful to provide the error log from the linker along with the original exception.

My idea to improve the error messages module wide, would be to give every class an implementation of that contextmanager (either re write, or pass a callback to some shared context manager in utils). This exception manager can switch case on the error code pertaining to any raised exception, and provide a better message. It would have the ability to fallback to the generic message if there was nothing meaningful to add, or if we wanted to go lightweight at first. This way we can handle the exceptions raised by nvjitlink, as well as the exceptions raised by _utils.handle_return across all files the same way. This gives you the freedom to design something slick and maintainable, so that it will be easy to add a custom handler for a specific call or exception type in the future (since we don't need to handle every single special case in this first review).

Long story short:

look at _linker.py and port the _exception_manager over to all the classes so that they can all easily customize and standardize exception handling. OR come at it from a different angle entirely.

Let me know if you have any questions. The changes I had were actually not in an isolated branch, and to clean them up would take awhile. Happy to do that if it would be helpful but let me know.

leofang added cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do! labels Oct 30, 2024

leofang added this to the cuda.core beta 2 milestone Oct 30, 2024

leofang added P1 Medium priority - Should do triage Needs the team's attention and removed P0 High priority - Must do! labels Nov 14, 2024

leofang modified the milestones: cuda.core beta 2, cuda.core beta 3 Dec 4, 2024

leofang assigned rwgk Jan 16, 2025

rwgk mentioned this issue Feb 21, 2025

Clearer error messages (cuda.core) #458

Merged

rwgk mentioned this issue Mar 3, 2025

Apply __new__ approach to disabling __init__ #484

Merged

leofang linked a pull request Mar 6, 2025 that will close this issue

Clearer error messages (cuda.core) #458

Merged

rwgk closed this as completed in #458 Mar 6, 2025

rwgk mentioned this issue Mar 8, 2025

cuda.core: CUResult, cudaError explanations #503

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make error messages more clear and actionable #198

Make error messages more clear and actionable #198

leofang commented Oct 30, 2024

leofang commented Jan 16, 2025

Uh oh!

ksimpson-work commented Jan 16, 2025

Uh oh!

Make error messages more clear and actionable #198

Make error messages more clear and actionable #198

Comments

leofang commented Oct 30, 2024

leofang commented Jan 16, 2025

Uh oh!

ksimpson-work commented Jan 16, 2025

Uh oh!