Skip to content

Make error messages more clear and actionable #198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
leofang opened this issue Oct 30, 2024 · 2 comments · Fixed by #458 or #503
Closed

Make error messages more clear and actionable #198

leofang opened this issue Oct 30, 2024 · 2 comments · Fixed by #458 or #503
Assignees
Labels
cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P1 Medium priority - Should do triage Needs the team's attention

Comments

@leofang
Copy link
Member

leofang commented Oct 30, 2024

In the current prototype we often just raise the exception without any error message. It was meant to be only as a placeholder that we need to improve/iterate quickly. Error messages should be short but clear, informing users what's wrong (and potential fixes if possible).

@leofang leofang added cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P0 High priority - Must do! labels Oct 30, 2024
@leofang leofang added this to the cuda.core beta 2 milestone Oct 30, 2024
@leofang leofang added P1 Medium priority - Should do triage Needs the team's attention and removed P0 High priority - Must do! labels Nov 14, 2024
@leofang
Copy link
Member Author

leofang commented Jan 16, 2025

Ralf to oversee the clear error msg process, need to know bw constraint by beta 3

@ksimpson-work
Copy link
Contributor

Context:

cuda.core obviously wraps the bindings which are generated via either the cuda-python binding generator, or using cybind. Currently nvjitlink is the only library which has bindings generated by cybind, the rest of the bindings (cuda driver, cuda runtime, nvrtc, culink etc.) are all from the cuda-python binding generator. I will refer to these as cuda-python-bindings

The relevance of this is that there is a difference in how each of these bindings handle errors:

nvjitlink + other cybind bindings which come in the future -> raise exceptions at the cython layer
all the rest (cuda-python-bindings) -> return a tuple (error_code, result), no exception raised on error

We have _utils.handle_return() which we use to parse the error_code from any call to the cuda-python-bindings. When it is not success we raise a generic exception at the cuda.core layer

In order to improve the error message from certain types of errors (specific error codes, or specific library calls) we implement a context manager to manage exceptions in _linker.py. In this case we know that it is likely helpful to provide the error log from the linker along with the original exception.

My idea to improve the error messages module wide, would be to give every class an implementation of that contextmanager (either re write, or pass a callback to some shared context manager in utils). This exception manager can switch case on the error code pertaining to any raised exception, and provide a better message. It would have the ability to fallback to the generic message if there was nothing meaningful to add, or if we wanted to go lightweight at first. This way we can handle the exceptions raised by nvjitlink, as well as the exceptions raised by _utils.handle_return across all files the same way. This gives you the freedom to design something slick and maintainable, so that it will be easy to add a custom handler for a specific call or exception type in the future (since we don't need to handle every single special case in this first review).

Long story short:

look at _linker.py and port the _exception_manager over to all the classes so that they can all easily customize and standardize exception handling. OR come at it from a different angle entirely.

Let me know if you have any questions. The changes I had were actually not in an isolated branch, and to clean them up would take awhile. Happy to do that if it would be helpful but let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P1 Medium priority - Should do triage Needs the team's attention
Projects
None yet
3 participants