Missing error code (for volume lifecycle, plugin-specific behavior, etc.). #23
Comments
Don't we already have a generic error code that implementations can use for vendor-specific codes? ListVolumes: are there specific errors that you'd like to see? How would the recovery semantics differ across such errors? |
For vendor-specific error handling: the closest error code we have is For ListVolumes, maybe we can have an |
Specific error codes are only useful if there are meaningful recovery semantics. Do you have any recommendations for recovery semantics for additional error codes that you'd find useful? |
How come we aren't using grpc error codes? |
gRPC error codes are intended for gRPC-protocol-level errors. For CSI ops
they're too low-level: they don't allow us to encapsulate additional
information, and don't allow us to accurately model/advertise the recovery
semantics that we want clients to use.
…On Fri, Jun 2, 2017 at 8:46 AM, Brian Goff ***@***.***> wrote:
How come we aren't using grpc error codes?
|
gRPC error codes are a standard way to communicate classes of errors across the RPC boundary, including errors that are temporary, and requests that can be retried versus those that should not be retried. |
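As a rough illustration of the point above, the sketch below shows how a client might derive a retry decision from the canonical gRPC status code alone. The code names and numeric values are the canonical gRPC ones; the `RETRYABLE` set and the `should_retry` helper are hypothetical choices made for this example, not something gRPC or the CSI spec prescribes.

```python
from enum import IntEnum


class Code(IntEnum):
    """Subset of the canonical gRPC status codes (numeric values per gRPC)."""
    OK = 0
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    FAILED_PRECONDITION = 9
    ABORTED = 10
    INTERNAL = 13
    UNAVAILABLE = 14


# Assumption: an illustrative client-side policy, not mandated by gRPC or CSI.
RETRYABLE = frozenset({Code.UNAVAILABLE, Code.DEADLINE_EXCEEDED, Code.ABORTED})


def should_retry(code: Code) -> bool:
    """Return True if the caller may retry the same request later."""
    return code in RETRYABLE
```

Under this policy a caller would back off and retry on `UNAVAILABLE`, but would surface `INVALID_ARGUMENT` or `FAILED_PRECONDITION` to the operator instead of retrying.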
/cc @saad-ali |
FYI we can use |
Sorry for butting in here, but this is a bit of a concern and I thought I could help get to the right solution for error handling. Looking at the proposed API in https://github.com/container-storage-interface/spec/blob/master/spec.md, much of what GRPC provides is being reproduced by this protocol design. For example, the wire format already details the algebraic data type of reply/error, so it doesn't need to be represented in the higher-level protocol. The main concern in not using the GRPC errors to signify error conditions is that we force users of this API to correctly map API error semantics into the language semantics. This is often error-prone when done manually and is already provided by GRPC generators. Defining an alternative error path will lead to missed errors and bugs.
This is simply not true. Take a peek at wire format for details, but protocol errors are actually mapped to HTTP2 codes, which becomes Let's take the I usually look to Google APIs for best practices. They use this error code space at the application level to great effect. I've found that applying those concepts broadly (containerd and swarmkit, among others) has also been quite successful. Their methodology for versioning APIs is also very clean and usable in practice, so I'd implore you to compare the approach. Again, I am sorry if this is presumptuous, but I think this project will have a large impact and having a pleasant API experience is a huge part of that. If this is not the right issue to discuss this, let me know and I can make another issue to highlight these points. TL; DR
|
Thanks for the great input here. The reason for having our own error handling semantics is that it becomes very easy for the storage plugin to return very specific, actionable error codes to the CO. Using a bounded set of generic, pre-defined gRPC error codes makes it harder to understand what exactly the problem is and what the expected recovery behavior for the CO should be. Looking through https://github.com/grpc/grpc/blob/master/src/proto/grpc/status/status.proto it appears that So I'll take another look and see if we can leverage gRPC error codes instead of redefining our own error handling semantics. |
@saad-ali I understand the constraints here and I've actually been down this path myself. In practice, however, we found that using the GRPC error codes was more than sufficient. Coupling the error codes with solid messaging and extending with detail should be enough. |
@stevvooe thanks for the feedback. IIUC, the suggestion was to use GRPC canonical error code [1] [2] for some of the common errors. For instance, If we cannot find a mapping to the canonical errors, we'll use The question is: how does the client distinguish two "internal" errors programmatically? Are we going to encode a CSI-specific error code in the Another route is: we take a step back and really see if it's necessary to introduce CSI-specific errors that map to the "internal" error. Maybe the GRPC canonical errors are sufficient for most use cases (as @stevvooe suggested). |
Everything should map well, and for things that are truly "internal" it's some error that you can report the details of, but there wouldn't really be anything to do with the error other than report it. |
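To make the question about distinguishing two "internal" errors concrete, here is a minimal sketch of one possible answer: keep the canonical code generic and carry a plugin-specific reason alongside it. The `Status` class, the `details` dictionary, and the `csi_reason` key are all hypothetical stand-ins invented for this example; they are not the gRPC library's API and not part of the CSI spec.

```python
from dataclasses import dataclass, field
from typing import Dict

INTERNAL = 13  # canonical gRPC numeric value for INTERNAL


@dataclass
class Status:
    """Hypothetical stand-in for a gRPC status: code, message, details."""
    code: int
    message: str
    details: Dict[str, str] = field(default_factory=dict)


def plugin_reason(status: Status) -> str:
    """Return the plugin-specific reason, or "" if the plugin sent none."""
    # "csi_reason" is a made-up key used only for this illustration.
    return status.details.get("csi_reason", "")


# Two errors with the same canonical code but different plugin detail.
a = Status(INTERNAL, "backend timed out", {"csi_reason": "BACKEND_TIMEOUT"})
b = Status(INTERNAL, "unexpected backend reply")
```

A client that only understands canonical codes treats `a` and `b` identically, while a client that knows the plugin can still branch on the extra detail.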
@cpuguy83 If that's the case, I like that! I think the error code part of the spec can then be re-structured in the following way:
|
@jieyu I think this is perfect, and similar to how Linux kernel errors are documented. |
resolved by #115, please re-open if you disagree |
Should we add new error codes to indicate a plugin-specific error for each RPC call?
DeleteVolume: When a volume is in the NODE_READY or PUBLISHED state, what error code should we return?
ControllerUnpublishVolume: When a volume is in the PUBLISHED state, what error code should we return?
ListVolumes: Should we add a ListVolumesError?
Misc:
What is UNSUPPORTED_VOLUME_TYPE for?
GetNodeIDError and ProbeNodeError seem redundant.
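One way to read the DeleteVolume and ControllerUnpublishVolume questions above in terms of canonical gRPC codes is sketched below. The state names NODE_READY and PUBLISHED come from the issue text; the choice of `FAILED_PRECONDITION` for a volume that is still in use, and the `delete_volume_code` helper itself, are assumptions made for illustration rather than behavior confirmed by this thread.

```python
OK = 0                   # canonical gRPC numeric value for OK
FAILED_PRECONDITION = 9  # canonical gRPC numeric value for FAILED_PRECONDITION

# States named in the issue in which a volume is still in use.
IN_USE_STATES = frozenset({"NODE_READY", "PUBLISHED"})


def delete_volume_code(volume_state: str) -> int:
    """Pick a status code for a hypothetical DeleteVolume call.

    Assumption: deleting an in-use volume is refused with
    FAILED_PRECONDITION, signalling that the caller must first
    unpublish the volume instead of blindly retrying.
    """
    if volume_state in IN_USE_STATES:
        return FAILED_PRECONDITION
    return OK
```

For example, `delete_volume_code("PUBLISHED")` yields `FAILED_PRECONDITION`, while any state outside `IN_USE_STATES` (such as a hypothetical "CREATED") yields `OK`.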