allow retry on netErrors in safe situations #871
Closed
I'm not convinced this makes something safe to retry. If you send something to a server and get a network error, it is still possible for the server to have received and processed the result, but the error happens during return to the client. In that case it's not safe to retry. Do I understand the circumstances where this can happen correctly? If so, this is closer to a "maybe it worked?" error, which means the client has to check if it is safe to retry or not.
Agreed.
Thanks for your review, @mjibson and @cbandy!
My networking knowledge is a bit rusty, but from my understanding this cannot happen:
When send() is called, the data is only written to the kernel's send buffer and then the call returns. The send() call does not block until the peer has acknowledged receiving the data.
Therefore it cannot happen that cn.send(m) returns an error because something went wrong while returning an ACK to the sender.
Also, errors that are returned by send() only indicate local errors.
See the Linux manpage of the send() syscall: http://man7.org/linux/man-pages/man2/send.2.html
(Golang internally uses the write() syscall to write the data to the fd of the socket. That is the same as using the send() syscall without additional flags.)
That's a lot of assumptions that depend on how the Linux kernel and Go function, and I'm not at all comfortable changing this until there are tests that clearly demonstrate these assumptions. Also, what if the client and/or server are on Windows, Mac, or some version of Linux or Go that doesn't happen to function this way? I'm still not convinced that it's always safe to retry if there's an error here. Distributed systems indeed have a "maybe this failed?" error because of this exact problem.
@mjibson ok, I can make the code safer by narrowing the situations in which a send() error is identified as retryable:
1. Conn.Write() reported that 0 bytes were transferred when it returned a net.OpError error.
2. The error is one of:
   syscall.EPIPE: "The local end has been shut down on a connection oriented socket."
   syscall.ENOTSOCK: "The file descriptor sockfd does not refer to a socket."
   syscall.EINVAL: "Invalid argument passed."
   syscall.ECONNRESET: "Connection reset by peer."
   syscall.EBADF: "sockfd is not a valid open file descriptor."
3. The error is syscall.EPIPE only. The PR would then specifically address the problem described in "db operation fails with "broken pipe" instead of reconnecting transparently after server restart" #870.
What do you think?
I would favor doing 1 and 2.
I don't really know what exactly the EPIPE error means actually happened and what didn't happen. Can it change between OS or version of linux? Is the bytes written count always non-zero if any bytes were sent even if there's an error return for all paths? These questions make me very wary to do even 1 and 3.
You can look it up in the documentation for the related syscalls:
POSIX: https://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
The Linux and FreeBSD manpages, for example, are clearer, and both systems implement the POSIX standard:
https://www.freebsd.org/cgi/man.cgi?write(2) or http://man7.org/linux/man-pages/man2/write.2.html
Windows: https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-writefile
Google can also help with understanding what EPIPE means:
https://www.google.com/search?q=what+does+%22broken+pipe%22+mean&oq=what+does+%22broken+pipe%22+mean
The syscall API of an operating system is stable. It does not change in a backwards-incompatible manner, because that would break tons of applications on the affected operating system.
Unix-like operating systems implement the POSIX API, which specifies this behavior. This includes OSX.
The Windows Winsock API is not POSIX compliant, but in this case it behaves the same: WriteFile() returns ERROR_BROKEN_PIPE when the file descriptor is closed. ERROR_BROKEN_PIPE is mapped to the same error as EPIPE by Golang. See:
https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-writefile
https://golang.org/src/syscall/zerrors_windows.go#L120
Yes, the bytes-written count is non-zero whenever bytes were sent before the error.
See the implementation of Golang's conn.Write() and the methods that it calls:
https://golang.org/src/net/net.go?s=6460:6503#L175
Unix: https://golang.org/src/internal/poll/fd_unix.go#L254
Windows: https://golang.org/src/internal/poll/fd_windows.go#657
fd.Write() calls syscall.Write() in a loop until all bytes have been written out.
If it fails in the first loop iteration, it returns only an error; nothing was sent out in this case.
If it fails in a subsequent loop iteration, it returns the error together with the number of bytes that were sent out.