-
Notifications
You must be signed in to change notification settings - Fork 904
GODRIVER-2037 Don't clear the connection pool on client-side connect timeout errors. #688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b84c70d
to
859168b
Compare
859168b
to
7474d05
Compare
7474d05
to
8fa7fcd
Compare
…rs when using operation-scoped timeouts.
8fa7fcd
to
d5619e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great!
…h non-timeout dial error.
37854e1
to
c573b2b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work, LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just some minor comments about testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! The new operationTimeout
parameter for tests is much clearer IMO. I responded to a previous thread about testing context.Canceled
errors so I'll hold off on approving until that's resolved, but everything else pretty much LGTM.
…ake error handling, improve SDAM error handling tests, add handshake cancellation error test.
ba2db2f
to
b067477
Compare
@kevinAlbs and @divjotarora I've made 2 somewhat significant changes since the last time you reviewed this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for investigating this issue and all of the approaches so thoroughly!
} | ||
} | ||
|
||
func TestServerConnectionCancellation(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: For readability, can you write a comment at the beginning of this test that explains what we're testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM mod optional comments.
@@ -23,7 +24,65 @@ const ( | |||
errorInterruptedAtShutdown int32 = 11600 | |||
) | |||
|
|||
// testPoolMonitor exposes an *event.PoolMonitor and collects all events logged to that | |||
// *event.PoolMonitor. It is safe to use from multiple concurrent goroutines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh very nice. Optional: should this be in a separate file, like connection_pool_helpers_test.go
? Though isPoolCleared
was used by other files, the name of this file suggests it is only for the tests specified in https://github.com/mongodb/specifications/blob/master/source/connections-survive-step-down/tests/README.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wanted to locate it "next to" the existing poolMonitor
so that there aren't two implementations of the same thing in different places. I think the testPoolMonitor
could actually be a good candidate for a "test utilities" package because numerous test packages need to record events (including the Server
tests added in this PR). However, I'm not sure it's worth moving into a separate file until we want to move it into a test utilities package, which probably shouldn't be part of this PR.
I've added investigating moving testPoolMonitor
to a shared test utilities package to the description of GODRIVER-2068..
return true | ||
} | ||
|
||
// In some networking functions, the deadline from the context is used to determine timeouts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the helpful explanation! That is interesting, and sounds like it could be possible buggy behavior in the net package if it inconsistently sets context errors.
// Create a connection pool event monitor that sends all events to an events channel | ||
// so we can assert on the connection pool events later. | ||
WithConnectionPoolMonitor(func(_ *event.PoolMonitor) *event.PoolMonitor { | ||
return &event.PoolMonitor{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: IIUC this could use the new testPoolMonitor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it could, although it would require moving testPoolMonitor
into a test util package like internal/testutil
(or even an external test utilities package if that makes sense). I'll leave a comment to indicate that change should be considered as part of GODRIVER-2068.
Update
topology.Server#ProcessHandshakeError
to not clear the server connection pool on client-side timeout errors that occur while creating a connection in-line with an operation. This is intended as a patch to prevent clearing the connection pool on specific identified conditions that currently cause the driver to clear the server connection pool unnecessarily and is not intended to handle a comprehensive set of errors that could occur during handshake.Note that the changes here are likely irrelevant once all connections are established in the background with the
connectTimeoutMS
deadline (GODRIVER-2038).Possible cases:
server.Connection(context.Background())
Get a connection to
server
usingconnectTimeoutMS
. Theserver
connection pool will be cleared on timeout during handshake. This case is used for connections created by theMinPoolSize
maintenance goroutine and by operations that usecontext.Background()
for the timeout.server.Connection(context.WithTimeout(..., 1*time.Second))
Get a connection to
server
using a timeout specified on an operation lower thanconnectTimeoutMS
. Theserver
connection pool will not be cleared on timeout during handshake. This case is used for connections created by operations that usecontext.WithTimeout()
for timeouts shorter than theconnectTimeoutMS
.server.Connection(context.WithTimeout(..., 10*time.Minute))
Get a connection to
server
using a timeout specified on an operation higher thanconnectTimeoutMS
. Theserver
connection pool will be cleared on timeout during handshake. This case is used for connections created by operations that usecontext.WithTimeout()
for timeouts longer thanconnectTimeoutMS
.server.Connection(context.WithCancel(...))
Get a connection to
server
using a cancellation specified on an operation. Theserver
connection pool will not be cleared on timeout during handshake. This case is used for connections created by operations that usecontext.WithCancel()
for cancellation.GODRIVER-2037