Skip to content

Client does not handle connections being closed by the server in all cases #2605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JoshMock opened this issue Feb 3, 2025 · 5 comments · Fixed by elastic/elastic-transport-js#223

Comments

@JoshMock
Copy link
Member

JoshMock commented Feb 3, 2025

When sending a bulk request that is larger than Elasticsearch is configured to accept, it will respond with a 413 Content Too Large and close the connection. When the client uses HttpConnection as its connection type and ClusterConnectionPool as its pool type, it does not appropriately handle when the connection is closed by the server. That socket will stay in the pool and may be used by a subsequent request, raising a read ECONNRESET or EPIPE exception when trying to send a request over a closed connection.

It has not yet been verified whether the same happens when using UndiciConnection or other pool types, so all combinations may need to be tested.

From @rudolf while reporting on this issue:

I suspect we at some point try to close the already closed socket.

@JoshMock
Copy link
Member Author

JoshMock commented Feb 3, 2025

If you look at the HttpConnection class, it handles several events that may be raised by the request, response and socket objects during the lifecycle of an HTTP request.

My concern is that ECONNRESET is only handled during an aborted request, but not other types of requests, and EPIPE is not explicitly handled anywhere. My initial suspicion is that ensuring the request and response are explicitly destroyed when these errors are raised will help to solve this issue.

I'm working on writing unit tests to reproduce the current failure case, and then will work on implementing a fix.

@JoshMock
Copy link
Member Author

I've been working on this for a few days now and haven't fully narrowed in on an exact fix yet, unfortunately. The issue turns out to be very hard to pin down, in large part because the Node http library is entirely event-driven, and the order of operations for a particular request is not simple to express in mostly flat, imperative steps.

So, I've been doing some refactoring work on HttpConnection that I'd been putting off for a while, adding a state machine that tracks the request/response lifecycle. It's already highlighted a few edge cases where event listeners appear to not be firing in an order we would expect, which is exactly the situation we're dealing with here. That refactor is going relatively well, and most of the unit test suite is in a passing state, but I am moving carefully to avoid regressions. The unit test I added for this particular bug is one of the only ones still not passing, but I've been narrowing in on it as the refactor falls into place.

Thanks for your patience, to anyone following along. I know this issue has been a blocker to some downstream work in Serverless, so I'm working as quickly and carefully as I can.

@JoshMock JoshMock mentioned this issue Feb 19, 2025
8 tasks
@JoshMock
Copy link
Member Author

Update: still working on this. Getting closer, but it's been very hard to unravel the order of events without breaking existing functionality. 😓

@JoshMock
Copy link
Member Author

Just merged a fix in elastic/elastic-transport-js#223. Will publish 8.9.5-beta.1 shortly so that @rudolf and others can test it first before we roll it out to the general public.

@JoshMock
Copy link
Member Author

transport 8.9.5-beta.1 is now available on npm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant