BulkIngester retry policies #478
Comments
I would just like to point out this issue where adding such behavior to the low level rest client was discussed: elastic/elasticsearch#21141 (comment)
That is to say, please get some consensus on where this would belong, because right now it seems like every application developer has to roll their own solution.
@swallez Do you by chance have any update to this ticket? It's a year old, and retry policies would be a good idea. This work seems to be stalled. Thanks for looking!
Hi @swallez, hope this message finds you well. Best regards
Hi @swallez, I totally agree that the BulkProcessor in the HLRC was retrying even in cases where it should not have. What would happen when there is a temporary network issue? I guess the low level client would close the connection. cc @l-trotta
implemented in #930, for now only for 429 errors |
@l-trotta is there a separate issue to support retries on the low level client? |
@fabriziofortino we're working on it, not on the low level client, but on the transport layer of the java client, this is the draft PR: #954 |
The BulkProcessor in the High Level Rest Client (HLRC) has two kinds of retries: it can retry the bulk request when it is rejected with a 429 "too many requests" response, and it can retry the individual items of a partially failed response.

The new BulkIngester added in #474 doesn't retry for now:
- for the 429 handling, we can argue that this belongs to the transport layer (low level rest client), which already retries on all cluster nodes in case of failure and should also handle 429 responses;
- for individual item retries, the approach used in the BulkProcessor of retrying all failed items has some shortcomings: a number of errors will result in the same error when retried, e.g. a version verification failure, a partial update failure because of a script error or bad document structure, deletion of a non-existing document, etc. The items worth retrying are probably those with a 429 status, which may happen if the coordinating node accepted the request but the target node for the item's operation was overloaded (a sketch of such a selection predicate follows this list).
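As an illustration of that last point, a selection predicate limited to 429 rejections could look roughly like the sketch below. This is not an existing BulkIngester hook; only the BulkResponseItem accessors are taken from the Java client, and their exact names should be treated as an assumption.

```java
import java.util.function.Predicate;

import co.elastic.clients.elasticsearch.core.bulk.BulkResponseItem;

public final class RetrySelection {

    // Only items rejected with HTTP 429 (too many requests) are worth
    // retrying; version conflicts, script errors, deletions of missing
    // documents, etc. would fail the same way on a second attempt.
    public static final Predicate<BulkResponseItem> TOO_MANY_REQUESTS =
        item -> item.error() != null && item.status() == 429;

    private RetrySelection() {
    }
}
```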
A way to handle this in the new BulkIngester would be to define a retry policy by means of a delay behavior (linear, exponential, etc.) like in the HLRC, and also a predicate to select the failed items that need to be retried.
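For what it's worth, here is a minimal sketch of what such a delay policy could look like, assuming a shape similar to the HLRC BackoffPolicy. The RetryPolicy name and its factory methods are hypothetical, not part of the client.

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch only: a policy is a factory of delay iterators,
// one fresh iterator per retried request or item, mirroring the shape
// of the HLRC BackoffPolicy.
public interface RetryPolicy {

    Iterator<Duration> delays();

    // Constant delay, bounded number of attempts.
    static RetryPolicy constant(Duration delay, int maxRetries) {
        return () -> new Iterator<Duration>() {
            private int attempts = 0;

            @Override
            public boolean hasNext() {
                return attempts < maxRetries;
            }

            @Override
            public Duration next() {
                if (!hasNext()) throw new NoSuchElementException();
                attempts++;
                return delay;
            }
        };
    }

    // Exponential backoff: initialDelay, 2x, 4x, 8x, ...
    static RetryPolicy exponential(Duration initialDelay, int maxRetries) {
        return () -> new Iterator<Duration>() {
            private int attempts = 0;

            @Override
            public boolean hasNext() {
                return attempts < maxRetries;
            }

            @Override
            public Duration next() {
                if (!hasNext()) throw new NoSuchElementException();
                return initialDelay.multipliedBy(1L << attempts++);
            }
        };
    }
}
```

A BulkIngester builder could then accept such a policy together with the item predicate shown earlier, e.g. something along the lines of retryPolicy(RetryPolicy.exponential(Duration.ofMillis(100), 5)); that builder method is illustrative only, not the API that was eventually implemented.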