Is there a replacement for the BulkProcessor? #108
There's no equivalent for now, but we're working on a replacement that will land in the near future. |
Is there any update on this feature? |
Any update on BulkProcessors for the new ES java api? |
If there is no equivalent for BulkProcessor in the new Elasticsearch Java API client, how do we migrate BulkProcessor code? Is there a sample to point us in the right direction? |
any update would be much appreciated @swallez |
Sorry, no update yet. I'll raise this internally as a topic that needs to be prioritized. |
Hi folks - thanks for the already open issue. We recently migrated from the now-deprecated ES7 client to the new API client in v8 and stumbled over bulk requests the hard way. Previously (and in other clients such as the Python one), the ES client split a bulk request into smaller chunks based on byte size or document count; now we have to resort to splitting the data manually ourselves. That seems like a step backward to me. Is there a way to speed up the development of a "new" BulkProcessor, aside from "implement it yourselves"? ("buy Enterprise support", "send us chocolate", "talk to and bribe a member of the Evangelist team", ...) Thanks! :) Best, |
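For anyone who has to split manually in the meantime, here is a minimal sketch of the chunking described in the comment above, assuming the documents are already serialized to JSON strings. `BulkChunker` and `chunk` are hypothetical names, not part of any Elasticsearch client; each returned chunk would then be sent as one bulk request.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Elasticsearch client.
class BulkChunker {
    /**
     * Splits docs into chunks holding at most maxDocs documents and at most
     * maxBytes UTF-8 bytes each. A single document larger than maxBytes
     * still gets a chunk of its own.
     */
    static List<List<String>> chunk(List<String> docs, int maxDocs, long maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : docs) {
            long size = doc.getBytes(StandardCharsets.UTF_8).length;
            // Close the current chunk if adding this document would exceed a limit.
            boolean full = !current.isEmpty()
                    && (current.size() >= maxDocs || currentBytes + size > maxBytes);
            if (full) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

This only covers the splitting step; concurrency, retries and a listener are not addressed here.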
Is there a timeline for an update on this issue? |
Any reason why this is taking so long? Why even introduce a new API without this core component for ingestion? This issue should be at the top of the list of things to solve, in my opinion. An update, and preferably a timeline, would be greatly appreciated. Thanks and best regards, |
It seems I just opened a duplicate of this issue: #425. It's hard to believe that we are expected to migrate to the new client before switching from ES 7 to ES 8 when this feature is not present. |
Well, I started migrating to the new Java API client and along the way realised that BulkProcessor is not available there. It has been almost a year now and still nothing seems to be moving on the Elastic front. I guess only paying customers get support these days... |
I don't understand why this issue is given zero priority. Why introduce a new API if it is not usable in real-life bulk processing situations? |
I stumbled upon this issue yesterday. Something like this seems to be working (but I have yet to do extensive testing). In classpath:
Then in my indexing logic that relies on
    if (esVersion.getMajor() >= 8) {
        requestOptions = RequestOptions.DEFAULT
            .toBuilder()
            .addHeader("Content-Type", "application/vnd.elasticsearch+json; compatible-with=7")
            .addHeader("Accept", "application/vnd.elasticsearch+json; compatible-with=7")
            .build();
    } else {
        requestOptions = RequestOptions.DEFAULT;
    }
Then I'm using it like this:
    Request request = new Request(HttpPost.METHOD_NAME, "/_bulk");
    request.setOptions(requestOptions);
With that I do not get NPEs anymore and the indexing seems to work properly on es6/es7/es8. Hopefully I won't run into classpath issues from having 2 versions of the Java client. |
Hi Anthony, It's not that there is no bulk processor functionality; it's the data chunking and the listener that are missing.
Below is some code where I actually use the new bulk processor. The events used in the code are documents that need to be stored, updated,... I keep track of the ElasticSearch index in the event itself because updates could go horribly wrong otherwise when there is an index rollover (not everyone purely uses ElasticSearch for a logging use case:). We use ElasticSearch also as our database for master data unlike most other users. I've been using ElasticSearch since version 0.1.4... Best regards,
|
@frank-montyne In this case I cannot help you. I still have the listener because I'm still relying on the 7.X client with ES8 in compatibility mode. What you are using here is just the Bulk API of ES, and it's far (like very far) from what BulkProcessor actually does. With BulkProcessor you can configure:
- the concurrency
- the max size of bulk requests (by document count and/or by size)
- the back-off policy
You can then "stream" as many documents as you want through it and it will:
- batch documents in bulk using the configured thresholds
- handle N requests in parallel (N being the concurrency given)
  - the request being the call to .bulk on the ES client, which is what you are doing in your snippet
- handle retries of failed documents within that bulk request, following the back-off policy
|
Exactly my point. (Email reply to Anthony Pessy's comment of Thu, Nov 17, 2022, 11:42 AM.)
|
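The BulkProcessor behaviour listed in the comment above (buffer documents until a threshold is reached, then send batches with bounded concurrency) can be sketched roughly as follows. This is an illustration only, not the actual implementation: `MiniBulkBuffer` and its `sender` callback are hypothetical names, the callback stands in for the `.bulk` call on the ES client, and size-based thresholds, retries and back-off are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: buffers items and hands full batches to a sender,
// with at most `concurrency` batches in flight at any time.
class MiniBulkBuffer<T> implements AutoCloseable {
    private final int maxActions;
    private final Semaphore inFlight;
    private final Consumer<List<T>> sender; // assumed to perform one bulk call
    private final ExecutorService pool;
    private List<T> buffer = new ArrayList<>();

    MiniBulkBuffer(int maxActions, int concurrency, Consumer<List<T>> sender) {
        this.maxActions = maxActions;
        this.inFlight = new Semaphore(concurrency);
        this.sender = sender;
        this.pool = Executors.newFixedThreadPool(concurrency);
    }

    // Adds one item and flushes once the document-count threshold is reached.
    synchronized void add(T item) {
        buffer.add(item);
        if (buffer.size() >= maxActions) {
            flush();
        }
    }

    // Sends the buffered items as one batch on the worker pool.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>();
        inFlight.acquireUninterruptibly(); // blocks while all slots are busy
        pool.execute(() -> {
            try {
                sender.accept(batch);
            } finally {
                inFlight.release();
            }
        });
    }

    // Flushes the remainder and waits for in-flight batches to finish.
    @Override
    public void close() {
        flush();
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Blocking in `add` when all slots are busy is one possible way to apply back-pressure to the producer; other designs are equally valid.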
Is adding a new Bulk Processor in 8+ java client jars being worked on? My application relies on the BulkProcessor's functionalities and trying to code my own Bulk Processor seems like a big risk not worth taking. I would rather wait for ElasticSearch dev team to come up with a new one. For now I will continue to use 7.17.3 jars |
@swallez It does not look like the new BulkIngester retries failed operations like BulkProcessor did. Am I reading that correctly? It does not seem to look at the actual response at all. Previously, BulkProcessor would create a new BulkRequest with only the failed operations for the next batch. |
@panthony the new I've opened #478 to outline the issues and a way to implement this. Please continue the discussion on retries there. |
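To illustrate the retry behaviour discussed in the two comments above (resend only the operations that failed, waiting longer between attempts), here is a rough sketch. `RetryFailedItems`, `sendWithRetry` and the `send` callback are made-up names; `send` is assumed to perform one bulk call and return the items that failed. This is not how BulkProcessor or BulkIngester is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of "retry only the failed items with exponential back-off".
class RetryFailedItems {
    /**
     * Calls send with all items, then resends only the items that send reported
     * as failed, doubling the delay before each retry. Returns the items that
     * are still failing after maxRetries retries (empty if everything succeeded).
     */
    static <T> List<T> sendWithRetry(List<T> items,
                                     Function<List<T>, List<T>> send,
                                     int maxRetries,
                                     long initialDelayMillis) {
        List<T> pending = new ArrayList<>(items);
        long delay = initialDelayMillis;
        for (int attempt = 0; attempt <= maxRetries && !pending.isEmpty(); attempt++) {
            if (attempt > 0) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    // Give up early but keep the interrupt flag for the caller.
                    Thread.currentThread().interrupt();
                    return pending;
                }
                delay *= 2;
            }
            // The next attempt contains only what failed in this one.
            pending = new ArrayList<>(send.apply(pending));
        }
        return pending;
    }
}
```

In a real client, which failures are worth retrying (for example rejected requests versus mapping errors) would need to be decided from the per-item response; that decision is left to the `send` callback here.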
@swallez The BulkIngester helper doesn't seem to be present in the 7.17.8 release. Is that correct? If so to which 7.17.x release will it be added? Thanks |
@frank-montyne that's correct. It will be included in 7.17.9 which should be released at the end of this month, and in 8.7.0 that is currently planned somewhere in March. |
Thanks. (Email reply to Sylvain Wallez's comment of 16 Jan 2023, 18:33.)
|
I don't know if it can help anyone, but I created a class to mimic BulkProcessor. Function sending the data to an insertion thread:
Class managing insertions:
|
@nicolasm35 as mentioned previously, an implementation will be part of the next release. See PR #474 and https://github.com/elastic/elasticsearch-java/tree/main/java-client/src/main/java/co/elastic/clients/elasticsearch/_helpers/bulk If you can't wait for the next release, I suggest you copy that code instead of using this more limited implementation. |
With RHLC, one could use the BulkProcessor API to batch IndexRequests and DeleteRequests. Is there a recommended replacement for BulkProcessor when migrating to elasticsearch-java?
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-docs-bulk-processor.html