Commit c4c94ce

Add bulk, count, clear scroll, close PIT examples (#3510) (#3534)
(cherry picked from commit 4eeb458)
Co-authored-by: Lisa Cawley <[email protected]>
1 parent e59b440 commit c4c94ce

25 files changed: +687 -271 lines changed

output/openapi/elasticsearch-openapi.json

Lines changed: 64 additions & 50 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

output/openapi/elasticsearch-serverless-openapi.json

Lines changed: 64 additions & 50 deletions

output/schema/schema.json

Lines changed: 129 additions & 103 deletions

specification/_global/bulk/BulkRequest.ts

Lines changed: 130 additions & 13 deletions
Original file line number, diff line number, diff line change
@@ -31,8 +31,115 @@ import { OperationContainer, UpdateAction } from './types'
3131

3232
/**
3333
* Bulk index or delete documents.
34-
* Performs multiple indexing or delete operations in a single API call.
34+
* Perform multiple `index`, `create`, `delete`, and `update` actions in a single request.
3535
* This reduces overhead and can greatly increase indexing speed.
36+
*
37+
* If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
38+
*
39+
* * To use the `create` action, you must have the `create_doc`, `create`, `index`, or `write` index privilege. Data streams support only the `create` action.
40+
* * To use the `index` action, you must have the `create`, `index`, or `write` index privilege.
41+
* * To use the `delete` action, you must have the `delete` or `write` index privilege.
42+
* * To use the `update` action, you must have the `index` or `write` index privilege.
43+
* * To automatically create a data stream or index with a bulk API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.
44+
* * To make the result of a bulk operation visible to search using the `refresh` parameter, you must have the `maintenance` or `manage` index privilege.
45+
*
46+
* Automatic data stream creation requires a matching index template with data stream enabled.
47+
*
48+
* The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
49+
*
50+
* ```
51+
* action_and_meta_data\n
52+
* optional_source\n
53+
* action_and_meta_data\n
54+
* optional_source\n
55+
* ....
56+
* action_and_meta_data\n
57+
* optional_source\n
58+
* ```
59+
*
60+
* The `index` and `create` actions expect a source on the next line and have the same semantics as the `op_type` parameter in the standard index API.
61+
* A `create` action fails if a document with the same ID already exists in the target.
62+
* An `index` action adds or replaces a document as necessary.
63+
*
64+
* NOTE: Data streams support only the `create` action.
65+
* To update or delete a document in a data stream, you must target the backing index containing the document.
66+
*
67+
* An `update` action expects that the partial doc, upsert, and script and its options are specified on the next line.
68+
*
69+
* A `delete` action does not expect a source on the next line and has the same semantics as the standard delete API.
70+
*
71+
* NOTE: The final line of data must end with a newline character (`\n`).
72+
* Each newline character may be preceded by a carriage return (`\r`).
73+
* When sending NDJSON data to the `_bulk` endpoint, use a `Content-Type` header of `application/json` or `application/x-ndjson`.
74+
* Because this format uses literal newline characters (`\n`) as delimiters, make sure that the JSON actions and sources are not pretty printed.
75+
*
76+
* If you provide a target in the request path, it is used for any actions that don't explicitly specify an `_index` argument.
77+
*
78+
* A note on the format: the idea here is to make processing as fast as possible.
79+
* As some of the actions are redirected to other shards on other nodes, only `action_and_meta_data` is parsed on the receiving node side.
80+
*
81+
* Client libraries using this protocol should try to do something similar on the client side and reduce buffering as much as possible.
82+
*
83+
* There is no "correct" number of actions to perform in a single bulk request.
84+
* Experiment with different settings to find the optimal size for your particular workload.
85+
* Note that Elasticsearch limits the maximum size of an HTTP request to 100mb by default, so clients must ensure that no request exceeds this size.
86+
* It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch.
87+
* For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace the raw data with a link to the external system in the documents that you send to Elasticsearch.
88+
*
89+
* **Client support for bulk requests**
90+
*
91+
* Some of the officially supported clients provide helpers to assist with bulk requests and reindexing:
92+
*
93+
* * Go: Check out `esutil.BulkIndexer`
94+
* * Perl: Check out `Search::Elasticsearch::Client::5_0::Bulk` and `Search::Elasticsearch::Client::5_0::Scroll`
95+
* * Python: Check out `elasticsearch.helpers.*`
96+
* * JavaScript: Check out `client.helpers.*`
97+
* * .NET: Check out `BulkAllObservable`
98+
* * PHP: Check out bulk indexing.
99+
*
100+
* **Submitting bulk requests with cURL**
101+
*
102+
* If you're providing text file input to `curl`, you must use the `--data-binary` flag instead of plain `-d`.
103+
* The latter doesn't preserve newlines. For example:
104+
*
105+
* ```
106+
* $ cat requests
107+
* { "index" : { "_index" : "test", "_id" : "1" } }
108+
* { "field1" : "value1" }
109+
* $ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
110+
* {"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
111+
* ```
112+
*
113+
* **Optimistic concurrency control**
114+
*
115+
* Each `index` and `delete` action within a bulk API call may include the `if_seq_no` and `if_primary_term` parameters in their respective action and meta data lines.
116+
* The `if_seq_no` and `if_primary_term` parameters control how operations are run, based on the last modification to existing documents. See Optimistic concurrency control for more details.
117+
*
118+
* **Versioning**
119+
*
120+
* Each bulk item can include the version value using the `version` field.
121+
* It automatically follows the behavior of the index or delete operation based on the `_version` mapping.
122+
* It also supports the `version_type` parameter.
123+
*
124+
* **Routing**
125+
*
126+
* Each bulk item can include the routing value using the `routing` field.
127+
* It automatically follows the behavior of the index or delete operation based on the `_routing` mapping.
128+
*
129+
* NOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.
130+
*
131+
* **Wait for active shards**
132+
*
133+
* When making bulk calls, you can set the `wait_for_active_shards` parameter to require a minimum number of shard copies to be active before starting to process the bulk request.
134+
*
135+
* **Refresh**
136+
*
137+
* Control when the changes made by this request are visible to search.
138+
*
139+
* NOTE: Only the shards that receive the bulk request will be affected by refresh.
140+
* Imagine a `_bulk?refresh=wait_for` request with three documents in it that happen to be routed to different shards in an index with five shards.
141+
* The request will only wait for those three shards to refresh.
142+
* The other two shards that make up the index do not participate in the `_bulk` request at all.
36143
* @rest_spec_name bulk
37144
* @availability stack stability=stable
38145
* @availability serverless stability=stable visibility=public
@@ -53,62 +160,72 @@ export interface Request<TDocument, TPartialDocument> extends RequestBase {
53160
]
54161
path_parts: {
55162
/**
56-
* Name of the data stream, index, or index alias to perform bulk actions on.
163+
* The name of the data stream, index, or index alias to perform bulk actions on.
57164
*/
58165
index?: IndexName
59166
}
60167
query_parameters: {
61168
/**
62-
* If `true`, the response will include the ingest pipelines that were executed for each index or create.
169+
* If `true`, the response will include the ingest pipelines that were run for each index or create.
63170
* @server_default false
64171
*/
65172
list_executed_pipelines?: boolean
66173
/**
67-
* ID of the pipeline to use to preprocess incoming documents.
68-
* If the index has a default ingest pipeline specified, then setting the value to `_none` disables the default ingest pipeline for this request.
69-
* If a final pipeline is configured it will always run, regardless of the value of this parameter.
174+
* The pipeline identifier to use to preprocess incoming documents.
175+
* If the index has a default ingest pipeline specified, setting the value to `_none` turns off the default ingest pipeline for this request.
176+
* If a final pipeline is configured, it will always run regardless of the value of this parameter.
70177
*/
71178
pipeline?: string
72179
/**
73-
* If `true`, Elasticsearch refreshes the affected shards to make this operation visible to search, if `wait_for` then wait for a refresh to make this operation visible to search, if `false` do nothing with refreshes.
180+
* If `true`, Elasticsearch refreshes the affected shards to make this operation visible to search.
181+
* If `wait_for`, wait for a refresh to make this operation visible to search.
182+
* If `false`, do nothing with refreshes.
74183
* Valid values: `true`, `false`, `wait_for`.
75184
* @server_default false
76185
*/
77186
refresh?: Refresh
78187
/**
79-
* Custom value used to route operations to a specific shard.
188+
* A custom value that is used to route operations to a specific shard.
80189
*/
81190
routing?: Routing
82191
/**
83-
* `true` or `false` to return the `_source` field or not, or a list of fields to return.
192+
* Indicates whether to return the `_source` field (`true` or `false`) or contains a list of fields to return.
84193
*/
85194
_source?: SourceConfigParam
86195
/**
87196
* A comma-separated list of source fields to exclude from the response.
197+
* You can also use this parameter to exclude fields from the subset specified in `_source_includes` query parameter.
198+
* If the `_source` parameter is `false`, this parameter is ignored.
88199
*/
89200
_source_excludes?: Fields
90201
/**
91202
* A comma-separated list of source fields to include in the response.
203+
* If this parameter is specified, only these source fields are returned.
204+
* You can exclude fields from this subset using the `_source_excludes` query parameter.
205+
* If the `_source` parameter is `false`, this parameter is ignored.
92206
*/
93207
_source_includes?: Fields
94208
/**
95-
* Period each action waits for the following operations: automatic index creation, dynamic mapping updates, waiting for active shards.
209+
* The period each action waits for the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards.
210+
* The default is `1m` (one minute), which guarantees Elasticsearch waits for at least the timeout before failing.
211+
* The actual wait time could be longer, particularly when multiple waits occur.
96212
* @server_default 1m
97213
*/
98214
timeout?: Duration
99215
/**
100216
* The number of shard copies that must be active before proceeding with the operation.
101-
* Set to all or any positive integer up to the total number of shards in the index (`number_of_replicas+1`).
217+
* Set to `all` or any positive integer up to the total number of shards in the index (`number_of_replicas+1`).
218+
* The default is `1`, which waits for each primary shard to be active.
102219
* @server_default 1
103220
*/
104221
wait_for_active_shards?: WaitForActiveShards
105222
/**
106-
* If `true`, the requests actions must target an index alias.
223+
* If `true`, the request's actions must target an index alias.
107224
* @server_default false
108225
*/
109226
require_alias?: boolean
110227
/**
111-
* If `true`, the request's actions must target a data stream (existing or to-be-created).
228+
* If `true`, the request's actions must target a data stream (existing or to be created).
112229
* @server_default false
113230
*/
114231
require_data_stream?: boolean
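
Reviewer sketch (not part of the commit): the request-size guidance in the `BulkRequest.ts` doc comment above can be illustrated with a small batching helper. This is a minimal sketch under stated assumptions. The names `utf8Length`, `splitIntoRequests`, and `ASSUMED_LIMIT_BYTES` are hypothetical, and reading "100mb" as 100 MiB is an assumption. Each input string is expected to hold one complete operation (the action line plus its optional source line), already serialized with trailing newlines, so an action is never separated from its source.

```typescript
// Assumption: "100mb" is read as 100 MiB; adjust to your cluster's actual limit.
const ASSUMED_LIMIT_BYTES = 100 * 1024 * 1024;

// Count UTF-8 bytes without relying on Node- or browser-specific APIs.
function utf8Length(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: the pair encodes one 4-byte code point.
      bytes += 4;
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Group pre-serialized operations into request bodies no larger than maxBytes.
function splitIntoRequests(operations: string[], maxBytes: number): string[] {
  const requests: string[] = [];
  let current = "";
  let currentBytes = 0;
  operations.forEach((operation) => {
    const size = utf8Length(operation);
    if (size > maxBytes) {
      // A single oversized document cannot be indexed; it must be pre-processed.
      throw new Error("a single operation exceeds the size limit");
    }
    if (current !== "" && currentBytes + size > maxBytes) {
      requests.push(current);
      current = "";
      currentBytes = 0;
    }
    current += operation;
    currentBytes += size;
  });
  if (current !== "") {
    requests.push(current);
  }
  return requests;
}
```

The size limit only bounds each request. The doc comment's advice to experiment with batch sizes for a particular workload still applies.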

specification/_global/bulk/BulkResponse.ts

Lines changed: 14 additions & 0 deletions
@@ -22,9 +22,23 @@ import { long } from '@_types/Numeric'
2222
import { OperationType, ResponseItem } from './types'
2323

2424
export class Response {
25+
/**
26+
* The response contains the individual results of each operation in the request.
27+
* They are returned in the order submitted.
28+
* The success or failure of an individual operation does not affect other operations in the request.
29+
*/
2530
body: {
31+
/**
32+
* If `true`, one or more of the operations in the bulk request did not complete successfully.
33+
*/
2634
errors: boolean
35+
/**
36+
* The result of each operation in the bulk request, in the order they were submitted.
37+
*/
2738
items: SingleKeyDictionary<OperationType, ResponseItem>[]
39+
/**
40+
* The length of time, in milliseconds, it took to process the bulk request.
41+
*/
2842
took: long
2943
ingest_took?: long
3044
}
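
Reviewer sketch (not part of the commit): the response documentation above says `errors` is `true` when one or more operations fail and that `items` preserves submission order. A client could use those two facts roughly as follows. The interfaces here are simplified, hypothetical stand-ins for the specification's `ResponseItem` type, and `collectFailures` is an illustrative name, not an API of any official client.

```typescript
// Hypothetical, simplified shapes; the real types live in the specification.
interface BulkItemResult {
  _index?: string;
  _id?: string;
  status?: number;
  error?: { type: string; reason: string };
}

interface BulkResponseBody {
  took: number;
  errors: boolean;
  // Each item has a single key that names the operation type.
  items: Array<Record<string, BulkItemResult>>;
}

interface BulkFailure {
  position: number; // index of the operation in the submitted request
  action: string;
  id: string | undefined;
  status: number | undefined;
  type: string;
  reason: string;
}

function collectFailures(response: BulkResponseBody): BulkFailure[] {
  const failures: BulkFailure[] = [];
  // When `errors` is false, no item carries an error object.
  if (!response.errors) {
    return failures;
  }
  response.items.forEach((item, position) => {
    Object.keys(item).forEach((action) => {
      const result = item[action];
      const error = result === undefined ? undefined : result.error;
      if (result !== undefined && error !== undefined) {
        failures.push({
          position: position,
          action: action,
          id: result._id,
          status: result.status,
          type: error.type,
          reason: error.reason,
        });
      }
    });
  });
  return failures;
}

// Abbreviated from the "Failed actions" example response in this commit.
const sampleResponse: BulkResponseBody = {
  took: 486,
  errors: true,
  items: [
    { update: { _index: "index1", _id: "5", status: 404, error: { type: "document_missing_exception", reason: "[5]: document missing" } } },
    { update: { _index: "index1", _id: "6", status: 404, error: { type: "document_missing_exception", reason: "[6]: document missing" } } },
    { create: { _index: "index1", _id: "7", status: 201 } },
  ],
};
const sampleFailures = collectFailures(sampleResponse);
```

Because one operation's failure does not affect the others, the `create` for document `7` succeeds even though both updates fail, so only the first two positions are reported.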
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
summary: Multiple operations
2+
# method_request: POST _bulk
3+
description: Run `POST _bulk` to perform multiple operations.
4+
# type: request
5+
value: '{ "index" : { "_index" : "test", "_id" : "1" } }
6+
7+
{ "field1" : "value1" }
8+
9+
{ "delete" : { "_index" : "test", "_id" : "2" } }
10+
11+
{ "create" : { "_index" : "test", "_id" : "3" } }
12+
13+
{ "field1" : "value3" }
14+
15+
{ "update" : {"_id" : "1", "_index" : "test"} }
16+
17+
{ "doc" : {"field2" : "value2"} }'
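
Reviewer sketch (not part of the commit): the NDJSON rules from the `BulkRequest.ts` doc comment (one action line, an optional source line, no pretty printing, and a final newline) can be captured in a small serializer. The sketch below rebuilds the "Multiple operations" example body above. The `BulkActionName` and `BulkOperation` types and the `toBulkBody` function are hypothetical names introduced for illustration only.

```typescript
// Hypothetical helper types; not part of the Elasticsearch specification.
type BulkActionName = "index" | "create" | "update" | "delete";

interface BulkOperation {
  action: BulkActionName;
  // Metadata for the action line, such as _index and _id.
  meta: Record<string, unknown>;
  // Document source, or the doc/upsert/script payload for update.
  // Must be omitted for delete, which takes no second line.
  payload?: Record<string, unknown>;
}

function toBulkBody(operations: BulkOperation[]): string {
  const lines: string[] = [];
  operations.forEach((op) => {
    const needsPayload = op.action !== "delete";
    if (needsPayload && op.payload === undefined) {
      throw new Error(op.action + " actions require a payload line");
    }
    if (!needsPayload && op.payload !== undefined) {
      throw new Error("delete actions do not take a payload line");
    }
    const actionLine: Record<string, unknown> = {};
    actionLine[op.action] = op.meta;
    // JSON.stringify without an indent argument emits a single line,
    // which keeps literal newlines reserved for the delimiters.
    lines.push(JSON.stringify(actionLine));
    if (op.payload !== undefined) {
      lines.push(JSON.stringify(op.payload));
    }
  });
  // Terminate every line, including the last one, with "\n".
  return lines.map((line) => line + "\n").join("");
}

const multipleOperations: string = toBulkBody([
  { action: "index", meta: { _index: "test", _id: "1" }, payload: { field1: "value1" } },
  { action: "delete", meta: { _index: "test", _id: "2" } },
  { action: "create", meta: { _index: "test", _id: "3" }, payload: { field1: "value3" } },
  { action: "update", meta: { _id: "1", _index: "test" }, payload: { doc: { field2: "value2" } } },
]);
```

The resulting string would be sent as the request body with a `Content-Type` of `application/x-ndjson` or `application/json`, as the doc comment describes.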
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
summary: Bulk updates
2+
# method_request: POST _bulk
3+
description: >
4+
When you run `POST _bulk` and use the `update` action, you can use `retry_on_conflict` as a field in the action itself (not in the extra payload line) to specify how many times an update should be retried in the case of a version conflict.
5+
# type: request
6+
value:
7+
'{ "update" : {"_id" : "1", "_index" : "index1", "retry_on_conflict" : 3} }
8+
9+
{ "doc" : {"field" : "value"} }
10+
11+
{ "update" : { "_id" : "0", "_index" : "index1", "retry_on_conflict" : 3} }
12+
13+
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless",
14+
"params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
15+
16+
{ "update" : {"_id" : "2", "_index" : "index1", "retry_on_conflict" : 3} }
17+
18+
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
19+
20+
{ "update" : {"_id" : "3", "_index" : "index1", "_source" : true} }
21+
22+
{ "doc" : {"field" : "value"} }
23+
24+
{ "update" : {"_id" : "4", "_index" : "index1"} }
25+
26+
{ "doc" : {"field" : "value"}, "_source": true}'
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
summary: Filter for failed operations
2+
# method_request: POST /_bulk
3+
description: >
4+
To return only information about failed operations, run `POST /_bulk?filter_path=items.*.error`.
5+
# type: request
6+
value: '{ "update": {"_id": "5", "_index": "index1"} }
7+
8+
{ "doc": {"my_field": "foo"} }
9+
10+
{ "update": {"_id": "6", "_index": "index1"} }
11+
12+
{ "doc": {"my_field": "foo"} }
13+
14+
{ "create": {"_id": "7", "_index": "index1"} }
15+
16+
{ "my_field": "foo" }'
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
summary: Dynamic templates
2+
method_request: POST /_bulk
3+
description: >
4+
Run `POST /_bulk` to perform a bulk request that consists of index and create actions with the `dynamic_templates` parameter.
5+
The bulk request creates two new fields `work_location` and `home_location` with type `geo_point` according to the `dynamic_templates` parameter.
6+
However, the `raw_location` field is created using default dynamic mapping rules, in this case as a `text` field because it is supplied as a string in the JSON document.
7+
# type: request
8+
value: "{ \"index\" : {\
9+
\ \"_index\" : \"my_index\", \"_id\" : \"1\", \"dynamic_templates\": {\"work_location\"\
10+
: \"geo_point\"}} }\n{ \"field\" : \"value1\", \"work_location\": \"41.12,-71.34\"\
11+
, \"raw_location\": \"41.12,-71.34\"}\n{ \"create\" : { \"_index\" : \"my_index\"\
12+
, \"_id\" : \"2\", \"dynamic_templates\": {\"home_location\": \"geo_point\"}} }\n\
13+
{ \"field\" : \"value2\", \"home_location\": \"41.12,-71.34\"}"
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
summary: Multiple successful operations
2+
# description: ''
3+
# type: response
4+
# response_code: ''
5+
value:
6+
"{\n \"took\": 30,\n \"errors\": false,\n \"items\": [\n {\n \
7+
\ \"index\": {\n \"_index\": \"test\",\n \"_id\": \"\
8+
1\",\n \"_version\": 1,\n \"result\": \"created\",\n \
9+
\ \"_shards\": {\n \"total\": 2,\n \"successful\"\
10+
: 1,\n \"failed\": 0\n },\n \"status\": 201,\n\
11+
\ \"_seq_no\" : 0,\n \"_primary_term\": 1\n }\n \
12+
\ },\n {\n \"delete\": {\n \"_index\": \"test\",\n \
13+
\ \"_id\": \"2\",\n \"_version\": 1,\n \"result\"\
14+
: \"not_found\",\n \"_shards\": {\n \"total\": 2,\n \
15+
\ \"successful\": 1,\n \"failed\": 0\n },\n\
16+
\ \"status\": 404,\n \"_seq_no\" : 1,\n \"_primary_term\"\
17+
\ : 2\n }\n },\n {\n \"create\": {\n \"_index\"\
18+
: \"test\",\n \"_id\": \"3\",\n \"_version\": 1,\n \
19+
\ \"result\": \"created\",\n \"_shards\": {\n \"total\"\
20+
: 2,\n \"successful\": 1,\n \"failed\": 0\n \
21+
\ },\n \"status\": 201,\n \"_seq_no\" : 2,\n \
22+
\ \"_primary_term\" : 3\n }\n },\n {\n \"update\": {\n\
23+
\ \"_index\": \"test\",\n \"_id\": \"1\",\n \"\
24+
_version\": 2,\n \"result\": \"updated\",\n \"_shards\": {\n\
25+
\ \"total\": 2,\n \"successful\": 1,\n \
26+
\ \"failed\": 0\n },\n \"status\": 200,\n \"\
27+
_seq_no\" : 3,\n \"_primary_term\" : 4\n }\n }\n ]\n}"
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
summary: Failed actions
2+
description: >
3+
If you run `POST /_bulk` with operations that update non-existent documents, the operations cannot complete successfully.
4+
The API returns a response with an `errors` property value of `true`.
5+
The response also includes an error object for any failed operations.
6+
The error object contains additional information about the failure, such as the error type and reason.
7+
# type: response
8+
# response_code: ''
9+
value:
10+
"{\n \"took\": 486,\n \"errors\": true,\n \"items\": [\n {\n \"\
11+
update\": {\n \"_index\": \"index1\",\n \"_id\": \"5\",\n \"\
12+
status\": 404,\n \"error\": {\n \"type\": \"document_missing_exception\"\
13+
,\n \"reason\": \"[5]: document missing\",\n \"index_uuid\": \"\
14+
aAsFqTI0Tc2W0LCWgPNrOA\",\n \"shard\": \"0\",\n \"index\": \"\
15+
index1\"\n }\n }\n },\n {\n \"update\": {\n \"_index\"\
16+
: \"index1\",\n \"_id\": \"6\",\n \"status\": 404,\n \"error\"\
17+
: {\n \"type\": \"document_missing_exception\",\n \"reason\":\
18+
\ \"[6]: document missing\",\n \"index_uuid\": \"aAsFqTI0Tc2W0LCWgPNrOA\"\
19+
,\n \"shard\": \"0\",\n \"index\": \"index1\"\n }\n \
20+
\ }\n },\n {\n \"create\": {\n \"_index\": \"index1\",\n \
21+
\ \"_id\": \"7\",\n \"_version\": 1,\n \"result\": \"created\"\
22+
,\n \"_shards\": {\n \"total\": 2,\n \"successful\": 1,\n\
23+
\ \"failed\": 0\n },\n \"_seq_no\": 0,\n \"_primary_term\"\
24+
: 1,\n \"status\": 201\n }\n }\n ]\n}"
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
summary: Filter for failed operations
2+
description: >
3+
An example response from `POST /_bulk?filter_path=items.*.error`, which returns only information about failed operations.
4+
# type: response
5+
# response_code: ''
6+
value:
7+
"{\n \"items\": [\n {\n \"update\": {\n \"error\": {\n \
8+
\ \"type\": \"document_missing_exception\",\n \"reason\": \"[5]: document\
9+
\ missing\",\n \"index_uuid\": \"aAsFqTI0Tc2W0LCWgPNrOA\",\n \"\
10+
shard\": \"0\",\n \"index\": \"index1\"\n }\n }\n },\n \
11+
\ {\n \"update\": {\n \"error\": {\n \"type\": \"document_missing_exception\"\
12+
,\n \"reason\": \"[6]: document missing\",\n \"index_uuid\": \"\
13+
aAsFqTI0Tc2W0LCWgPNrOA\",\n \"shard\": \"0\",\n \"index\": \"\
14+
index1\"\n }\n }\n }\n ]\n}"
