-
Notifications
You must be signed in to change notification settings - Fork 25.4k
Add msearch api to high level client #27274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot @martijnvg for working on this. I left some comments, mainly around testing and minor stuff.
@@ -381,6 +383,17 @@ static Request clearScroll(ClearScrollRequest clearScrollRequest) throws IOExcep | |||
return new Request("DELETE", "/_search/scroll", Collections.emptyMap(), entity); | |||
} | |||
|
|||
static Request multiSearch(MultiSearchRequest multiSearchRequest) throws IOException { | |||
Params params = Params.builder(); | |||
if (multiSearchRequest.maxConcurrentSearchRequests() != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would rather not duplicate default values in the client. What's the downside of always passing the parameter no matter what its value is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why I did this... I'll always pass down this parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember now why I needed to add this. 0
is used as not specified and the maxConcurrentSearchRequests(...)
setter doesn't accepts values lower than 1. So checking whether any other value than 0 is specified is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yea we can't avoid this then. Maybe share the default value through a constant so at least we don't duplicate it. Odd! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is odd especially because it seems that once you set a value for this field, you can never reset it to its original default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I think this should be fixed and the setter should allow 0
as valid value. So that the msearch api falls back to default based on number of cores.
client().performRequest("PUT", "/index3/doc/5", Collections.emptyMap(), doc5); | ||
StringEntity doc6 = new StringEntity("{\"field\":\"value2\"}", ContentType.APPLICATION_JSON); | ||
client().performRequest("PUT", "/index3/doc/6", Collections.emptyMap(), doc6); | ||
client().performRequest("POST", "/index1,index2,index3/_refresh"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to merge this class with the existing SearchIT
?
SearchIT.assertSearchHeader(multiSearchResponse.getResponses()[2].getResponse()); | ||
assertThat(multiSearchResponse.getResponses()[2].getResponse().getHits().getTotalHits(), Matchers.equalTo(1L)); | ||
assertThat(multiSearchResponse.getResponses()[2].getResponse().getHits().getAt(0).getId(), Matchers.equalTo("6")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we also have a test for some other common feature like aggregations / highlighting etc. performed via _msearch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also maybe have a test around failures, which can be tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to test all possible search features here? I think this covered by the SearchIT
?
I'll try to add tests for failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not all features, just a few of the most common ones I 'd say.
return new MultiSearchResponse(items, randomNonNegativeLong()); | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some test to RequestTests which tests the request conversion from MultiSearchRequest to Request (enpoint, method, params etc.)?
} | ||
|
||
public static BytesRef writeMultiLineFormat(MultiSearchRequest multiSearchRequest, XContent xContent) throws IOException { | ||
BytesRefBuilder builder = new BytesRefBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the dependency on lucene necessary here? Could we find another way to do the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at another api where BytesRef
was used, I'll try to covert to BytesReference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea I see that also bulk depends on BytesRef which is not great. If it's too much work we can do it as a follow-up.
try (XContentBuilder xContentBuilder = XContentBuilder.builder(xContent)) { | ||
xContentBuilder.startObject(); | ||
if (request.indices() != null) { | ||
xContentBuilder.field("indices", request.indices()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we use index
rather than indices
? That is what we have in our docs, indices is just a synonym but I don't think that other clients use it either.
} | ||
|
||
static MultiSearchResponse.Item itemFromXContent(XContentParser parser) throws IOException { | ||
// TODO: Is there a better way? The msearch item format is very tricky... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate on what is tricky? Is that because items can either be errors or proper instance of Item ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the only key is error then we need we need to parse the xcontent is an error an otherwise parse as a search response. In order to read the the first field we need to parse it up until the field FIELD token is found to decide what to do, but then we are already to far to use SearchResponse#fromXContent(...)
, it requires that the first token is a start object token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha. shall we have an innerFromXContent that does not require start_object then, and accepts that we are already at the field_name token? That could be called from msearch? Exceptions should be parseable already with this technique.
|
||
@Override | ||
public int hashCode() { | ||
return Objects.hash(maxConcurrentSearchRequests, requests, indicesOptions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add specific unit tests for equals / hashcode ?
} | ||
} | ||
|
||
private MultiSearchResponse createTestInstance() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be static?
MultiSearchResponse expected = createTestInstance(); | ||
XContentType xContentType = randomFrom(XContentType.values()); | ||
BytesReference shuffled = toShuffledXContent(expected, xContentType, ToXContent.EMPTY_PARAMS, false); | ||
XContentParser parser = createParser(XContentFactory.xContent(xContentType), shuffled); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add a test that inserts random fields here so that we test forward compatibility? Have a look at SearchResponseTests#testFromXContentWithRandomFields
for an example of that.
@javanna Thanks for reviewing. I've updated the PR. |
7fb8c2e
to
3672923
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments and questions but LGTM
static Request multiSearch(MultiSearchRequest multiSearchRequest) throws IOException { | ||
Params params = Params.builder(); | ||
params.putParam(RestSearchAction.TYPED_KEYS_PARAM, "true"); | ||
if (multiSearchRequest.maxConcurrentSearchRequests() != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you mind adding a constant in MultiSearchRequest for the 0
default value?
assert token == Token.FIELD_NAME; | ||
boolean statusBeenParsed = false; | ||
String fieldName = parser.currentName(); | ||
if ("status".equals(fieldName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this parser relies on order of keys? can we change it so that it doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a test util method that we use to shuffle fields in the response so we make sure that our parsers don't rely on specific keys ordering.
public class MultiSearchResponseTests extends ESTestCase { | ||
|
||
public void testFromXContent() throws IOException { | ||
for (int runs = 0; runs < 20; runs++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think that we can drop this loop here and in other tests, and instead just rely on multiple runs of the same tests in our CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The abstract parsing tests and query builder tests do iterate for several times and that is why I added this loop here, since we extend directly from ESTestCase
. I prefer to keep it, if you are ok with it.
XContentType xContentType = randomFrom(XContentType.values()); | ||
BytesReference shuffled = toShuffledXContent(expected, xContentType, ToXContent.EMPTY_PARAMS, false); | ||
if (randomBoolean()) { | ||
shuffled = insertRandomFields(xContentType, shuffled, s -> s.contains("responses"), random()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it makes little sense to call insertRandomFields if random fields can only be inserted at the root level (the exclude filter identifies the only root element expected). But I am guessing you are doing this because exceptions cause issues if we insert random elements in their json, and being responses
an array, it is not possible to identify exceptions only in an exclude filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, it doesn't make a lot of sense. Identifying whether something is a search response or a exception becomes not possible with the current parsing logic.
I'll remove the insertion of random fields.
item = new Item(SearchResponse.innerFromXContent(parser), null); | ||
assert parser.currentToken() == Token.END_OBJECT; // SearchResponse.innerFromXContent(...) consumes the entire json object | ||
} | ||
return item; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at the code, it seems like this parsing method relies on keys ordering, but when I look at its test it shuffles the keys which means I am not reading it right. I guess the point is that status
is ignored, and it will either ignored when it appears before any other element, or while parsing exception or search hits? Maybe that's what your comment above explained but I didn't get it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not really relies on key ordering but instead it checks the field one after the other, ignoring status
if found as the first field or ignoring it again right after error
, or let it be parsed by the SearchResponse.innerFromXContent()
.
I agree that this parsing logic is not easy to do but I'm wondering if it could be more readable with the "usual" while loop that skip children on status
field, and delegates to the appropriate parsing methods for error
and search response?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What @tlrx is saying is correct, it the parsing does not rely on ordering.
I think I can rewrite this code into a loop. However becauseElasticsearchException.failureFromXContent(...)
does not parse until end object and SearchResponse#innerFromXContent(...)
does, the loop will have to immediately break in order to not read other search responses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me Martijn thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall! I left some minor comments and I'd like to know if the parsing logic in MultiSearchResponse.itemFromXContent() could be improved a bit. If that's not feasible or has too much impact on other parsing method then it can go in like this.
searchRequest.scroll((Scroll) null); | ||
// only expand_wildcards, ignore_unavailable and allow_no_indices can be specified from msearch api, so unset other options: | ||
IndicesOptions randomlyGenerated = searchRequest.indicesOptions(); | ||
IndicesOptions msearchDefault = IndicesOptions.strictExpandOpenAndForbidClosed(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could be new MultiSearchRequest().indicesOptions()
instead
public String toString() { | ||
return "MultiSearchRequest{" + | ||
"maxConcurrentSearchRequests=" + maxConcurrentSearchRequests + | ||
", requests=" + requests + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit reluctant to have all the search requests rendered here. The SearchRequest.toString() method is verbose and spits out the whole content of the request, so the result here could be very large and I'm worried that it fills some logs or slows down the IDE when debugging. I also don't see much benefit to output everything, but maybe I'm wrong. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'll remove the toString() method.
} | ||
|
||
public static byte[] writeMultiLineFormat(MultiSearchRequest multiSearchRequest, XContent xContent) throws IOException { | ||
ByteArrayOutputStream output = new ByteArrayOutputStream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use BytesStreamOutput
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is an added benefit for using BytesStreamOutput
? Also getting the raw byte[] which is what gets used, requires extra steps (BytesReference -> BytesRef -> copy BytesRef's byte buffer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also getting the raw byte[] which is what gets used, requires extra steps (BytesReference -> BytesRef -> copy BytesRef's byte buffer)
Good point
private static final ConstructingObjectParser<MultiSearchResponse, Void> PARSER = new ConstructingObjectParser<>("multi_search", | ||
true, a -> new MultiSearchResponse(((List<Item>)a[0]).toArray(new Item[0]), (long) a[1])); | ||
static { | ||
PARSER.declareObjectArray(constructorArg(), (p, c) -> itemFromXContent(p),RESPONSES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space before RESPONSES
item = new Item(SearchResponse.innerFromXContent(parser), null); | ||
assert parser.currentToken() == Token.END_OBJECT; // SearchResponse.innerFromXContent(...) consumes the entire json object | ||
} | ||
return item; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not really relies on key ordering but instead it checks the field one after the other, ignoring status
if found as the first field or ignoring it again right after error
, or let it be parsed by the SearchResponse.innerFromXContent()
.
I agree that this parsing logic is not easy to do but I'm wondering if it could be more readable with the "usual" while loop that skip children on status
field, and delegates to the appropriate parsing methods for error
and search response?
3672923
to
96e8e8a
Compare
if ("status".equals(fieldName)) { | ||
// Ignore the status value | ||
} else { | ||
throw new IllegalArgumentException("unexpected field name [" + fieldName + "]"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be more permissive and just ignore the field? We used to be permissive when parsing responses in order to have a better forward compatibility...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, I've changed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be permissive....but we are not testing forward comp properly because of the structure of these objects. This is quite a bummer, we should probably look into testing this, otherwise permissive or not doesn't make a difference as we don't test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@javanna Don't we usually inject random fields to test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes but we discussed as part of this review issues around testing this, because the response contains an array of objects which can be either exceptions or an ordinary search response. Exceptions still can hold metadata fields, hence injecting random fields in there would cause issues and needs to be skipped. It is complicated to distinguish between exceptions and search responses in the array, hence we don't currently insert random fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks @javanna, I've forgot this discussion. As long as it is permissive, even not tested, it's OK to be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'll merge once the pr build is green.
a4d429f
to
b826c45
Compare
b826c45
to
4d78e1a
Compare
hello! Could u tell me when this method is released? i really need it! Thank you for your help! |
@GoldenHans it will be released with 6.2. It should come out soon. |
No description provided.