Skip to content

Add msearch api to high level client #27274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 5, 2017

Conversation

martijnvg
Copy link
Member

No description provided.

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot @martijnvg for working on this. I left some comments, mainly around testing and minor stuff.

@@ -381,6 +383,17 @@ static Request clearScroll(ClearScrollRequest clearScrollRequest) throws IOExcep
return new Request("DELETE", "/_search/scroll", Collections.emptyMap(), entity);
}

static Request multiSearch(MultiSearchRequest multiSearchRequest) throws IOException {
Params params = Params.builder();
if (multiSearchRequest.maxConcurrentSearchRequests() != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would rather not duplicate default values in the client. What's the downside of always passing the parameter no matter what its value is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why I did this... I'll always pass down this parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember now why I needed to add this. 0 is used as not specified and the maxConcurrentSearchRequests(...) setter doesn't accepts values lower than 1. So checking whether any other value than 0 is specified is required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yea we can't avoid this then. Maybe share the default value through a constant so at least we don't duplicate it. Odd! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is odd especially because it seems that once you set a value for this field, you can never reset it to its original default value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I think this should be fixed and the setter should allow 0 as valid value. So that the msearch api falls back to default based on number of cores.

client().performRequest("PUT", "/index3/doc/5", Collections.emptyMap(), doc5);
StringEntity doc6 = new StringEntity("{\"field\":\"value2\"}", ContentType.APPLICATION_JSON);
client().performRequest("PUT", "/index3/doc/6", Collections.emptyMap(), doc6);
client().performRequest("POST", "/index1,index2,index3/_refresh");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to merge this class with the existing SearchIT?

SearchIT.assertSearchHeader(multiSearchResponse.getResponses()[2].getResponse());
assertThat(multiSearchResponse.getResponses()[2].getResponse().getHits().getTotalHits(), Matchers.equalTo(1L));
assertThat(multiSearchResponse.getResponses()[2].getResponse().getHits().getAt(0).getId(), Matchers.equalTo("6"));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we also have a test for some other common feature like aggregations / highlighting etc. performed via _msearch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also maybe have a test around failures, which can be tricky.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to test all possible search features here? I think this covered by the SearchIT?

I'll try to add tests for failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not all features, just a few of the most common ones I 'd say.

return new MultiSearchResponse(items, randomNonNegativeLong());
}

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some test to RequestTests which tests the request conversion from MultiSearchRequest to Request (enpoint, method, params etc.)?

}

public static BytesRef writeMultiLineFormat(MultiSearchRequest multiSearchRequest, XContent xContent) throws IOException {
BytesRefBuilder builder = new BytesRefBuilder();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the dependency on lucene necessary here? Could we find another way to do the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at another api where BytesRef was used, I'll try to covert to BytesReference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea I see that also bulk depends on BytesRef which is not great. If it's too much work we can do it as a follow-up.

try (XContentBuilder xContentBuilder = XContentBuilder.builder(xContent)) {
xContentBuilder.startObject();
if (request.indices() != null) {
xContentBuilder.field("indices", request.indices());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we use index rather than indices? That is what we have in our docs, indices is just a synonym but I don't think that other clients use it either.

}

static MultiSearchResponse.Item itemFromXContent(XContentParser parser) throws IOException {
// TODO: Is there a better way? The msearch item format is very tricky...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate on what is tricky? Is that because items can either be errors or proper instance of Item ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only key is error then we need we need to parse the xcontent is an error an otherwise parse as a search response. In order to read the the first field we need to parse it up until the field FIELD token is found to decide what to do, but then we are already to far to use SearchResponse#fromXContent(...), it requires that the first token is a start object token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha. shall we have an innerFromXContent that does not require start_object then, and accepts that we are already at the field_name token? That could be called from msearch? Exceptions should be parseable already with this technique.


@Override
public int hashCode() {
return Objects.hash(maxConcurrentSearchRequests, requests, indicesOptions);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add specific unit tests for equals / hashcode ?

}
}

private MultiSearchResponse createTestInstance() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be static?

MultiSearchResponse expected = createTestInstance();
XContentType xContentType = randomFrom(XContentType.values());
BytesReference shuffled = toShuffledXContent(expected, xContentType, ToXContent.EMPTY_PARAMS, false);
XContentParser parser = createParser(XContentFactory.xContent(xContentType), shuffled);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add a test that inserts random fields here so that we test forward compatibility? Have a look at SearchResponseTests#testFromXContentWithRandomFields for an example of that.

@martijnvg
Copy link
Member Author

@javanna Thanks for reviewing. I've updated the PR.

@martijnvg martijnvg force-pushed the hl_client_mseach_api branch from 7fb8c2e to 3672923 Compare November 8, 2017 13:14
@javanna javanna self-assigned this Nov 13, 2017
@martijnvg martijnvg added v6.2.0 and removed v6.1.0 labels Nov 22, 2017
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments and questions but LGTM

static Request multiSearch(MultiSearchRequest multiSearchRequest) throws IOException {
Params params = Params.builder();
params.putParam(RestSearchAction.TYPED_KEYS_PARAM, "true");
if (multiSearchRequest.maxConcurrentSearchRequests() != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind adding a constant in MultiSearchRequest for the 0 default value?

assert token == Token.FIELD_NAME;
boolean statusBeenParsed = false;
String fieldName = parser.currentName();
if ("status".equals(fieldName)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this parser relies on order of keys? can we change it so that it doesn't?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a test util method that we use to shuffle fields in the response so we make sure that our parsers don't rely on specific keys ordering.

public class MultiSearchResponseTests extends ESTestCase {

public void testFromXContent() throws IOException {
for (int runs = 0; runs < 20; runs++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think that we can drop this loop here and in other tests, and instead just rely on multiple runs of the same tests in our CI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstract parsing tests and query builder tests do iterate for several times and that is why I added this loop here, since we extend directly from ESTestCase. I prefer to keep it, if you are ok with it.

XContentType xContentType = randomFrom(XContentType.values());
BytesReference shuffled = toShuffledXContent(expected, xContentType, ToXContent.EMPTY_PARAMS, false);
if (randomBoolean()) {
shuffled = insertRandomFields(xContentType, shuffled, s -> s.contains("responses"), random());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it makes little sense to call insertRandomFields if random fields can only be inserted at the root level (the exclude filter identifies the only root element expected). But I am guessing you are doing this because exceptions cause issues if we insert random elements in their json, and being responses an array, it is not possible to identify exceptions only in an exclude filter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it doesn't make a lot of sense. Identifying whether something is a search response or a exception becomes not possible with the current parsing logic.

I'll remove the insertion of random fields.

item = new Item(SearchResponse.innerFromXContent(parser), null);
assert parser.currentToken() == Token.END_OBJECT; // SearchResponse.innerFromXContent(...) consumes the entire json object
}
return item;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the code, it seems like this parsing method relies on keys ordering, but when I look at its test it shuffles the keys which means I am not reading it right. I guess the point is that status is ignored, and it will either ignored when it appears before any other element, or while parsing exception or search hits? Maybe that's what your comment above explained but I didn't get it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not really relies on key ordering but instead it checks the field one after the other, ignoring status if found as the first field or ignoring it again right after error, or let it be parsed by the SearchResponse.innerFromXContent().

I agree that this parsing logic is not easy to do but I'm wondering if it could be more readable with the "usual" while loop that skip children on status field, and delegates to the appropriate parsing methods for error and search response?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @tlrx is saying is correct, it the parsing does not rely on ordering.

I think I can rewrite this code into a loop. However becauseElasticsearchException.failureFromXContent(...) does not parse until end object and SearchResponse#innerFromXContent(...) does, the loop will have to immediately break in order to not read other search responses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me Martijn thanks.

@tlrx tlrx self-requested a review December 4, 2017 08:33
Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall! I left some minor comments and I'd like to know if the parsing logic in MultiSearchResponse.itemFromXContent() could be improved a bit. If that's not feasible or has too much impact on other parsing method then it can go in like this.

searchRequest.scroll((Scroll) null);
// only expand_wildcards, ignore_unavailable and allow_no_indices can be specified from msearch api, so unset other options:
IndicesOptions randomlyGenerated = searchRequest.indicesOptions();
IndicesOptions msearchDefault = IndicesOptions.strictExpandOpenAndForbidClosed();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could be new MultiSearchRequest().indicesOptions() instead

public String toString() {
return "MultiSearchRequest{" +
"maxConcurrentSearchRequests=" + maxConcurrentSearchRequests +
", requests=" + requests +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit reluctant to have all the search requests rendered here. The SearchRequest.toString() method is verbose and spits out the whole content of the request, so the result here could be very large and I'm worried that it fills some logs or slows down the IDE when debugging. I also don't see much benefit to output everything, but maybe I'm wrong. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll remove the toString() method.

}

public static byte[] writeMultiLineFormat(MultiSearchRequest multiSearchRequest, XContent xContent) throws IOException {
ByteArrayOutputStream output = new ByteArrayOutputStream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use BytesStreamOutput

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is an added benefit for using BytesStreamOutput? Also getting the raw byte[] which is what gets used, requires extra steps (BytesReference -> BytesRef -> copy BytesRef's byte buffer)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also getting the raw byte[] which is what gets used, requires extra steps (BytesReference -> BytesRef -> copy BytesRef's byte buffer)

Good point

private static final ConstructingObjectParser<MultiSearchResponse, Void> PARSER = new ConstructingObjectParser<>("multi_search",
true, a -> new MultiSearchResponse(((List<Item>)a[0]).toArray(new Item[0]), (long) a[1]));
static {
PARSER.declareObjectArray(constructorArg(), (p, c) -> itemFromXContent(p),RESPONSES);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space before RESPONSES

item = new Item(SearchResponse.innerFromXContent(parser), null);
assert parser.currentToken() == Token.END_OBJECT; // SearchResponse.innerFromXContent(...) consumes the entire json object
}
return item;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not really relies on key ordering but instead it checks the field one after the other, ignoring status if found as the first field or ignoring it again right after error, or let it be parsed by the SearchResponse.innerFromXContent().

I agree that this parsing logic is not easy to do but I'm wondering if it could be more readable with the "usual" while loop that skip children on status field, and delegates to the appropriate parsing methods for error and search response?

@martijnvg martijnvg force-pushed the hl_client_mseach_api branch from 3672923 to 96e8e8a Compare December 4, 2017 15:54
@martijnvg
Copy link
Member Author

Thanks for looking @javanna @tlrx. I've update the PR and changed the multi search response item parsing to be a loop.

if ("status".equals(fieldName)) {
// Ignore the status value
} else {
throw new IllegalArgumentException("unexpected field name [" + fieldName + "]");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be more permissive and just ignore the field? We used to be permissive when parsing responses in order to have a better forward compatibility...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I've changed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be permissive....but we are not testing forward comp properly because of the structure of these objects. This is quite a bummer, we should probably look into testing this, otherwise permissive or not doesn't make a difference as we don't test it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna Don't we usually inject random fields to test this?

Copy link
Member

@javanna javanna Dec 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but we discussed as part of this review issues around testing this, because the response contains an array of objects which can be either exceptions or an ordinary search response. Exceptions still can hold metadata fields, hence injecting random fields in there would cause issues and needs to be skipped. It is complicated to distinguish between exceptions and search responses in the array, hence we don't currently insert random fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks @javanna, I've forgot this discussion. As long as it is permissive, even not tested, it's OK to be merged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll merge once the pr build is green.

@martijnvg martijnvg force-pushed the hl_client_mseach_api branch 2 times, most recently from a4d429f to b826c45 Compare December 5, 2017 07:05
@martijnvg martijnvg force-pushed the hl_client_mseach_api branch from b826c45 to 4d78e1a Compare December 5, 2017 09:18
@martijnvg martijnvg merged commit 4d78e1a into elastic:master Dec 5, 2017
@GoldenHans
Copy link

hello! Could u tell me when this method is released? i really need it! Thank you for your help!

@javanna
Copy link
Member

javanna commented Jan 29, 2018

@GoldenHans it will be released with 6.2. It should come out soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants