Skip to content

Map Builders should preserve order #99

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hendrikmuhs opened this issue Jan 10, 2022 · 9 comments
Closed

Map Builders should preserve order #99

hendrikmuhs opened this issue Jan 10, 2022 · 9 comments
Labels
Category: Enhancement New feature or request v7.16.2

Comments

@hendrikmuhs
Copy link

Object Builders for maps use an InternalMap which is a sub-class of a hash map:

private static final class InternalMap<K, V> extends HashMap<K, V> {

Hash maps are unordered, the order is defined by the internal hashing function and can be different for different runtime environments. Although the JSON standard does not define dictionaries to be ordered, it would be good to switch to a linked hash map with insertion order:

Because this does not seems performance-critical, I suggest to switch to a linked hash map with insertion order

@swallez
Copy link
Member

swallez commented Jan 10, 2022

Java indeed has LinkedHashMap and its overhead is pretty minimal. However, having insertion order preservation as an implicit contract is IMHO a problem:

  • the JSON spec states that "An object is an unordered collection of zero or more name/value pairs". So this is in fact a bug in that API definition: it should have used an array if order matters (as is done for sort criteria).
  • we have a number of client libraries where the language does not have an insertion order preserving map in its standard library: this is the case of Go, C# and Rust.

We can introduce a workaround in the Java client for this specific API (by adding an annotation on the API spec that will trigger the workaround in the code generator), but we should not assume that all client libraries will be able to implement it.

@swallez swallez added v7.16.2 Category: Enhancement New feature or request labels Jan 10, 2022
@swallez
Copy link
Member

swallez commented Jan 13, 2022

@hendrikmuhs after discussing this with the larger Clients Team and careful evaluation, we've decided to not implement the change you suggest.

The problem, as I mentioned previously, is that some languages have no builtin support for order-preserving maps, while others have (even as a default, like Python or Ruby). Yet all languages are compliant with the JSON spec, which is the semantic model on which the Elasticsearch API is built.

So implementing this in the Java client, even if simple and innocuous from a technical perspective, will not solve the issue for the entire range or client libraries, maintained either by Elastic or by 3rd parties.

If this lack of ordering is a blocking issue, then this API should be updated to a format that can capture order. The example of sort criteria comes to mind, which are defined as an array of objects containing a single key.

Feel free to reach out to the Clients Team if you want to discuss this further.

@swallez swallez closed this as completed Jan 13, 2022
@hendrikmuhs
Copy link
Author

  • the JSON spec states that "An object is an unordered collection of zero or more name/value pairs". So this is in fact a bug in that API definition: it should have used an array if order matters (as is done for sort criteria).

That's correct w.r.t. the JSON spec, I already mentioned that in the beginning. This is fine for clients that take JSON or a dictionary, but not for the java client:

A java client user uses a builder, he does not know and does not care whether JSON is used behind the scenes. This is an implementation detail.

I agree that other client languages require their own assessment, I can not speak for all, but e.g. the python client has a totally different interface. There I could use an ordered dict if I want to preserve order.

So implementing this in the Java client, even if simple and innocuous from a technical perspective, will not solve the issue for the entire range or client libraries, maintained either by Elastic or by 3rd parties.

Yes, I agree. I opened this issue for the java client, but not for other languages.

If this lack of ordering is a blocking issue, then this API should be updated to a format that can capture order.

I think that's the long term way to solve this, the transform API should be changed to optionally take an array of objects:

[{"key1": {...} }, {"key2": {...} }, {"key3": {...} }]

instead of:

"key1": {...}, "key2": {...}, "key3": {...}

I will discuss this with the backend team.

Note that this does not change the way a user interacts with the java client. He would still call the builder, the only difference is the way the client calls the backend afterwards.

Feel free to reach out to the Clients Team if you want to discuss this further.

I will, after some consultation with the backend team.

@technige
Copy link
Contributor

Hi @hendrikmuhs

Respectfully, I beg to differ on JSON being an implementation detail. From the sole point of view of the Java client user, this may be true, but the full public product surface offered by Elastic includes not only the API offered by the clients, but also the HTTP/JSON interface itself, which can be used directly.

It is everyone's general interest to ensure that the user experience is consistent across all parts of the product surface and divergent semantics (X can be considered ordered if used via method 1, but not via method 2) increases both maintenance costs and usage complexity.

So ultimately, we cannot look at at the Java client in isolation; it has to be designed and built in line with all the other components within our product suite.

In terms of your specific case, my recommendation would be to discuss how to persist the ordering information across all layers of the system for the API you have in mind. That way, we may be able to enhance the semantics there in a way that works for all parts of the product surface equally.

@hendrikmuhs
Copy link
Author

hendrikmuhs commented Jan 13, 2022

@technige

The problem appears only if the java client is used (maybe for other clients, too, that has to be investigated), the API works fine using the HTTP API. It's the client that breaks the contract as it shuffles what the user puts in.

@technige
Copy link
Contributor

What contract is broken? As already mentioned above, JSON objects are defined as unordered, so any JSON parser is free to yield the values it reads in whatever order is convenient for the implementation. The only way to avoid that and retain the order in which the server happened to write them would be to write your own custom JSON parser which guarantees to retain that information.

And similarly, while it may normally do so in practice, the server itself does not guarantee that it will output the information in the payload in the same order as the user provided it. Again, this is a result of JSON having been chosen as the output format, rather than a vendor-specific variant of JSON (e.g. application/vnd.elastic.json-with-ordered-maps) .

But no contracts that exist today have been broken. It's simply that those contracts don't go far enough to cover your requirement. And so I return to my previous point: that we should instead discuss how to expose the additional semantics for the specific API.

@swallez
Copy link
Member

swallez commented Jan 13, 2022

@hendrikmuhs the exact same issue will happen with our Go, C# and Rust clients, and any custom code people will write in these languages.

The Python dictionary, Ruby hash and PHP associative array happen to keep insertion order, but this is just a happy coincidence.

@hendrikmuhs
Copy link
Author

Let's take this offline, it looks like gh is not the right place to discuss this, we seem to discuss different issues. I am very aware that JSON dictionaries do not define an order, I've never claimed so.

We will discuss in backend and get in touch afterwards.

@hendrikmuhs
Copy link
Author

We have discussed the underlying issue in the backend team:

The transform API parses/stores/serializes the configuration in question in a way that preserves order, this works if the REST API is used directly. This is out of the specification and inconsistent to other parts of the stack. The reason for this feature is performance. Starting with 7.15 transform has some builtin optimizations for improving performance by optimizing the group by query. That's one of the reasons why we've decided to keep the API the way it is and not change/enhance the syntax (make it order-aware or add sort criteria). We will rather spend more time on improving the optimizations. As a result we will adjust the documentation accordingly. If you are using REST it will still preserve order, but that's considered an implementation detail.

That's for the technical aspect of it. From a usability perspective I still think that consistent order helps UX. But that's not my decision.

Thanks for your time, I will close the corresponding HLRC issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Enhancement New feature or request v7.16.2
Projects
None yet
Development

No branches or pull requests

3 participants