Skip to content

Extend polymorphic compare to support objects. #2657

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

cristianoc
Copy link
Collaborator

Extend polymorphic compare to support objects created as e.g. [%bs.obj {x=1; y=2}].

The comparison of {x:1, y:2} and {x:3, y:4} is defined logically as if the objects were lists of pairs: the first keys are compared, then the first values, then the second keys, then the second values, etc.
In particular, the order of keys matters, and {x:1, y:2} is different from {y:2, x:1}.

TODO: equality and hash.

Objects, when viewed as blocks, have size 0 and no tag.
Here an object o is recognized by checking that o.constructor === Object.
Then, the keys and values can be extracted.

The compare function keeps the same structure, with one extra while loop, and does not change unless objects are encountered. Presumably Object.keys(o) performs some allocation.

Extend polymorphic compare to support objects created as e.g. `[%bs.obj {x=1; y=2}]`.

TODO: equality and hash.

Objects, when viewed as blocks, have size 0 and no tag.
Here an object `o` is recognized by checking that `o.constructor === Objects`.
Then, the keys and values can be extracted.

The compare function keeps the same structure, with one extra while loop, and does not change unless objects are encountered. Presumably `Object.keys(o)` performs some allocation.
Copy link
Member

@chenglou chenglou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=D

@@ -198,6 +208,17 @@ let rec caml_compare (a : Obj.t) (b : Obj.t) : int =
else
let len_a = Bs_obj.length a in
let len_b = Bs_obj.length b in
if len_a = 0 && len_b = 0 && O.is_object a && O.is_object b then
begin
let keys_a = O.keys a in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate that this allocates though. Should we loop instead?

Copy link
Member

@bobzhang bobzhang Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we did not support for .. in currently, maybe it's fine to write such function in raw js?

Here is something I imagined:

let unsafe_js_compare : Obj.t -> Obj.t -> int = [%raw{| function (first,second){
}
|}]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing about looping is that we have to traverse both objects, one item at a time, as that's the order in which comparison proceeds. I could not find a way to do that.

One could choose a different order for the comparison, e.g. as in sets sorted by keys, but again I would not know how to do that without allocating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is object keys order important? This passes the type checker:
https://reasonml.github.io/en/try.html?reason=C4TwDgpgBAZg9nKBeKBvAdFARPOWBcUAlgHbAA02ARgIYBOBUAzsHaQOYC+A3AFC8AbCMCgBbEADEEhXMjRZaDQlgAWEEFko4EjAIw9+QkTRJxgaulLgyEc1NryFdWxY1XqsPIA
So you'd accidentally consider these two objs as non-equal

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible to ignore the order of keys, by considering them as a sorted set. Trying that next.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. So if the order of keys doesn’t matter, you can do two for-loops; check object1 against object2, then object2 against object1. This avoids the allocations

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed more offline. Seems like there's no way to avoid allocation: some ordering on keys needs to be defined, and elements compared in that order.
Two possible orders are: 1 the order in which keys appear in the objects, 2 alphabetical order.

If going for 1, one needs to compare the first key in a with the first key in b, and so on. This can't be done with a for..in.
(With a for..in one can compare the first key in a with the corresponding key in b, but this is not a well-defined order, as e.g. it's not symmetric.)

If going for 2, one needs to compare the keys in alphabetical order, and this can't be done with a for..in.

Currently choosing to go for 2, which has the property that {x:1, y:2} and {y:2, x:1} are considered equal, so insertion order does not get in the way.

Copy link
Member

@bobzhang bobzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes perfect sense to support objects, some extensive tests needed though.

Equality should be synced up, hash is not that relevant in my opinion

To make the comparison order-independent, sort the keys before comparing them.
cristianoc pushed a commit to cristianoc/bucklescript that referenced this pull request Mar 23, 2018
@cristianoc
Copy link
Collaborator Author

@bobzhang there are several tests that exercise a bunch of corner cases.

@@ -223,6 +248,16 @@ and aux_length_b_short (a : Obj.t) (b : Obj.t) i short_length =
let res = caml_compare (Obj.field a i) (Obj.field b i) in
if res <> 0 then res
else aux_length_b_short a b (i+1) short_length
and aux_obj_compare (a: Obj.t) keys_a (b: Obj.t) keys_b i min_len default_res =
if i = min_len then default_res
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the length is not the same, should we exit early here?
Note it does not to have to follow the semantics of caml_compare for block which is constrained by existing ocaml specification.

Oh, one thing which reminds of me that caml_compare semantics will be changed if we change record from array to dict is that the iteration order needs to be the same as declaration order

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a way to avoid Object.keys entirely, and only rely on for..in, the code structure will change.

let default_res = len_a - len_b in
aux_obj_compare a keys_a b keys_b 0 min_len default_res
end
else
if len_a = len_b then
aux_same_length a b 0 len_a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we put this branch earlier, this branch seems to be hit more often than the previous one? Is it okay to assume Array.isArray API exists? (cc @chenglou )

if len_a = len_b && is_array a 

In that case, we don't need o.constructor == Object

let () = O.sort(keys_b) in
aux_obj_equal a keys_a b keys_b 0 len_a
end
else
if len_a = len_b then
aux_equal_length a b 0 len_a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@cristianoc
Copy link
Collaborator Author

Found a way to avoid calling Object.keys entirely and in principle avoid allocation.
A key is a witness that a is smaller than b if it's only in b or it's in both and the corresponding value in a is smaller.
It's possible to traverse a and b separately to find the smallest (in alphabetical order) key witnessing that a is smaller, and that b is smaller, then the smaller of the two keys determines the result.

Found a way to avoid calling `Object.keys entirely` and in principle avoid allocation.
A key is a witness that `a` is smaller than `b` if it's only in `b` or it's in both and the corresponding value in `a` is smaller.
It's possible to traverse `a` and `b` separately to find the smallest (in alphabetical order) key witnessing that `a` is smaller, and that `b` is smaller, then the smaller of the two keys determines the result.
var a$1 = a;
var b$1 = b;
var result = [/* true */1];
var do_key_a = (function(b$1,result){
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering whether it's possible to avoid creating a function here -- perhaps by writing the entire aux_obj_compare in js directly.

b$1,
min_key_rhs
];
var do_key_a = (function(partial_arg){
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering whether it's possible to avoid creating a function here -- perhaps by writing the entire aux_obj_compare in js directly.

@@ -326,4 +424,4 @@ exports.caml_lessthan = caml_lessthan;
exports.caml_lessequal = caml_lessequal;
exports.caml_min = caml_min;
exports.caml_max = caml_max;
/* No side effect */
/* for_in Not a pure module */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have some cascading effect

@bobzhang
Copy link
Member

I would like to have this functionality in first and iterate later.

  1. Is the perf of polymorphic comparison a big concern? maybe not, for people who want high perf, they should probably write a type based comparison (with our explicit warning, it is quite easy to avoid polymorphic comparison)

  2. Is possible to use Array.isArray instead of using O.constructor == Object, the former is like a more standard way

  3. Rewrite this function in JS? I tend to agree that the whole caml_compare function should be written in JS, but maybe in next revision?

@cristianoc
Copy link
Collaborator Author

@bobzhang sure, using Array.isArray now. If there are cases that are neither arrays nor object, it will behave the same as now (compare returns 0).

@bobzhang
Copy link
Member

The algorithm looks correct to me.
But I have a feeling it is not necessarily faster than the original one, (using Obj.keys) also performance is not a big deal in such cases.

The api Array.isArray should be good to use ? cc @rickyvetter @chenglou

Feel free to merge it when you think it is ready

@cristianoc cristianoc merged commit f0ceddc into rescript-lang:master Mar 28, 2018
@@ -223,6 +237,27 @@ and aux_length_b_short (a : Obj.t) (b : Obj.t) i short_length =
let res = caml_compare (Obj.field a i) (Obj.field b i) in
if res <> 0 then res
else aux_length_b_short a b (i+1) short_length
and aux_obj_compare (a: Obj.t) (b: Obj.t) =
let min_key_lhs = ref None in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can probably avoid the allocation here if we used Js.Nullable.t?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that would work, though this runtime library does not have access to Js.Nullable.
I have a straight js version in da928db which does that and simplifies a bunch of other things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobzhang what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants