IntSet: reverse bitmap for faster comparison? #674
Comments
I remember those options being discussed in the original paper; you may want to take a look and see to what extent the times have changed. Another thought, for |
"options being discussed in the original paper" - you mean Okasaki and Gill 1998 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.5452 ? This is for Maps, not Sets, so they don't have bitmaps in the leaves.

They discuss little endian vs. big endian Patricia trees - and make this remark regarding asymmetric operations: "there does not appear to be a clever bit-twiddling solution to calculate the highest one bit in a number, as there was for the lowest one bit". This point is moot now since containers uses clz/ctz primops?

I am not seeing any reference to bitwise tricks in Morrison 1968 http://www.mathcs.emory.edu/~cheung/papers/XML/PatriciaTrie-JACM1968.pdf (if you meant that with "original paper").

Before we go any further with this idea (of reversing bitmaps for faster |
this gives results like
I conclude that a more efficient implementation of `instance Ord IntSet` can at best save another 20 percent runtime w.r.t. the current proposal (for the case that the sets are really Tips).

I am not really certain about these data: I sprinkled the code with Inline and Specialize pragmas. Some of them do change the runtimes. I have no precise idea why.

As I currently do not have an actual use case (I deal with automata sometimes, but I don't need NFA->DFA right now), I will not push this any further.
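On the clz/ctz remark above, here is a minimal sketch of computing the lowest and the highest one bit of a machine word with `Data.Bits`. The function names `lowestOne` and `highestOne` are made up for this illustration and are not taken from containers. `lowestOne` uses the two's-complement trick, while `highestOne` goes through `countLeadingZeros`; whether either ends up as a single instruction depends on the GHC version and the target, which this sketch does not check.

```haskell
import Data.Bits (bit, countLeadingZeros, finiteBitSize, (.&.))
import Data.Word (Word64)

-- | Mask containing only the lowest set bit of w (0 if w is 0).
-- Pure bit-twiddling: negation flips every bit above the lowest one.
lowestOne :: Word64 -> Word64
lowestOne w = w .&. negate w

-- | Mask containing only the highest set bit of w (0 if w is 0).
-- Relies on a count-leading-zeros operation instead of arithmetic.
highestOne :: Word64 -> Word64
highestOne 0 = 0
highestOne w = bit (finiteBitSize w - 1 - countLeadingZeros w)
```

For example, with `40 = 32 + 8`, `lowestOne 40` is `8` and `highestOne 40` is `32`.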
Do you have any thoughts on my idea of compressing the prefix and mask into one word for IntSet?
…On Fri, Aug 2, 2019, 2:42 PM jwaldmann ***@***.***> wrote:
type IntSet = Word
see
https://github.com/jwaldmann/containers/blob/intset%3Dword/containers-tests/benchmarks/IntSet.hs#L53
this gives results like
benchmarked ***@***.***
time 336.4 ms (333.0 ms .. 341.3 ms)
benchmarked ***@***.***
time 274.1 ms (272.8 ms .. 274.9 ms)
I conclude that a more efficient implementation of instance Ord IntSet
can at best save another 20 percent runtime w.r.t. the current proposal
(for the case that the sets are really Tips).
I am not really certain about these data: I sprinkled the code with Inline
and Specialize pragmas. Some of them do change the runtimes. I have no
precise idea why.
As I currently do not have an actual use case (I deal with automata
sometimes, but I don't need NFA->DFA right now), I will not push this any
further.
The other asymmetric operation I'm aware of is iteration: it is faster to iterate low to high bit than vice versa. This translates to Since people tend to prefer
Thanks for proposing this anyway, it is an interesting idea to consider. Please reopen if you have more to discuss on this. |
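To illustrate the iteration asymmetry mentioned in the comment above, here is a minimal sketch of listing the set bits of a word in both directions. `ascBits` and `descBits` are hypothetical names, not containers functions, and a 64-bit word is assumed. The ascending version can drop the lowest bit with `w .&. (w - 1)`; the descending version has to compute the index of the highest bit first and then clear it. The sketch makes no claim about the actual cost difference.

```haskell
import Data.Bits (clearBit, countLeadingZeros, countTrailingZeros, finiteBitSize, (.&.))
import Data.Word (Word64)

-- | Indices of the set bits, lowest first.
-- w .&. (w - 1) clears the lowest set bit without needing its index.
ascBits :: Word64 -> [Int]
ascBits 0 = []
ascBits w = countTrailingZeros w : ascBits (w .&. (w - 1))

-- | Indices of the set bits, highest first.
-- The index of the highest bit is needed before it can be cleared.
descBits :: Word64 -> [Int]
descBits 0 = []
descBits w =
  let i = finiteBitSize w - 1 - countLeadingZeros w
  in i : descBits (clearBit w i)
```

For `37 = 32 + 4 + 1`, `ascBits 37` gives `[0,2,5]` and `descBits 37` gives `[5,2,0]`.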
This is just an idea to improve `instance Ord IntSet` (related to #470). It's quite a pervasive change, and it'd help only in a special case - that may occur frequently, though.

When all elements of the IntSet are small, the tree is in fact `Tip prefix bitmap`. For just that special case, `instance Eq IntSet` is just two comparisons of machine words, but `instance Ord IntSet` (in suggested #670) needs more ops (more than 10, see `relateTipTip`, `relateBM`).

It would be much easier if `compare (Tip p bm1) (Tip p bm2) = compare bm1 bm2` but since the comparison must have `fromAscList` semantics, we need `= compare (reverse bm1) (reverse bm2)` (the implementation does not actually use `revNat`).

Instead of doing the reversal here, we could define `bitmapOf x` not as `2^x` but `2^(wordSize - 1 - x)`.

In the general case (comparing `Tip`s that sit below `Bin`) we need the `Relation` result (that encodes 5 possible results) so there's no hope of doing this in one op.

I think that the underlying reason for all this is that some ops on machine words are uniform (direction does not matter, as in `.&.`), some are symmetric (two directions, but identical cost, e.g., shift-left, shift-right), but some are asymmetric (one direction, the other one is missing: lexicographic comparison, carry propagation in arithmetical operations).

Now everything regarding bitmaps (not prefixes!) in IntSet is uniform or symmetric - except for this comparison?