Offer Int32Map and Int64Map #355

treeowl · 2016-10-24T17:55:56Z

Someone suggested this weekend that we should really offer both 32-bit and 64-bit maps regardless of architecture, and I thoroughly agree. The same applies to IntSet, of course.

AndreasPK · 2022-09-28T01:20:34Z

I don't think there is much benefit in 32 bit maps.

On 32bit they behave exactly the same as IntMap.
On 64bit they might have a tiny perf benefit through smaller instruction encodings. But that is unlikely to be worth the cost of having one more map type unless one uses backpack or similar to implement them. But even then it's possible the added code size makes up for any benefit in this area.

For Int64 it looks different though, since we can't use a IntMap on 32bit platforms.

If that approach is deemed acceptable I might add a Int64Map (by taking the IntMap implementation and changing the types). It looks like we will need one for GHC in the near future and it would be good to be able to use containers for this.

PS: If there is a strong demand for a Int32Map and it's shown to be worth the cost it might still be good. I just personally don't think it will have enough of a benefit to look into that myself.

noughtmare · 2023-06-28T16:12:43Z

I've converted IntSet and IntMap to Word64Set and Word64Map as part of this GHC MR:

https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10568

It would require some polishing before upstreaming it. I might have time for it soon, but I'd gladly hand it over if there are other volunteers.

treeowl · 2023-06-28T16:29:28Z

I would definitely want these, but how can we do it without increasing the maintenance burden too much?

noughtmare · 2023-06-28T18:24:55Z

That depends on what you consider to be too much. The simplest way to upstream these is to just copy the files, but I get the impression you consider adding ~7000 lines of code as adding too much burden, which is a fair point.

But I don't see any easy solutions. The code is very similar in many places, but the types are pretty much all different.

I could imagine making a poor man's backpack by using something like #define SET IntSet or #define SET Word64Set to generate two modules from the same file. But I don't know if I would consider that much easier to maintain.

treeowl · 2023-06-28T19:11:08Z

Just throwing an idea around: once our GHC lower bound rises high enough, we'll be able to have multiple "libraries" in the package. Will this give us enough power to have the necessary type definitions in a module that has different contents for the different libraries?

treeowl · 2023-06-28T19:13:12Z

To be clear, my objection isn't the amount of code, per se, but the duplication. Having to make every change in triplicate is pretty darn annoying. The set/map split is bad enough.

noughtmare · 2023-06-28T22:10:26Z

@Bodigrim suggested we use CPP following other boot libraries like filepath. That seems like a promising approach. Concretely filepath defines System.OsPath like this:

{-# LANGUAGE CPP #-}

#define FILEPATH_NAME OsPath
#define OSSTRING_NAME OsString
#define WORD_NAME OsChar

...

#include "OsPath/Common.hs"

And then in OsPath/Common.hs there is code like this:

...

splitSearchPath :: OSSTRING_NAME -> [FILEPATH_NAME]
splitSearchPath (OSSTRING_NAME x) = fmap OSSTRING_NAME . C.splitSearchPath $ x

...

treeowl · 2023-06-28T22:11:59Z

How does that affect source links in the Haddocks? I would expect it to destroy them, which is ... not great.

noughtmare · 2023-06-28T22:13:56Z

Yes, that seems like a disadvantage, e.g. this link doesn't point to anything useful: https://hackage.haskell.org/package/filepath-1.4.100.3/docs/src/System.OsPath.Posix.html#extSeparator

treeowl · 2023-06-28T22:56:09Z

Ugh. Using the (long-standing) private library support is enough for what I was talking about (no need for the new multiple public libraries), but there's a big problem: the names of the modules, and more importantly the names of the types won't work out right without some CPP. I don't know just how bad that is in practice; probably better than what filepath does.

meooow25 · 2024-08-17T03:22:33Z

I've converted IntSet and IntMap to Word64Set and Word64Map as part of this GHC MR:

https://gitlab.haskell.org/ghc/ghc/-/merge_requests/10568

Since it is unclear if this will make it into containers, are there plans to keep GHC's copy in sync with containers? Otherwise GHC is cut off from optimizations and bug fixes made here.

IMO the easiest way to get Word64Map on 32-bit is to nest IntMaps.

newtype Word64Map a = Word64Map (IntMap (IntMap a))

The top map is indexed by the high 32 bits of the Word64 and any inner map is indexed by the low 32 bits. Map operations should not be too hard to write.

high, low :: Word64 -> Int

lookup :: Word64 -> Word64Map a -> Maybe a
lookup k (Word64Map m) = IM.lookup (high k) m >>= IM.lookup (low k)

union :: Word64Map a -> Word64Map a -> Word64Map a
union (Word64Map m1) (Word64Map m2) = Word64Map (IM.unionWith IM.union m1 m2)

...

Admittedly this would be slightly slower compared to a directly implemented version, but I don't expect it to be too bad.

Not sure if this was considered for GHC, but it is simple and allows benefiting from development in containers.

(cc @noughtmare)

noughtmare · 2024-08-17T08:54:20Z

Whatever we decide to do, we should do it in containers so that everyone can benefit. I think we should only modify GHC's Word64Map if we discover critical issues. I had not considered the nested IntMap solution.

So, I think there are four options:

Do nothing
Copy Word64Map from GHC to containers (duplicates code, higher maintenance burden)
Use internal libraries and/or CPP (this requires newer cabal version and/or obfuscates source links in Haddock)
Use nested IntMaps (unknown performance hit)

I don't have time to champion any of these solutions in the near future, but I'm willing to help anyone who does.

treeowl · 2024-08-17T17:06:28Z

Nested IntMaps are twice as deep. Not a bad hack by any means, but probably substantially slower.

meooow25 · 2024-08-18T01:32:29Z

I think we should only modify GHC's Word64Map if we discover critical issues.

I see, that's unfortunate.

Whatever we decide to do, we should do it in containers so that everyone can benefit.

Are there parties other than GHC who would like to see this?

Nested IntMaps are twice as deep. Not a bad hack by any means, but probably substantially slower.

It's not twice as deep, it's deeper by exactly one. In other words, a path from root to leaf passes through only one extra step, the Tip of the root map which links to an inner map. That's why I expect it to be only "slightly slower".

treeowl · 2024-08-18T03:13:07Z

It's not twice as deep, it's deeper by exactly one. In other words, a path from root to leaf passes through only one extra step, the Tip of the root map which links to an inner map. That's why I expect it to be only "slightly slower".

I think you're right; I didn't think that through enough. But can it be two deeper sometimes, depending on where the branches are?

meooow25 · 2024-08-18T08:39:22Z

But can it be two deeper sometimes, depending on where the branches are?

I'm not seeing it, do you have a concrete example perhaps?

That's why I expect it to be only "slightly slower".

I'm beginning to realize that it may even be faster, since we are on a 32-bit system.

All comparisons and bitwise ops will be 32-bit, not 64-bit, except for the part where we split with 64-bit word.
The data stored at the Bins and Tips will be 32-bit, not 64-bit. Taking into account Optimize IntMap.Bin #995, GHC's Word64Map is currently storing 4 words of data (prefix, mask) in Bins and 2 words in Tips instead of just 1 in both, which is 4 excess words per element. Now there is a cost of the root map Tips. If the data is dense, which is likely the case for GHC since it was using an IntMap until recently, this is only 3 words per 2^32 elements.

It looks like a promising idea to test out on GHC with a 32-bit system, should someone volunteer.

noughtmare · 2024-08-18T08:52:15Z

I see, that's unfortunate.

Why? Ah, I see you've recently made some contributions that could potentially also benefit GHC. Maybe it would be worthwhile to port over some of them. Perhaps it would be useful to collect a number of those changes and port them over at the same time.

I should mention that this is my personal opinion and does not reflect the opinion of the GHC team. And I'm no longer a very active contributor to (that part of) GHC. Perhaps @mpickering could chime in.

Are there parties other than GHC who would like to see this?

This issue wasn't opened by GHC, but perhaps GHC is the first that actually ran into issues with IntMap on 32-bit platforms.

It looks like a promising idea to test out on GHC with a 32-bit system, should someone volunteer.

For people who do want to try this, you don't need a 32-bit system. You can instead use ghc-docker-jobs.sh with the i386-linux-deb12-validate job.

treeowl assigned wrengr May 25, 2017

m-renaud added the feature-request label Dec 26, 2017

sjakobi added IntMap IntSet labels Jul 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Offer Int32Map and Int64Map #355

Offer Int32Map and Int64Map #355

treeowl commented Oct 24, 2016 •

edited

Loading

AndreasPK commented Sep 28, 2022 •

edited

Loading

noughtmare commented Jun 28, 2023 •

edited

Loading

treeowl commented Jun 28, 2023

noughtmare commented Jun 28, 2023 •

edited

Loading

treeowl commented Jun 28, 2023

treeowl commented Jun 28, 2023

noughtmare commented Jun 28, 2023 •

edited

Loading

treeowl commented Jun 28, 2023

noughtmare commented Jun 28, 2023 •

edited

Loading

treeowl commented Jun 28, 2023

meooow25 commented Aug 17, 2024

noughtmare commented Aug 17, 2024

treeowl commented Aug 17, 2024

meooow25 commented Aug 18, 2024 •

edited

Loading

treeowl commented Aug 18, 2024

meooow25 commented Aug 18, 2024 •

edited

Loading

noughtmare commented Aug 18, 2024

Offer Int32Map and Int64Map #355

Offer Int32Map and Int64Map #355

Comments

treeowl commented Oct 24, 2016 • edited Loading

AndreasPK commented Sep 28, 2022 • edited Loading

noughtmare commented Jun 28, 2023 • edited Loading

treeowl commented Jun 28, 2023

noughtmare commented Jun 28, 2023 • edited Loading

treeowl commented Jun 28, 2023

treeowl commented Jun 28, 2023

noughtmare commented Jun 28, 2023 • edited Loading

treeowl commented Jun 28, 2023

noughtmare commented Jun 28, 2023 • edited Loading

treeowl commented Jun 28, 2023

meooow25 commented Aug 17, 2024

noughtmare commented Aug 17, 2024

treeowl commented Aug 17, 2024

meooow25 commented Aug 18, 2024 • edited Loading

treeowl commented Aug 18, 2024

meooow25 commented Aug 18, 2024 • edited Loading

noughtmare commented Aug 18, 2024

treeowl commented Oct 24, 2016 •

edited

Loading

AndreasPK commented Sep 28, 2022 •

edited

Loading

noughtmare commented Jun 28, 2023 •

edited

Loading

noughtmare commented Jun 28, 2023 •

edited

Loading

noughtmare commented Jun 28, 2023 •

edited

Loading

noughtmare commented Jun 28, 2023 •

edited

Loading

meooow25 commented Aug 18, 2024 •

edited

Loading

meooow25 commented Aug 18, 2024 •

edited

Loading