Reducing the memory footprint of compiling lib:Cabal #8074

Open
Kleidukos opened this issue Mar 31, 2022 · 25 comments

@Kleidukos
Member

Kleidukos commented Mar 31, 2022

I have been doing some light investigation into which modules in lib:Cabal are the most expensive to compile; here are the results, based on time-ghc-modules:

[screenshot: time-ghc-modules breakdown of compile time per module]

I don't really know how to interpret this beyond "Distribution.Simple.Setup should be split in a way that enables multi-core compilation of the package". From a quick glance at this module, there aren't any type-level techniques that would be obviously expensive to compile.

Of course the package has a fair number of modules, but I'm positive that expensive compilation is not inevitable.

What do you think?

@Mikolaj
Member

Mikolaj commented Mar 31, 2022

Quite amazing. BTW, do we even enable multi-core compilation of the package in CI? I can't find this GHC option in our scripts.

@Kleidukos
Member Author

We might benefit from that in a limited capacity; the GH Actions runners have 2 cores and 7 GB of RAM.

@Mikolaj
Member

Mikolaj commented Mar 31, 2022

Right, if we only have 2 cores, it's probably better to devote them to compiling many packages in parallel, not many modules in parallel.

Anyway, there is a chance the split file would compile faster regardless of multicore, because some compiler passes are not linear.

@michaelpj
Collaborator

Distribution.SPDX.LicenseId has been investigated significantly already by the GHC devs.

Distribution.Simple.Setup also has lots of large-ish records with Generic instances, which seem to be a common culprit for slow compilation.
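For illustration, the shape in question looks roughly like this (a minimal sketch; the type and field names are made up, and the real records in Distribution.Simple.Setup have far more fields):

```haskell
{-# LANGUAGE DeriveGeneric #-}
module SetupFlagsSketch where

import GHC.Generics (Generic)

-- Hypothetical stand-in for one of the large flags records.
data SetupFlagsSketch = SetupFlagsSketch
  { sfVerbosity :: Maybe Int
  , sfPrefix    :: Maybe FilePath
  , sfLibDir    :: Maybe FilePath
  , sfDynExe    :: Maybe Bool
  , sfProfLib   :: Maybe Bool
  -- ...the real records have dozens of fields...
  } deriving (Show, Generic)

-- Instances that go through the Generic representation (e.g. a Binary
-- instance using the generic default) make GHC elaborate and optimise a
-- large representation type for the whole record, which is where the
-- compile-time cost tends to come from.
```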

@tchoutri
Collaborator

tchoutri commented Apr 1, 2022

It could be interesting to switch to TH-based deriving for the classes that allow it.
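For illustration, the pattern looks roughly like this, using aeson's TH deriver as a stand-in (lib:Cabal doesn't depend on aeson, and the classes it actually derives, such as Binary, would need their own TH derivers; the type below is made up):

```haskell
{-# LANGUAGE TemplateHaskell #-}
module THDerivingSketch where

import Data.Aeson.TH (defaultOptions, deriveJSON)

-- Hypothetical flags record.
data BuildFlagsSketch = BuildFlagsSketch
  { bfVerbosity :: Maybe Int
  , bfNumJobs   :: Maybe Int
  }

-- The TH splice generates the instance code directly at compile time, so
-- GHC never has to build and then optimise away a Generic representation
-- of the record.
$(deriveJSON defaultOptions ''BuildFlagsSketch)
```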

@Kleidukos
Member Author

https://hackage.haskell.org/package/large-records-0.2.1.0 was released

@gbaz
Collaborator

gbaz commented Apr 6, 2022

Asking cabal to depend on that looks to me to be a nonstarter.

That said, I think we should revisit #5893 and axe the Generic instance, and arguably axe any big generic instances in Distribution.Simple.Setup as well, if we can do so safely.

@Kleidukos
Member Author

Got it!

@andreabedini
Collaborator

(I'm commenting here rather than re-opening the old discussion).

Reading #5893 leads to this:

I'm strongly against this, as this would mean that we'd end up with orphan instances somewhere, cause Generic instances are useful especially for non-trivial types such as LicenseId. Compile times are no reason to cripple an API, and 30s more are not much in the overall build time of Cabal tbh. If anything, we should try to optimise GHC than to cause regressions in APIs.

To be fair, I think this can indeed be a problem. Removing the Generic instance from Distribution.Simple.Setup could be a significant breaking change. Perhaps we can assess the impact on Hackage? And/or have a deprecation plan.

/jk Personally I am happy to move fast and break things but then people get upset :D

@Kleidukos that's an awesome plot btw

@Kleidukos
Member Author

@andreabedini Regarding usage on Hackage, here is a cursory search, feel free to improve the query string: https://hackage-search.serokell.io/?q=%5CWLicenseId

Regarding resource consumption, optimising Generics will need dedicated time, and that can't be done by a single person with two spare hours in an evening. It has to be a deliberate decision to improve the Haskell ecosystem while avoiding breakage, because "just move to another solution" simply isn't acceptable.

@andreabedini
Collaborator

@Kleidukos good, those search results look encouraging. Perhaps it's a breaking change we can afford.

Regarding resource consumption, optimising Generics will need dedicated time, and that can't be done by a single person with two spare hours in an evening.

Happy to support that with everything I have.

@Kleidukos
Member Author

Kleidukos commented Apr 7, 2022

@andreabedini After discussing with @bgamari, this ticket is the most relevant in our case: https://gitlab.haskell.org/ghc/ghc/-/issues/5642. Jay and Peyton Jones tackled this problem in the paper "Scrap your type applications", but the work was never merged. Unfortunately the solution in the paper is quite complicated.

@Kleidukos
Member Author

On the GHC tracker: https://gitlab.haskell.org/ghc/ghc/-/issues/16577

@gbaz
Collaborator

gbaz commented Apr 21, 2022

GHC HQ has expressed a preference for not deriving Generic, because the build times have a cost on CI infrastructure.

@TeofilC
Collaborator

TeofilC commented Apr 24, 2022

I did a little audit to see if anything on Hackage depends on the Generic instance of Distribution.SPDX.LicenseId.LicenseId.
LicenseId is exported by Distribution.SPDX and Distribution.SPDX.LicenseId. I used the following search to find packages that import one of those modules: https://hackage-search.serokell.io/?q=import.*Distribution%5C.SPDX

It's mentioned by 13 packages. Two of those are Cabal and cabal-install, so we can discount those. The rest are:

  • arch-hs: doesn't seem to use the Generic instance.
  • cabal2nix: doesn't seem to use the Generic instance.
  • dhall-to-cabal: doesn't seem to use the Generic instance.
  • hackport: doesn't seem to use the Generic instance.
  • hadolint: doesn't seem to use the Generic instance.
  • hinit: doesn't seem to use the Generic instance.
  • hpack: doesn't seem to use the Generic instance.
  • purescript: doesn't seem to use the Generic instance.
  • spdx: re-exports LicenseId so some risk here, but doesn't seem to use the Generic instance otherwise. hadolint is the only reverse dependency.
  • spdx-license: doesn't seem to use the Generic instance.
  • stylish-cabal: does use a generics-sop Generic instance but derives it using TH for tests

So, as far as I can tell, nothing on Hackage uses the Generic instance.

In general I think Generic instances aren't helpful for large enums. You basically always want to derive instances either from the Enum instance (e.g. for Hashable) or from a textual representation of the constructors (e.g. for something like Read/Show).
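A minimal sketch of the first option, with a toy enum standing in for LicenseId (the real type has several hundred constructors):

```haskell
module EnumHashSketch where

import Data.Hashable (Hashable (..))

-- Toy stand-in for a big enum like LicenseId.
data TinyLicenseId = MIT | BSD_3_Clause | Apache_2_0
  deriving (Show, Read, Eq, Enum, Bounded)

-- Hash via the Enum instance: GHC only has to compile one tiny method,
-- instead of a derived Generic instance whose representation mentions
-- every constructor.
instance Hashable TinyLicenseId where
  hashWithSalt salt = hashWithSalt salt . fromEnum
```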

@TeofilC
Collaborator

TeofilC commented Mar 3, 2023

I reran dump-timings against the current HEAD using GHC 9.4.4.

It looks like Distribution.PackageDescription.Check and Distribution.PackageDescription.FieldGrammar are now the slowest modules to build. But all under 10s!

[screenshot: per-module compile times at HEAD with GHC 9.4.4]

@ulysses4ever
Collaborator

ulysses4ever commented Mar 3, 2023

@TeofilC good to see you're back on it! Could you try #8211? It's imminent and it should make Check much more reasonable in terms of size (apart from everything else).

@ffaf1 may be interested in this work: paging in…

@TeofilC
Collaborator

TeofilC commented Mar 3, 2023

Good shout @ulysses4ever! With that MR we get this:

Distribution.PackageDescription.Check has moved down quite a few places.

[screenshot: per-module compile times with #8211 applied]

@ulysses4ever
Collaborator

ulysses4ever commented Mar 4, 2023

What is a bit daunting in all this is that compile time seems to correlate strongly with the number of lines: I haven't checked precisely, but I recently ran cloc and some of the modules you show were at the top there too. I'm curious whether there are modules at the top of your measurement but lower by the LOC metric (i.e. smaller modules that are slow to compile). I guess those would be the ones where deriving is plentiful?

I wonder if simple splitting wins much. But anyway I think there shouldn't be multi-KLOC files on principle, under any circumstances, so I'd approve splitting.

Curious that with the newer GHC there's a difference from what Kleidukos showed in the top post (a year ago): there are no modules where anything substantial shows up except the Simplifier and CodeGen.

Still, when I simply do cabal build Cabal, where is the most time spent? Linking, perhaps?

@ulysses4ever
Collaborator

Also, the topic of this issue mentions "memory footprint", not compile times. So I wonder if we're even measuring the right thing...

@TeofilC
Collaborator

TeofilC commented Mar 4, 2023

What is a bit daunting in all this is that compile time seems to correlate strongly with the number of lines: I haven't checked precisely, but I recently ran cloc and some of the modules you show were at the top there too. I'm curious whether there are modules at the top of your measurement but lower by the LOC metric (i.e. smaller modules that are slow to compile). I guess those would be the ones where deriving is plentiful?

The slowest module, Distribution.PackageDescription.FieldGrammar, is not huge at 800 LOC. My impression from a glance is that some sort of excessive inlining is going on.

In my experience that's the other most common issue along with deriving Generic. Yesod's form applicative is a culprit for this. I wouldn't be surprised if the Applicative used in that module is causing something similar. I'll take a look at the Core at some point to confirm.

Language.Haskell.Extension, on the other hand, looks at first glance like it's slow because of a big enum's instances. It would be good to investigate whether deriving via the Enum instance and removing Generic speeds things up.
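A hedged sketch of what that could look like, with a toy enum standing in for the extension type (whether the real instances and serialised format can change like this is exactly what would need checking):

```haskell
module EnumBinarySketch where

import Data.Binary (Binary (..))

-- Toy stand-in for a big enum of language extensions.
data TinyExtension = OverloadedStringsExt | TemplateHaskellExt | DeriveGenericExt
  deriving (Show, Eq, Enum, Bounded)

-- Serialise via the Enum instance instead of a generically derived one;
-- the method bodies stay small no matter how many constructors the enum has.
instance Binary TinyExtension where
  put = put . fromEnum
  get = toEnum <$> get
```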

I wonder if simple splitting wins much. But anyway I think there shouldn't be multi-KLOC files on principle, under any circumstances, so I'd approve splitting.

In my experience splitting does help a bit, but it depends on what's causing the slowness. If there's exponential inlining, then it won't help much, but if there's just a lot of stuff in a file it can help a good bit. For Distribution.Simple.Setup, splitting reduced total run time from 15s to 13s. The gains are even bigger if you are building with parallel make.

Curious that with the newer GHC there's a difference from what Kleidukos showed in the top post (a year ago): there are no modules where anything substantial shows up except the Simplifier and CodeGen.

Yeah, right, I was thinking this as well. I had similar results with 9.2 and 9.4. The GHC devs have definitely made some big improvements. Another thing is that I'm running it with +RTS -A256M -RTS to try to avoid GC skewing things, but that might be speeding things up as well. And of course I'm running it on a different computer, but I doubt that's a big part of the cause.

Still, when I simply do cabal build Cabal, where is the most time spent? Linking, perhaps?

Indeed it's a bit hard to tell. I think linking the package isn't included in those graphs as it's not a module level thing? I'm not really sure.

Also, the topic of this issue mentions "memory footprint", not compile times. So I wonder if we're even measuring the right thing...

Great point. Maybe we should have a separate ticket for compile times. Though I think these things all tend to be highly correlated.


Another thing to investigate is the shape of the module graph. There's an upcoming GHC feature that provides a bunch of metrics: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9435. I've also implemented a tool that does something similar by parsing GHC's -v2 output (my tool is undocumented and not really ready for public consumption yet): https://codeberg.org/teo/shortstraw

@TeofilC
Collaborator

TeofilC commented Mar 4, 2023

Here's the critical path:

Critical path (time per module, in seconds):
0.72  Distribution.Simple
1.79  Distribution.Simple.Haddock
0.75  Distribution.Simple.Build
2.75  Distribution.Simple.Configure
2.80  Distribution.PackageDescription.Check
1.82  Distribution.PackageDescription.Check.Target
0.17  Distribution.Simple.BuildPaths
0.28  Distribution.Simple.LocalBuildInfo
1.60  Distribution.Types.LocalBuildInfo
4.81  Distribution.Simple.Setup.Config
0.20  Distribution.Simple.Setup.Common
0.03  Distribution.Simple.Program
0.36  Distribution.Simple.Program.Db
0.16  Distribution.Simple.Program.Builtin
2.03  Distribution.Simple.Program.GHC
0.06  Distribution.Simple.GHC.ImplInfo
1.23  Distribution.Simple.Compiler
1.15  Distribution.Simple.Utils
0.75  Distribution.Verbosity
0.46  Distribution.Verbosity.Internal
SUM: 23.88

And here is a simulation of how the modules can compile in parallel, which you can open with https://ui.perfetto.dev/#!/viewer:

https://gist.github.com/TeofilC/dae4c01f968e192b5287d870b3272b18#file-lib-cabal-chrome-trace-json

@Kleidukos
Member Author

@TeofilC Thanks a bunch for the visualisation. @doyougnu you might be interested in this ☝️

@doyougnu

doyougnu commented Mar 7, 2023

@Kleidukos very interested! In fact it's on my hit list already, and I've wanted to run cachegrind on Distribution.Simple.Setup to get instruction counts by line in GHC and in Cabal. Unfortunately I don't have room on my plate right now to take a good look, but I'll stay tuned. Making Cabal compilation times faster is IMHO a good way to speed up GHC build times (while the two are still tightly coupled).

@TeofilC
Collaborator

TeofilC commented Mar 7, 2023

I've recently merged an MR that follows Hecaté's suggestion to break up Distribution.Simple.Setup. That's helped quite a bit. I have some more stats in the MR; you might be interested in this comment in particular, @doyougnu: #8130 (comment). It shows (as one might expect) that derived instances for the type contribute a lot to compile times.

I think basically the reason Distribution.Simple.Setup was slow is that it had a bunch of these big types that all had a bunch of derived instances.

I think a good next step would be to split the ConfigFlags type into some smaller types, hopefully breaking it up into some nice semantic chunks.
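A rough sketch of the kind of split meant here (the names and groupings below are made up, not a concrete proposal for the real ConfigFlags):

```haskell
module ConfigFlagsSplitSketch where

-- Hypothetical semantic chunks carved out of one big flags record.
data InstallDirFlags = InstallDirFlags
  { idfPrefix :: Maybe FilePath
  , idfLibDir :: Maybe FilePath
  }

data OptimisationFlags = OptimisationFlags
  { ofOptimisation :: Maybe Int
  , ofProfiling    :: Maybe Bool
  }

-- The big record becomes a thin wrapper around the chunks; each chunk's
-- derived instances are smaller and can live in, and be compiled from,
-- its own module.
data ConfigFlagsSketch = ConfigFlagsSketch
  { cfInstallDirs  :: InstallDirFlags
  , cfOptimisation :: OptimisationFlags
  }
```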
