Reducing the memory footprint of compiling lib:Cabal #8074
Comments
Quite amazing. BTW, do we even enable multi-core compilation of the package in CI? I can't find this GHC option in our scripts. |
We might benefit from that in a light capacity, the GH Actions runners have 2 cores and 7GB of RAM. |
Right, if we only have 2 cores, it's probably better to devote them to compiling many packages in parallel, not many modules in parallel. Anyway, there is a chance the split file would compile faster regardless of multicore, because some passes are not linear. |
It could be interesting to switch to TH-based deriving for the classes that allow it |
Asking cabal to depend on that looks to me to be a nonstarter. That said, I think we should revisit #5893 and axe the Generic instance, and arguably axe any big generic instances in Distribution.Simple.Setup as well, if we can do so safely. |
Got it! |
(I'm commenting here rather than re-opening the old discussion).
To be fair, I think this can indeed be a problem. Removing the Generic instance from /jk Personally I am happy to move fast and break things but then people get upset :D @Kleidukos that's an awesome plot btw |
@andreabedini Regarding usage on Hackage, here is a cursory search, feel free to improve the query string: https://hackage-search.serokell.io/?q=%5CWLicenseId Regarding the consumption of resources, we are going to need some time spent to optimise Generics, but that can't be done by one single person with 2 hours on their hands in an evening. That must be a clear decision taken to improve the Haskell ecosystem with the idea of avoiding breakage, because "moving to another solution" simply isn't acceptable. |
@Kleidukos good, those search results look encouraging. Perhaps it's a breaking change we can afford.
Happy to support that with everything I have. |
@andreabedini After discussing with @bgamari, this ticket https://gitlab.haskell.org/ghc/ghc/-/issues/5642 is the most relevant in our case. Jay and Peyton-Jones have tackled this problem in the paper Scrap your type applications but the work was never merged. Unfortunately the solution in the paper is quite complicated. |
On the GHC tracker: https://gitlab.haskell.org/ghc/ghc/-/issues/16577 |
GHC HQ has expressed a preference for not deriving Generic, because the build times have a cost on CI infrastructure. |
I did a little audit to see if anything on Hackage depends on It's mentioned by 13 packages. Two of those are Cabal and cabal-install so we can discount those. The rest are:
So in general as far as I can tell nothing on Hackage uses the Generic instance. In general I think Generic instances aren't helpful for large enums. You always basically either want to derive instances based on the Enum instance (eg, for Hashable) or a textual representation of the constructors (eg, for something like Read/Show). |
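To illustrate the point about large enums, here is a minimal sketch that gets a hash and a parser from the `Enum`, `Bounded` and `Show` instances alone, with no `Generic` instance involved. The `License` type and the helpers `hashEnum` and `parseEnum` are hypothetical names made up for this example; they are not part of Cabal or of the `hashable` package.

```haskell
module Main (main) where

import Data.List (find)

-- Hypothetical stand-in for a large enumeration. The type discussed in
-- this thread has far more constructors; four are enough for a sketch.
data License = Apache | BSD3 | GPL3 | MIT
  deriving (Show, Eq, Enum, Bounded)

-- Hash built from the Enum instance: the constructor index mixed with a
-- salt. A hand-written Hashable instance could follow the same shape.
hashEnum :: Enum a => Int -> a -> Int
hashEnum salt x = salt * 31 + fromEnum x

-- Parser built from the textual representation: compare the input with
-- the Show output of every constructor.
parseEnum :: (Show a, Enum a, Bounded a) => String -> Maybe a
parseEnum s = find (\c -> show c == s) [minBound .. maxBound]

main :: IO ()
main = do
  print (hashEnum 0 GPL3)
  print (parseEnum "MIT" :: Maybe License)
  print (parseEnum "WTFPL" :: Maybe License)
```

Both helpers only need the cheap stock-derived classes, which is why dropping `Generic` from such a type should cost downstream users little.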
Good shout @ulysses4ever ! With that MR we get this:
What is a bit daunting in all that is that it looks like it correlates strongly with the number of lines: I haven't checked precisely but I recently ran cloc, and some modules you show were at the top there too. Curious if there are modules that are at the top in your measurement but somewhere lower by the LOC metric (so, small(er) modules that are slow to compile). I guess, those would be ones where I wonder if simple splitting wins much. But anyway I think there shouldn't be multi-KLOC files on principle, under any circumstances, so I'd approve splitting. Curious that with the newer GHC there's a difference with what Kleidukos showed in the top post (a year ago): there are no modules where there is anything substantial except for Simplifier and CodeGen. Still, when I simply do |
Also, the topic of this issue mentions "memory footprint", not compile times. So I wonder if we're even measuring the right thing... |
The slowest module, In my experience that's the other most common issue along with
In my experience splitting does help a bit, but it depends on what's causing the slowness. If there's exponential inlining, then it won't help too much, but if there's just a lot of stuff in a file it can help a good bit. For
Yeah right. I was thinking this as well. I had similar results with 9.2 and 9.4. The GHC devs have definitely made some big improvements. Another thing is that I'm running it with
Indeed it's a bit hard to tell. I think linking the package isn't included in those graphs as it's not a module level thing? I'm not really sure.
Great point. Maybe we should have a separate ticket for compile times. Though I think these things all tend to be highly correlated. Another thing to investigate is the shape of the module graph. There's this upcoming GHC feature that provides a bunch of metrics: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/9435. I've also implemented a tool to do something similar by parsing GHC's |
Here's the critical path: Critical path
And a simulation of how the modules can compile in parallel that you can open with https://ui.perfetto.dev/#!/viewer https://gist.github.com/TeofilC/dae4c01f968e192b5287d870b3272b18#file-lib-cabal-chrome-trace-json |
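For readers unfamiliar with the idea, the critical path of a module graph can be computed roughly as follows. This is only a sketch under stated assumptions: the graph `example` and its timings are invented, `criticalPath` is a hypothetical helper (not the tool mentioned above), every import is assumed to be present in the map, and the graph is assumed to be acyclic.

```haskell
module Main (main) where

import qualified Data.Map as M  -- lazy map: the knot-tying below relies on it
import Data.List (maximumBy)
import Data.Ord (comparing)

type Module = String

-- Hypothetical module graph: module name -> (compile seconds, imports).
example :: M.Map Module (Double, [Module])
example = M.fromList
  [ ("A", (2, []))
  , ("B", (5, ["A"]))
  , ("C", (1, ["A"]))
  , ("D", (3, ["B", "C"]))
  ]

-- Longest chain of dependent modules, weighted by compile time.
-- Returns the total time of the chain and the modules on it, in build order.
criticalPath :: M.Map Module (Double, [Module]) -> (Double, [Module])
criticalPath g
  | M.null g  = (0, [])
  | otherwise = maximumBy (comparing fst) (M.elems table)
  where
    -- table maps each module to its earliest finish time and the chain
    -- that determines it. Values are lazy thunks that refer back to
    -- table, so each module is computed at most once.
    table = M.mapWithKey finish g
    finish m (cost, deps) =
      let (start, chain) = latest [table M.! d | d <- deps]
      in (start + cost, chain ++ [m])
    -- A module can start once its slowest import has finished.
    latest [] = (0, [])
    latest xs = maximumBy (comparing fst) xs

main :: IO ()
main = print (criticalPath example)
```

The total time of that chain is a lower bound on wall-clock build time no matter how many cores are available, which is why the shape of the graph matters alongside per-module cost.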
@Kleidukos very interested! In fact it's on my hit list already and I've wanted to run a cachegrind on |
I've recently merged an MR that follows Hecaté's suggestion to break up I think basically the reason I think a good next step would be to split the |
I have been doing some light investigation into which modules in lib:Cabal consume the most resources to compile, and here are the results based on time-ghc-modules:
I don't really know how to interpret this beyond "Distribution.Simple.Setup should be split in a way that enables multi-core compilation of the package". From a quick glance at this module, there aren't any type-level techniques that would be obviously expensive to compile.
Of course there is a certain number of modules in the package, but I'm positive that expensive compilation is not inevitable.
What do you think?