Skip to content

hls-graph speculatively builds all prior dependencies for a key even if they might have changed #3423

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wz1000 opened this issue Dec 23, 2022 · 11 comments
Labels
performance Issues about memory consumption, responsiveness, etc. status: in discussion Not actionable, because discussion is still ongoing or there's no decision yet type: bug Something isn't right: doesn't work as intended, documentation is missing/outdated, etc..

Comments

@wz1000
Copy link
Collaborator

wz1000 commented Dec 23, 2022

I ran into this bug while debugging a flaky test case -

A.hs is initially not an FOI, so GetModIface A.hs has two dependencies IsFileOfInterest A.hs and GetModIfaceFromDiskAndIndex A.hs. Then A.hs is opened, turning it into an FOI

getModIfaceRule recorder = defineEarlyCutoff (cmapWithPrio LogShake recorder) $ Rule $ \GetModIface f -> do
fileOfInterest <- use_ IsFileOfInterest f
res@(_,(_,mhmi)) <- case fileOfInterest of
IsFOI status -> do
-- Never load from disk for files of interest
tmr <- use_ TypeCheck f
linkableType <- getLinkableType f
hsc <- hscEnv <$> use_ GhcSessionDeps f
let compile = fmap ([],) $ use GenerateCore f
se <- getShakeExtras
(diags, !hiFile) <- writeCoreFileIfNeeded se hsc linkableType compile tmr
let fp = hiFileFingerPrint <$> hiFile
hiDiags <- case hiFile of
Just hiFile
| OnDisk <- status
, not (tmrDeferredError tmr) -> liftIO $ writeHiFile se hsc hiFile
_ -> pure []
return (fp, (diags++hiDiags, hiFile))
NotFOI -> do
hiFile <- use GetModIfaceFromDiskAndIndex f
let fp = hiFileFingerPrint <$> hiFile
return (fp, ([], hiFile))
pure res

At this point A.hs should have dependencies IsFileOfInterest A.hs, Typecheck A.hs ... (the IsFOI branch). However, I was seeing GetModIfaceFromDiskAndIndex still getting run for A.hs.

Investigating further, it turns out to be the implementation of refresh in hls-graph:

refresh db stack key result = case (addStack key stack, result) of
(Left e, _) -> throw e
(Right stack, Just me@Result{resultDeps = ResultDeps (toListKeySet -> deps)}) -> do
res <- builder db stack deps

It seems like it speculatively builds all the previous dependencies of the key even if they might have changed. This is wrong as rules can have side effects or require preconditions (GetModIfaceFromDiskAndIndex has both). This also can lead to spurious diagnostics from these pseudo-dependencies.

We should instead build dependencies in the order that they occurred (which we currently do not maintain), and recompute with RunDependenciesChanged if any of them are dirty, short circuiting the evaluation of later dependencies (since they might not be true dependencies). However, this might have a negative impact on performance.

In the above case, that would mean skipping GetModIfaceFromDiskAndIndex after we discover that IsFOI has changed.

@wz1000 wz1000 added type: bug Something isn't right: doesn't work as intended, documentation is missing/outdated, etc.. status: needs triage labels Dec 23, 2022
@wz1000
Copy link
Collaborator Author

wz1000 commented Dec 23, 2022

@pepeiborra thoughts?

@pepeiborra
Copy link
Collaborator

pepeiborra commented Dec 23, 2022

It seems like it speculatively builds all the previous dependencies of the key even if they might have changed.

It doesn't build the previous dependencies, only recomputes them. This can be a no-op if the dependencies are not dirty.

Either way, this behaviour is by design, to implement early cutoff. That's my understanding at least. And as far as I remember, Shake does the same thing.

This is wrong as rules can have side effects or require preconditions (GetModIfaceFromDiskAndIndex has both). This also can lead to spurious diagnostics from these pseudo-dependencies.

Yup, I can see how the side effects and the spurious diagnostics can be annoying. Maybe GetModIfaceFromDiskAndIndex needs to check whether the file is a FOI.

Not sure what you mean about preconditions?

We should instead build dependencies in the order that they occurred (which we currently do not maintain),

The build system makes no promises around evaluation order, and for good reason since we want maximal parallelism. The only promise is that if rule B depends on rule A the build system guarantees that A will build before B.

and recompute with RunDependenciesChanged if any of them are dirty, short circuiting the evaluation of later dependencies (since they might not be true dependencies). However, this might have a negative impact on performance.

So you want to give up early cutoff? I suspect the performance impact will be massive.

@wz1000
Copy link
Collaborator Author

wz1000 commented Dec 23, 2022

It doesn't build the previous dependencies, only recomputes them. This can be a no-op if the dependencies are not dirty.

What is the distinction between 'build' and 'compute' here? It seems like all the code in the GetModIfaceFromDiskAndIndex rule is being executed.

Not sure what you mean about preconditions?

GetModIfaceFromDiskAndIndex is only supposed to be called on non-FOIs.

The build system makes no promises around evaluation order, and for good reason since we want maximal parallelism. The only promise is that if rule B depends on rule A the build system guarantees that A will build before B.

We could add an ordering for dependencies so that dependencies using applicative style or uses can be built in parallel, but monadic bind imposes a sequencing order on them.

So you want to give up early cutoff? I suspect the performance impact will be massive.

I'm not sure why this means we have to give up early cutoff. After recomputing previous dependencies, since the rule has finished execution, we should have the result and be able to do early cutoff using that?

@pepeiborra
Copy link
Collaborator

What is the distinction between 'build' and 'compute' here? It seems like all the code in the GetModIfaceFromDiskAndIndex rule is being executed.

Computing a build target involves checking for dirtiness, computing its dependencies, and then running the rule giving it a RunMode:

  • If the target is not dirty, the rule won't be run
  • If the RunMode is RunDependenciesSame, defineEarlyCutoff will return the last cached value (this is what implements early cutoff)

GetModIfaceFromDiskAndIndex is only supposed to be called on non-FOIs.

This is not a property of the build rule, but an invariant that we want to maintain. It seems difficult to encode this invariant in the build system though

@pepeiborra
Copy link
Collaborator

So the gist of your proposal is to not evaluate all the dependencies in parallel. Instead, evaluate them "only as much as needed", and stop as early as possible, i.e. short-circuit. Is that right?

I am unconvinced that this will prevent running GetModIfaceFromDisk for FOIs. Can you illustrate with an example?

It will also make the build system less performant in subtle ways, specially when combined with ApplicativeDo.

@wz1000
Copy link
Collaborator Author

wz1000 commented Dec 26, 2022

So the gist of your proposal is to not evaluate all the dependencies in parallel. Instead, evaluate them "only as much as needed", and stop as early as possible, i.e. short-circuit. Is that right?

Yes.

I also noticed in profiles that hls-graph spawns essentially a new thread for each build target, which seems excessive and probably not very good for performance (see our old friend https://gitlab.haskell.org/ghc/ghc/issues/18224). Of course the exact improvement depends on how many threads are blocked on MVars or IO, versus have actual work that can be performed.

I am unconvinced that this will prevent running GetModIfaceFromDisk for FOIs. Can you illustrate with an example?

At least in the case I mentioned in the ticket body, the second time GetModIface is built, hls-graph would first check if IsFileOfInterest has changed (which it has), and then skip computing the rest of the dependencies.

@wz1000
Copy link
Collaborator Author

wz1000 commented Dec 26, 2022

On a project with 3000 modules, it looks like hls-graph spawns and hold on to 31000 TSOs for the duration of the entire build.

@pepeiborra
Copy link
Collaborator

I also noticed in profiles that hls-graph spawns essentially a new thread for each build target, which seems excessive and probably not very good for performance (see our old friend https://gitlab.haskell.org/ghc/ghc/issues/18224). Of course the exact improvement depends on how many threads are blocked on MVars or IO, versus have actual work that can be performed.

This is by design, hls-graph is an evolution of Shake, removing all the unessential including the fancy scheduler, which is replaced by the RTS scheduler.

depends on how many threads are blocked on MVars or IO, versus have actual work that can be performed.

Build target threads block on blackholes, IIRC, see splitIO. Of course those blackholes can be blocking on other concurrency primitives but that's up to the build rule.

On a project with 3000 modules, it looks like hls-graph spawns and hold on to 31000 TSOs for the duration of the entire build.

Sounds about right. Rebuilds after edits should spawn much fewer TSOs though, have you measured that?

@pepeiborra
Copy link
Collaborator

So the gist of your proposal is to not evaluate all the dependencies in parallel. Instead, evaluate them "only as much as needed", and stop as early as possible, i.e. short-circuit. Is that right?

Yes.

This makes sense to me, but I still have concerns about the performance impact. Your suggestion of grouping dependencies and evaluating the group in parallel would go a long way here. So I agree that this would be a good change.

@fendor fendor added performance Issues about memory consumption, responsiveness, etc. status: in discussion Not actionable, because discussion is still ongoing or there's no decision yet and removed status: needs triage labels Feb 22, 2023
@soulomoon
Copy link
Collaborator

soulomoon commented Feb 18, 2024

In summary.
The current approach pros:
1 easy to implement
2 can kick all dependencies concurrently.

Cons:
1 wrong semantic.
2 also performance penalty of running
unnecessary rules which might be computational heavy.(that is why we branch FOI in the first place, some non-FOI only rules would actually be running even the file is opened... as in the issue head)

Some thoughts:
A:

  1. Introduce a small dependency dsl with branching semantic to hls-graph so the dependencies only runs when it meet the preconditions, use it to branch the runs of dependencies.
  2. This actually also gives easy fix to a potential dual problem about missing dependencies(missing since it's parent does not run it the first time) that might not kick it's parent when dirty even if it's precondition is met. This can happen if the branching condition can change without the change of some rule.

B:
Alternatively we rearrange the rules with branching dependencies which might be nasty.

@soulomoon
Copy link
Collaborator

soulomoon commented Feb 20, 2024

I am doing a proof of concept draft in #4087 (comment)
Branching the deps with meta in the rule as the way alwaysrun do, to see if performance is working ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues about memory consumption, responsiveness, etc. status: in discussion Not actionable, because discussion is still ongoing or there's no decision yet type: bug Something isn't right: doesn't work as intended, documentation is missing/outdated, etc..
Projects
None yet
Development

No branches or pull requests

4 participants