Attempt to find optimal build plans #2860
I also talked to @grayjay about this before.
Plan for doing this incrementally:
Should we compare solutions found with different goal orderings? The solutions might vary more than those found at one end of a large tree.
@grayjay Goal reordering is somewhat independent. It can significantly speed up the overall search. It can also change the score of the "first few" solutions, although it will obviously not change the score of the optimal solution. Implementing the machinery to do the scoring is necessary in any case.
@kosmikus I've started working on this. Here is my design so far:
I tried two variations on pruning. The first prunes all nodes that exceed the maximum score, in order to reduce the tree size as much as possible. The second prunes only leaf-level nodes; I thought that might lead to smaller conflict sets in areas of the tree that don't even contain high-penalty solutions. I only did a small amount of testing, but I found one case where the performance was comparable and another where leaf-level pruning took at least ten times longer. I was disappointed by the difficulty of finding better-scoring solutions to real dependency problems. I tested by first running
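To make the two pruning variants concrete, here is a minimal sketch in Haskell. This is not the code from the branch: the `Tree` type and the names `pruneAll` and `pruneLeaves` are invented stand-ins for the solver's real search tree.

```haskell
-- Toy search-tree type; NOT the solver's real representation.
data Tree = Node Double [Tree]  -- choice node with its accumulated penalty
          | Done Double         -- leaf: a complete solution with its score
          | Fail                -- dead end
  deriving (Eq, Show)

-- Variant 1: prune every node whose accumulated score exceeds the
-- bound, reducing the tree size as much as possible.
pruneAll :: Double -> Tree -> Tree
pruneAll maxScore t = case t of
  Node s cs | s > maxScore -> Fail
            | otherwise    -> Node s (map (pruneAll maxScore) cs)
  Done s    | s > maxScore -> Fail
  _                        -> t

-- Variant 2: prune only leaf-level solutions, leaving inner nodes
-- intact, in the hope of producing smaller conflict sets.
pruneLeaves :: Double -> Tree -> Tree
pruneLeaves maxScore t = case t of
  Node s cs              -> Node s (map (pruneLeaves maxScore) cs)
  Done s | s > maxScore  -> Fail
  _                      -> t
```

The difference in the sketch is exactly the one described above: `pruneAll` cuts whole subtrees as soon as the accumulated penalty exceeds the bound, while `pruneLeaves` lets the search descend and only rejects complete solutions.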
I found one way to produce more solutions in a reasonable amount of time. In the cases that I let finish, the speedup ranged from 2x to 680x for finding solutions that are better than cabal's initial solution. Unfortunately, it's messy. I realized that if a variable only appears in a conflict set because it contributed to a score that was above the maximum, there is no reason to try any other assignments at that level, since the values are sorted in increasing order of weight. I annotated the variables in conflict sets with whether or not the conflict can be resolved by trying a value farther to the right. When the solver calculates a node's conflict set from its children during backjumping, if it finds a child meeting the following requirements, it can simply use the union of the conflict sets of the children it has already traversed:
Even with this improvement, the conflict sets can be very large, and cabal can run "forever" searching for a better solution. Also, this change relies on children not being rearranged between pruning high-penalty nodes and backjumping. @kosmikus Do you think I'm on the right track? I can clean up the code later and then link to it.
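A rough sketch of the annotated-conflict-set idea as described above, with invented stand-in types (`Reason`, `ConflictSet`); the solver's real representation and the exact cut condition in the branch differ.

```haskell
import qualified Data.Set as S

type Var = String

-- A conflict-set entry annotated with whether the conflict could be
-- resolved by trying a value farther to the right at that variable.
data Reason = TryRight Var   -- another assignment at this level might help
            | SkipRight Var  -- values are weight-sorted; later ones only score worse
  deriving (Eq, Ord, Show)

type ConflictSet = S.Set Reason

-- During backjumping at variable v, combine the conflict sets of the
-- children tried so far. If a child's conflict mentions v only as
-- SkipRight, the remaining (heavier) siblings cannot do better, so we
-- stop early and return the union accumulated so far.
backjump :: Var -> [ConflictSet] -> ConflictSet
backjump v = go S.empty
  where
    go acc []       = acc
    go acc (c : cs)
      | SkipRight v `S.member` c && not (TryRight v `S.member` c)
                  = S.union acc c      -- cut: skip the untried siblings
      | otherwise = go (S.union acc c) cs
```

In the sketch, the early return is what avoids exploring values to the right once a child proves they can only increase the score.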
I know next to nothing about this area of the code, but... I wonder if this would be something that could "just"[1] be farmed out to an (industrial-strength) SMT solver by transforming the problem appropriately? [1] Yeah, I know, right...?
I just wanted to chime in here about using an off-the-shelf SMT solver for this sort of thing. I was recently trying to use Z3 (an industrial-strength solver if anything is) and I found that, in the event a solution cannot be found, it tends to generate rather large conflict sets full of terms that actually aren't relevant to the conflict. I think this would have a detrimental effect on our ability to generate good error messages.
Oh, that's very interesting. Suggestion retracted. Though, perhaps an SMT solver could also be applied to minimize the conflict set? :) |
@grayjay Thanks for working on this. I'd like to see the code if possible. Yes, don't feel too bad if there's no immediate improvement. This is one of several things I'd like to do that, if used together, might have a chance of significantly improving overall speed and quality; the other main ingredient is dynamic goal reordering.

In many or even most situations where we do find a solution, in particular if we find it quickly, the first solution is likely to be pretty much near the optimum already. So you should indeed not be surprised if you do not find significantly better ones. The problem is that we cannot guarantee it's the global optimum, so it is good to have at least the option to improve things.

The whole business of conflict sets is very subtle indeed, and I'm not sure I understand your descriptions; it would be good to be able to try it myself and look at the code. The approach you took, even without the extra work you put in, already sounds very close to what I had in mind, so it would be good to have it available, and if there's no big slowdown in general, we could plausibly merge it. We should, however, also have a close look at the actual weights you're assigning. Thanks so far.
@BardurArantsson Using an SMT solver or similar as a solver replacement has been suggested many times. It's a valid suggestion, and certainly very interesting. Together with a student, and with input from @hvr, I have developed an almost usable z3-based solver that can in principle be used as a drop-in replacement for the current modular solver. I'm also currently looking at CUDF solvers.

There are significant tradeoffs, though, as @ttuegel also mentioned, error messages being one of them (not that the ones from the modular solver are generally good). It's not a silver bullet (yet), I'm afraid. The notion of scoring and measuring the quality of different install plans is very relevant even for e.g. SMT solvers: an SMT solver is very good at finding a solution to a constraint problem, but in my experience so far it is much less good at finding a good solution to such a problem. There's no need to "retract" the suggestion; it remains valid. I'd ask, though, that discussion of completely different solver approaches continue in other tickets.
Yeah, I was somewhat (but only peripherally) aware of the limitations of SMT solvers, but I just didn't stop to think about error messages. Error messages are probably in the HUMAN-Hard set... and thus not yet in the realm of SMT solvers. :)
@BardurArantsson Well, error messages alone are also not a reason to dismiss SMT solvers completely. It's entirely conceivable that once we know a problem is unsolvable, we use a completely different algorithm to come up with a good explanation for the "why". |
@kosmikus I haven't had a chance to clean up my code yet, but here is the branch, if you want to try it out: https://github.com/grayjay/cabal/tree/optimal-build-plans I'll probably have more time to work on it in the next few days. I actually switched back to ordering nodes the way cabal does currently, so that my branch should find the same solutions when
@kosmikus I refactored my branch, so it's ready for review: https://github.com/grayjay/cabal/tree/optimal-build-plans . The flag is now The weight type is currently I also added a comment to the commit.
@kosmikus I just fixed a space leak on my branch. I'm not sure if it's still using too much memory, though. During one test with GHC 7.10.2, the memory usage on optimal-build-plans increased at a rate that was 20% higher than on master. With GHC 7.6.3, the memory usage was almost twice as high as master. This branch also runs 2-30% slower on different tests. I'm hoping that switching the weight type to
I looked into the solver's memory use and I think I found the problem. There are several space leaks in the existing logging code and another in backjumping. The backjumping bug is present on master and my branch, but my change probably increased the size of the data structures that are retained. The solution looks relatively difficult. I'll try to create a separate pull request or issue when I have time. |
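Space leaks of the kind described often come from lazily accumulated state, e.g. a running total or log threaded through the search without being forced. A minimal, generic illustration of the pattern and its usual fix (this is not the actual leak in the logging or backjumping code):

```haskell
import Data.List (foldl')

-- Lazy left fold: builds a chain of unevaluated (+) thunks as long as
-- the input before anything is forced, so heap use grows linearly.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: forces the running total at every step, so the
-- accumulator stays a single evaluated Int.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0
```

Both compute the same result; the difference only shows up in residency, which is why such leaks are easy to miss until inputs get large.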
Sorry for the long silence. I really appreciate all your work on this, but there's unfortunately no chance I'll have the time to take a serious look at this before next week. I hope I can get half a day free next week to try to work on my cabal-install backlog. |
@kosmikus No problem. I'll also try to work on the remaining space leak. |
On using SAT/SMT: has anybody already tried using (weighted) MaxSAT to find a good solution? MaxSAT seems to be in vogue, and I've seen some impressive applications. In particular, one could give (different) non-zero weights to flags (for defaulting them), installed packages, fixed constraints, and any preference. Of course, tuning weights and testing this is nontrivial work, and errors remain a problem. Sorry for commenting here, but this issue seems to have the most complete SAT/SMT discussion.
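For a sense of the problem shape being proposed, here is a toy brute-force weighted MaxSAT solver: hard clauses must hold, and each satisfied soft clause earns its weight. This is only an illustration, not an encoding of cabal's constraints, and all names are invented.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

type Lit    = (String, Bool)  -- variable name and required polarity
type Clause = [Lit]           -- disjunction of literals

-- A weighted MaxSAT instance: hard clauses are mandatory; each soft
-- clause carries a weight earned when the clause is satisfied.
data WMaxSat = WMaxSat { hard :: [Clause], soft :: [(Int, Clause)] }

satisfies :: [(String, Bool)] -> Clause -> Bool
satisfies asg = any (\(v, b) -> lookup v asg == Just b)

-- Enumerate all assignments to the given variables, keep those that
-- satisfy every hard clause, and pick one maximizing the total weight
-- of satisfied soft clauses.
solve :: [String] -> WMaxSat -> Maybe ([(String, Bool)], Int)
solve vars p
  | null feasible = Nothing
  | otherwise     = Just (maximumBy (comparing snd) scored)
  where
    asgs     = mapM (\v -> [(v, False), (v, True)]) vars
    feasible = filter (\a -> all (satisfies a) (hard p)) asgs
    scored   = [ (a, sum [w | (w, c) <- soft p, satisfies a c]) | a <- feasible ]
```

In a dependency-solving encoding, the hard clauses would model version constraints and the soft weights would express preferences such as keeping installed packages or defaulting flags; real MaxSAT solvers handle this at scale, unlike the exponential enumeration above.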
This includes a weight-penalty system to measure the quality of different build plans. The preferred version ranges from Hackage should be taken into account when scoring each build plan.
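A hypothetical sketch of how preferred version ranges could feed into a plan's penalty score; all types, names, and the flat weight of 1.0 here are invented for illustration, not taken from the branch.

```haskell
type PkgName = String
newtype Version = Version [Int] deriving (Eq, Ord, Show)

-- A build plan assigns a concrete version to each package, and the
-- index supplies a preferred-range predicate per package.
type Plan            = [(PkgName, Version)]
type PreferredRanges = [(PkgName, Version -> Bool)]

-- Charge a flat penalty for every chosen version that falls outside
-- its package's preferred range; packages without a preference are free.
planPenalty :: PreferredRanges -> Plan -> Double
planPenalty prefs plan = sum
  [ 1.0                                -- flat penalty; real weights would differ
  | (name, ver) <- plan
  , Just inRange <- [lookup name prefs]
  , not (inRange ver)
  ]
```

Lower totals would then mark better plans, which is the shape of score the pruning bound above the search tree would compare against.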
/cc @kosmikus