Attempt to find optimal build plans #2860

jdnavarro opened this issue Oct 11, 2015 · 21 comments

@jdnavarro
Collaborator

This involves a weight/penalty system for measuring the quality of different build plans. The preferred version ranges from Hackage should be taken into account when scoring each build plan.

/cc @kosmikus

@kosmikus
Contributor

I also talked to @grayjay about this before.

@kosmikus
Contributor

Plan for doing this incrementally:

  1. Modify PSQ to take an extra parameter for weights / penalties, which generally should be an instance of Ord. Don't use it yet; instantiate it to () everywhere.
  2. Replace the current reorderings (mainly in Preference) so that they modify weights instead. Try to reproduce the current behaviour precisely. This will require using a lexicographic ordering, i.e., a weight type that is in essence a tuple.
  3. Either prior to or during exploration, calculate the total penalty of each visited node. When finding a success node, return the accumulated penalty so that it can, e.g., be printed together with the found solution.
  4. Add some functionality for pruning, again either immediately before or during exploration. This can come in multiple flavours. For example, we could allow a command-line flag to specify a "maximum" score and prune everything that exceeds that maximum. But we could also continue after finding the first solution and dynamically adjust the limit, trying to find incrementally better and ultimately optimal solutions.
  5. Tweak the actual penalties used, adjusting the preference phase so that we get what really feels like optimal build plans. This is mostly heuristics and experience.
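Steps 1 and 2 could be sketched roughly as follows. This is a standalone illustration with hypothetical names (`WPSQ`, `fromList`, etc.), not the solver's actual PSQ type:

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- A PSQ-like structure with an extra weight parameter w (step 1).
newtype WPSQ w k v = WPSQ [(w, k, v)] deriving (Eq, Show)

-- Keep entries sorted by weight so traversal order follows preference.
fromList :: Ord w => [(w, k, v)] -> WPSQ w k v
fromList = WPSQ . sortBy (comparing (\(w, _, _) -> w))

toList :: WPSQ w k v -> [(w, k, v)]
toList (WPSQ xs) = xs

-- Step 1: instantiating w to () changes nothing about the ordering.
unweighted :: [(k, v)] -> WPSQ () k v
unweighted kvs = WPSQ [((), k, v) | (k, v) <- kvs]

-- Step 2: a lexicographic weight is in essence a tuple; tuples already
-- compare lexicographically in Haskell, so sorting by a pair of weights
-- gives the required ordering for free.
type LexWeight = (Int, Double)
```

The point of the `()` instantiation is that it makes the type change mechanical: behaviour is provably unchanged until a later step actually assigns non-trivial weights.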

@grayjay
Collaborator

grayjay commented Oct 12, 2015

Should we compare solutions found with different goal orderings? The solutions might vary more than those found at one end of a large tree.

@kosmikus
Copy link
Contributor

@grayjay Goal reordering is somewhat independent. It can significantly speed up the overall search. It can also change the score of the "first few" solutions, although it will obviously not change the score of the optimal solution. Implementing the machinery to do the scoring is necessary in any case.

@grayjay
Collaborator

grayjay commented Oct 18, 2015

@kosmikus I've started working on this. Here is my design, so far:

  • I created a type newtype WeightedPSQ w k v = WeightedPSQ [(w, k, v)] with the invariant that the list is always sorted by weight. All choice nodes use WeightedPSQ, except for goal choices. GoalChoice still uses PSQ because goal order shouldn't affect the score of the install plan.
  • The weight type is a positive Double. Smaller values are better, with 0 being perfect. Each preferences-related traversal adds to the existing weight in the WeightedPSQ. Most traversals leave the weight of at least one child of each node unchanged. For example, in preferLinked, the solver adds 1 to non-linked packages, but only when they have at least one linked sibling. I'm sure I'll have to spend a lot of time adjusting the rules. I first tried [Double] as the weight type to ensure that I could reproduce cabal's current behavior. Then I switched to Double so that I could weight the different preferences more evenly.
  • A command-line flag, --max-score, exposes the feature. I haven't looked into any ways to automatically restrict the score.
  • I added a tree traversal, maxScorePhase, between heuristicsPhase and explorePhase. The traversal keeps track of the sum of the weights of the preceding nodes and prunes nodes where the sum exceeds the maximum. maxScorePhase also stores the total score on Done nodes.
  • When a node exceeds the max score, the conflict set is the union of all variables that contributed non-zero weight, and each of their QGoalReasonChains.
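A toy version of the pruning traversal described above might look like the following. The tree type and `pruneMax` are hypothetical simplifications, not the solver's actual Tree or maxScorePhase:

```haskell
-- A toy search tree: each edge to a child carries a weight (penalty).
data Tree
  = Done Double            -- success node; field holds the accumulated score
  | Fail
  | Choice [(Double, Tree)]
  deriving (Eq, Show)

-- Prune every subtree whose accumulated weight exceeds the maximum, and
-- store the total score on Done nodes, as maxScorePhase is described to do.
pruneMax :: Double -> Tree -> Tree
pruneMax maxScore = go 0
  where
    go acc (Choice cs) =
      Choice [ (w, go (acc + w) t) | (w, t) <- cs, acc + w <= maxScore ]
    go acc (Done _) = Done acc
    go _   Fail     = Fail
```

Because siblings are sorted by weight, pruning an interior node cuts off all heavier siblings' subtrees at once, which is where the tree-size reduction comes from.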

I tried two variations on pruning. The first prunes all nodes that exceed the max score, in order to reduce the tree size as much as possible. The second prunes only leaf-level nodes. I thought that might lead to smaller conflict sets in areas of the tree that don't even contain high-penalty solutions. I only did a small amount of testing, but I found one case where the performance was comparable and another where leaf-level pruning ran at least ten times longer.

I was disappointed by the difficulty of finding better-scoring solutions to real dependency problems. I tested by first running cabal install with a package from Hackage without a max score. Then I reran cabal with a score that was slightly lower than the previous solution's score. Usually, cabal was still running after several minutes, so I stopped it. I was eventually able to find two packages where my branch found a better-scoring install plan in a reasonable amount of time, hackage-server and yi. In both cases, finding the second solution took an order of magnitude more time than the first solution.

Working on this issue made me realize that returning the first solution after searching packages in order of preference is usually pretty effective. Maybe looking beyond the first solution is mainly useful as a fallback for cases where the user is unhappy with the install plan. It's also possible that my implementation has a bug, or that changing the scoring or other heuristics would significantly improve the usefulness of the feature.

@grayjay
Collaborator

grayjay commented Oct 19, 2015

I found one way to produce more solutions in a reasonable amount of time. In the cases that I let finish, the speedup ranged from 2x to 680x for finding solutions that are better than cabal's initial solution. Unfortunately, it's messy.

I realized that if a variable only appears in a conflict set because it contributed to a score that was above the maximum, there is no reason to try any other assignments at that level, since the values are sorted in increasing order of weight. I annotated the variables in conflict sets with whether or not the conflict can be resolved by trying a value farther to the right. When the solver calculates a node's conflict set from its children during backjumping, if it finds a child meeting the following requirements, it can simply use the union of the conflict sets of the children it has already traversed:

  • The child's conflict set contains the current level's variable.
  • The variable is marked as not resolving the conflict when given a value farther to the right.

Even with this improvement, the conflict sets can be very large, and cabal can run "forever" searching for a better solution. Also, this change relies on children not being rearranged between pruning high-penalty nodes and backjumping.
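The short-circuit rule described above can be sketched in isolation. The types here (`ConflictSet` as a set of annotated variables, `combine`) are hypothetical stand-ins, not the solver's actual conflict-set representation:

```haskell
import qualified Data.Set as Set

type Var = String

-- Each conflict-set entry is annotated with whether trying a value farther
-- to the right at that variable's level could still resolve the conflict.
type ConflictSet = Set.Set (Var, Bool)

-- Combine the conflict sets of a node's children during backjumping. If a
-- child conflicts on the current variable and marks it as not resolvable
-- farther to the right, later siblings cannot help (they only have higher
-- weights), so we stop early and return the union of the conflict sets
-- seen so far, including this child's.
combine :: Var -> [ConflictSet] -> ConflictSet
combine v = go Set.empty
  where
    go acc []       = acc
    go acc (c : cs)
      | Set.member (v, False) c = Set.union acc c
      | otherwise               = go (Set.union acc c) cs
```

The early stop is only sound because the children are sorted by increasing weight, which is why the change relies on children not being rearranged between pruning and backjumping.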

@kosmikus Do you think I'm on the right track? I can clean up the code later and then link to it.

@BardurArantsson
Collaborator

I know next to nothing about this area of the code, but... I wonder if this would be something that could "just"[1] be farmed out to an (industrial-strength) SMT solver by transforming the problem appropriately?

[1] Yeah, I know, right...?

@ttuegel
Member

ttuegel commented Oct 21, 2015

I just wanted to chime in here about using an off-the-shelf SMT solver for this sort of thing. I was recently trying to use Z3 (an industrial-strength solver if anything is) and I found that, in the event a solution cannot be found, it tends to generate rather large conflict sets full of terms that actually aren't relevant to the conflict. I think this would have a detrimental effect on our ability to generate good error messages.

@BardurArantsson
Collaborator

Oh, that's very interesting. Suggestion retracted. Though, perhaps an SMT solver could also be applied to minimize the conflict set? :)

@kosmikus
Contributor

@grayjay Thanks for working on this. I'd like to see the code if possible. And don't feel too bad if there's no immediate improvement. This is one of several things I'd like to do that, if used together, might have a chance of significantly improving overall speed and quality; the other main ingredient is dynamic goal reordering. In many or even most situations where we do find a solution, in particular if we find it quickly, the first solution is likely to be pretty near the optimum already. So you should indeed not be surprised if you do not find significantly better ones. The problem is that we cannot guarantee it's the global optimum, so it is good to have at least the option to improve things.

The whole business with conflict sets is very subtle indeed, and I'm not sure I understand your descriptions. It would be good to be able to try it myself and look at the code.

The approach you took even without the extra work you put in already sounds very close to what I had in mind, so it would be good to have it available, and if there's no big slowdown in general, we could plausibly merge it.

We should however also have a close look at the actual weights you're assigning.

Thanks so far.

@kosmikus
Contributor

@BardurArantsson Using an SMT solver or similar as a solver replacement has been suggested many times. It's a valid suggestion, and certainly very interesting. Together with a student, and with input from @hvr, I have developed an almost usable z3-based solver that can in principle be used as a drop-in replacement for the current modular solver. I'm also currently looking at CUDF solvers. There are significant tradeoffs, though, as @ttuegel also mentioned, error messages being one of them (not that those of the modular solver are generally good). It's not a silver bullet (yet), I'm afraid. The notion of scoring and measuring the quality of different install plans is very relevant even for, e.g., SMT solvers. An SMT solver is very good at finding a solution to a constraint problem; in my experience so far, it is much less good at finding a good solution to such a problem. There's no need to "retract" the suggestion; it remains valid. I'd ask that discussion of completely different solver approaches continue in separate tickets, though.

@BardurArantsson
Collaborator

Yeah, I was somewhat (but only peripherally) aware of the limitations of SMT solvers; I just didn't stop to think about error messages. Error messages are probably in the HUMAN-Hard set... and thus not yet in the realm of SMT solvers. :)

@kosmikus
Contributor

@BardurArantsson Well, error messages alone are also not a reason to dismiss SMT solvers completely. It's entirely conceivable that once we know a problem is unsolvable, we use a completely different algorithm to come up with a good explanation for the "why".

@grayjay
Collaborator

grayjay commented Oct 21, 2015

@kosmikus I haven't had a chance to clean up my code yet, but here is the branch, if you want to try it out: https://github.com/grayjay/cabal/tree/optimal-build-plans I'll probably have more time to work on it in the next few days. I actually switched back to ordering nodes the way cabal does currently, so that my branch should find the same solutions when --max-penalty is not used, and it's easier to compare performance.

@grayjay
Collaborator

grayjay commented Oct 26, 2015

@kosmikus I refactored my branch, so it's ready for review: https://github.com/grayjay/cabal/tree/optimal-build-plans . The flag is now --max-score. There are still a few things that I want to improve, including the scoring.

The weight type is currently [Double], and the weights are assigned in a way that should reproduce the ordering on master. Final scores are calculated only from the index of a node in its list of siblings. I would like to change the algorithm to use Double for weights and then sum the weights to calculate the score, though I don't know the best way to assign the weights. One problem is that scores must be non-negative in order for the solver to backtrack when an interior node exceeds the max. But non-negative scores excessively penalize install plans with higher numbers of packages or flags.
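The difference between the two weight types falls out of Haskell's standard orderings. This is just an illustration of the trade-off described above, with hypothetical helper names:

```haskell
-- [Double] as the weight type: lists compare lexicographically, so weights
-- added by earlier preference traversals strictly dominate later ones.
-- That is what allows reproducing master's node ordering exactly.
lexCompare :: [Double] -> [Double] -> Ordering
lexCompare = compare

-- Double as the weight type: each node contributes one number and the plan's
-- score is the sum, so different preferences trade off against each other.
-- With non-negative weights, every additional package or flag choice can
-- only raise the score, which over-penalises larger install plans.
sumScore :: [Double] -> Double
sumScore = sum
```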

I also added a comment to the commit.

@grayjay
Collaborator

grayjay commented Oct 29, 2015

@kosmikus I just fixed a space leak on my branch. I'm not sure if it's still using too much memory, though. During one test with GHC 7.10.2, the memory usage on optimal-build-plans increased at a rate that was 20% higher than on master. With GHC 7.6.3, the memory usage was almost twice as high as master. This branch also runs 2-30% slower on different tests. I'm hoping that switching the weight type to Double will help.

@grayjay
Collaborator

grayjay commented Nov 4, 2015

I looked into the solver's memory use and I think I found the problem. There are several space leaks in the existing logging code and another in backjumping. The backjumping bug is present on master and my branch, but my change probably increased the size of the data structures that are retained. The solution looks relatively difficult. I'll try to create a separate pull request or issue when I have time.

@kosmikus
Contributor

kosmikus commented Nov 6, 2015

Sorry for the long silence. I really appreciate all your work on this, but there's unfortunately no chance I'll have the time to take a serious look at this before next week. I hope I can get half a day free next week to try to work on my cabal-install backlog.

@grayjay
Collaborator

grayjay commented Nov 7, 2015

@kosmikus No problem. I'll also try to work on the remaining space leak.

@grayjay
Collaborator

grayjay commented Nov 9, 2015

Since I fixed the problem with memory usage, I rebased my branch onto #2916 and made a pull request (#2917).

@Blaisorblade
Collaborator

On using SAT/SMT: has anybody already tried using (weighted) MaxSAT to find a good solution? MaxSAT seems in vogue and I've seen some impressive applications. In particular, one could give (different) non-zero weights to flags (for defaulting them), installed packages, fixed constraints, and any preferences. Of course, tuning weights and testing this is nontrivial work, and error messages remain a problem.
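For a feel of what the weighted formulation means, here is a brute-force toy version of weighted MaxSAT over a handful of variables. Everything here (`Soft`, `best`, the predicates) is a hypothetical illustration of the idea, unrelated to any actual solver:

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

type Var = String
type Assignment = [(Var, Bool)]

-- A soft constraint: the given weight is earned when the predicate holds.
-- Hard constraints are modelled separately as a filter below.
data Soft = Soft Double (Assignment -> Bool)

totalWeight :: [Soft] -> Assignment -> Double
totalWeight softs a = sum [w | Soft w p <- softs, p a]

-- Brute-force weighted MaxSAT: enumerate all assignments, keep those
-- satisfying the hard constraints, pick the one with maximum soft weight.
-- (Real MaxSAT solvers are vastly cleverer than this enumeration.)
best :: [Var] -> (Assignment -> Bool) -> [Soft] -> Assignment
best vars hard softs =
  maximumBy (comparing (totalWeight softs)) (filter hard assignments)
  where
    assignments =
      foldr (\v rest -> [(v, b) : a | b <- [False, True], a <- rest]) [[]] vars
```

Here "flags, installed packages, fixed constraints and preferences" would each contribute their own soft clauses with different weights, while the dependency constraints themselves stay hard.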

Sorry for commenting here, but this issue seems to contain the most complete SAT/SMT discussion.
(I'm following up on @hvr from #1783 (comment)).
