In the prototype, merge the union level into regular levels #709
Conversation
      MergePolicyLevelling -> do
        case (ir, mrs) of
          -- A single incoming run (which thus didn't need merging) must be
-         -- of the expected size range already
+         -- of the expected size range already, but it could also be smaller
+         -- if it comes from a union level.
          (Single r, m) -> do
            assertST $ case m of CompletedMerge{} -> True
                                 OngoingMerge{}   -> False
-           assertST $ levellingRunSizeToLevel r == ln
+           assertST $ levellingRunSizeToLevel r <= ln
We can make the check a little bit stricter if we distinguish between a Single run and a MigratedUnion run. In the former case we can then check equality, and in the latter case check inequality.
It might be nice to make that distinction anyway. It does not come at much of a cost, I think?
It requires an extra case in a few pattern matches, but that's not really an issue. Stricter invariants are probably worth it.
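A minimal sketch of what that stricter check could look like, assuming a simplified model that is not taken from the PR: runs are reduced to their sizes, and runSizeToLevel is a hypothetical stand-in for levellingRunSizeToLevel that uses a size ratio of 4. Only the Single and MigratedUnion names come from the discussion above.

```haskell
-- Hypothetical, simplified model: a run is represented only by its size.
data IncomingRun
  = Single Int         -- run produced by a regular flush or merge
  | MigratedUnion Int  -- run moved over from a completed union level
  deriving (Eq, Show)

-- Stand-in for levellingRunSizeToLevel (an assumption, not the PR's
-- definition): level n holds runs of up to 4^n entries.
runSizeToLevel :: Int -> Int
runSizeToLevel size = length (takeWhile (< size) (iterate (* 4) 1))

-- The stricter invariant: a Single run must match the level exactly,
-- while a MigratedUnion run may also be smaller than the level expects.
incomingRunFitsLevel :: Int -> IncomingRun -> Bool
incomingRunFitsLevel ln (Single size)        = runSizeToLevel size == ln
incomingRunFitsLevel ln (MigratedUnion size) = runSizeToLevel size <= ln
```

With this split, the equality check is kept for every run that did not come from a union, so the invariant is only relaxed where it has to be.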
-- Before adding the run to the regular levels, we check if we can get
-- rid of the union level (by moving it into the regular ones).
Okay, so this is the approach where we try to put the completed union run into the levels whenever we flush a write buffer. What were the other alternatives? Could this not be implemented as part of supplyUnionCredits?
Doing it when flushing the buffer seems good, because this is where it really matters that the union level can be moved to the regular ones (when creating new last level merges). A union could get completed by an operation on another table due to sharing. Then it wouldn't get moved until you call supplyUnionCredits, which might never happen again.
I'll explain the alternatives and my reasoning in the comment.
Ah, yes, now I remember. So one such situation would be where we create an incremental union table, duplicate it, and then only supply credits to one of them. Would it work if the other duplicate (which did not supply credits) always has 1 credit remaining for the "moving back into the levels"?
Yes, could be due to a duplicate or also using the table as an input to another union.
> Would it work if the other duplicate (which did not supply credits) always has 1 credit remaining for the "moving back into the levels"?
That would help make it clear that supplyUnionCredits still should be called, but it still doesn't guarantee it.
-- Our representation doesn't allow for empty levels, so we can only put the
-- run directly after the pre-existing regular levels. If it is too large for
-- that, we don't want to move it yet to avoid violating run size invariants
-- and doing inefficient merges of runs with very different sizes.
Maybe we should make a TODO to allow empty levels? Or maybe having the Single vs. MigratedUnion distinction would help with this?
I just realised I originally wanted to do something even simpler: when the union is completed, always move it to a new level. If it is much larger than that level should be, the existing code will already handle it: it won't create a merge with it, but will just push the oversized run down the levels over time, until it fits in and becomes part of a new last level merge. I think combined with the MigratedUnion constructor (to avoid watering down the invariant too much), that could be a decent solution. Kind of like allowing empty levels just before a MigratedUnion, but not explicitly representing them.
If so, maybe we should have a dedicated test that triggers this particular behaviour so that we can check that it works correctly
    -- nothing to do
    return (ls, NoUnion)
migrateUnionLevel _tr _sc ls ul@(Union t _) =
    -- TODO: tracing
Is this something you still want to do in this PR?
migrateUnionLevel :: forall s. Tracer (ST s) Event
                  -> Counter -> Levels s -> UnionLevel s
                  -> ST s (Levels s, UnionLevel s)
migrateUnionLevel _ _ ls NoUnion = do
    -- nothing to do
    return (ls, NoUnion)
migrateUnionLevel _tr _sc ls ul@(Union t _) =
Seems like the Counter argument is unused in both definitions.
Yes, but I still added it, because it will also be used for tracing.
@@ -176,29 +177,72 @@ test_merge_again_with_incoming =
-- properties
--

-- TODO: also generate nested unions?
👍 nesting at least once would potentially show some edge case behaviour
        [ QC.counterexample "debt"  $ debt  =/= UnionDebt 0
        , QC.counterexample "debt'" $ debt' === UnionDebt 0
Maybe "debt a priori" and "debt a posteriori"?
-- merge is completed and sufficient new entries have been inserted.
prop_union_merge_into_levels :: [[(LSM.Key, LSM.Op)]] -> Property
prop_union_merge_into_levels kopss = length (filter (not . null) kopss) > 1 QC.==>
    QC.forAll arbitrary $ \firstPay ->
It may be intentional, but QC.forAll does not shrink; you'd have to use QC.forAllShrink.
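For illustration, a small self-contained sketch of the QC.forAllShrink form. The property body is a hypothetical stand-in rather than the PR's test, and it assumes the payment can be generated and shrunk with the standard Arbitrary instance (here via the NonNegative modifier).

```haskell
import qualified Test.QuickCheck as QC

-- Hypothetical stand-in property: only the way the value is generated matters.
-- forAllShrink takes the generator plus a shrinking function, so a failing
-- firstPay is minimised before it is reported; forAll would report it as is.
prop_payment_shrinks :: QC.Property
prop_payment_shrinks =
  QC.forAllShrink QC.arbitrary QC.shrink $ \(QC.NonNegative firstPay) ->
    firstPay <= 2 * (firstPay :: Int)
```

If the generated type already has a suitable Arbitrary instance, taking the value as a regular property argument gives the same shrinking behaviour without the explicit combinator.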
        -- pay off the union and insert enough that it fits into
        -- the last level
        let payOffDebt = do
              UnionDebt d <- LSM.remainingUnionDebt t
              _ <- LSM.supplyUnionCredits t (UnionCredits d)
              return ()
This is not performing inserts, but the comment says it does
        -- insert as many new entries as there are in the completed
        -- union level
        let fillTable = do
              unionRunSize <- length <$> LSM.logicalValue t
              LSM.inserts tr t
                [(K k, V 0, Nothing) | k <- [1 .. unionRunSize]]
What do we need these inserts for?
To make sure the completed union fits into the last level. Otherwise it currently won't be migrated, although that could change with #709 (comment).
Description
This leads to the union level being merged with runs from the regular levels at some point.