In the prototype, merge the union level into regular levels #709

Open: wants to merge 1 commit into base: main

mheinzel
Collaborator

@mheinzel mheinzel commented May 7, 2025

Description

This leads to the union level being merged with runs from the regular levels at some point.

Comment on lines 375 to +383
 MergePolicyLevelling -> do
   case (ir, mrs) of
     -- A single incoming run (which thus didn't need merging) must be
-    -- of the expected size range already
+    -- of the expected size range already, but it could also be smaller
+    -- if it comes from a union level.
     (Single r, m) -> do
       assertST $ case m of CompletedMerge{} -> True
                            OngoingMerge{} -> False
-      assertST $ levellingRunSizeToLevel r == ln
+      assertST $ levellingRunSizeToLevel r <= ln

Collaborator

We can make the check a little stricter if we distinguish between a Single run and a MigratedUnion run. In the former case we can then check equality, and in the latter case only the inequality.

It might be nice to make that distinction anyway. It does not come at much of a cost, I think?
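
Here is a minimal sketch of what the suggested distinction could look like. It uses a toy model, not the prototype. Run, IncomingRun, levelOfSize and incomingSizeOk are hypothetical names, and the capacity of 4^l entries for level l is an assumption made only for illustration. Only the constructor names Single and MigratedUnion come from the discussion.

```haskell
-- Toy model: 'Run', 'IncomingRun' and 'levelOfSize' are hypothetical
-- stand-ins, not the prototype's definitions.

type Run = [Int]  -- a run is modelled as a plain list of entries

data IncomingRun
  = Single Run         -- a single run that needed no merging
  | MigratedUnion Run  -- the completed run moved out of the union level
  | Merging [Run]      -- several runs that still have to be merged

-- Assumed stand-in for levellingRunSizeToLevel: level l holds runs of up
-- to 4^l entries (the constant is chosen for illustration only).
levelOfSize :: Run -> Int
levelOfSize r = go 1
  where
    n = length r
    go l
      | n <= 4 ^ l = l
      | otherwise  = go (l + 1)

-- Size invariant for an incoming run at levelling level 'ln'.
incomingSizeOk :: Int -> IncomingRun -> Bool
incomingSizeOk ln ir = case ir of
  Single r        -> levelOfSize r == ln  -- strict: exactly this level
  MigratedUnion r -> levelOfSize r <= ln  -- relaxed: may be smaller
  Merging _       -> True                 -- not checked in this sketch
```

With the separate constructor, a Single run that is too small for its level is still rejected. A migrated union run, by contrast, may undershoot the level.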

Collaborator Author

It requires an extra case in a few pattern matches, but that's not really an issue. Stricter invariants are probably worth it.

Comment on lines +798 to +799
-- Before adding the run to the regular levels, we check if we can get
-- rid of the union level (by moving it into the regular ones).

Collaborator

Okay, so this is the approach where we try to put the completed union run into the levels whenever we flush a write buffer. What were the other alternatives? Could this not be implemented as part of supplyUnionCredits?

Collaborator Author

@mheinzel mheinzel May 7, 2025

Doing it when flushing the buffer seems good, because this is where it really matters that the union level can be moved to the regular ones (when creating new last level merges). A union could get completed by an operation on another table due to sharing. Then it wouldn't get moved until you call supplyUnionCredits, which might never happen again.

I'll explain the alternatives and my reasoning in the comment.
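
To make the ordering concrete, here is a toy pure model of the flush path. Its types and names are hypothetical: Table, UnionState, migrateUnion and flushBuffer. The model assumes only what the comment above says. On every flush, a union level that has already completed is moved into the regular levels before the flushed run is added. The union may have completed through credits supplied via another table that shares the merge. Merging within the regular levels is left out.

```haskell
-- Toy model of the flush path; all types are hypothetical stand-ins.

type Run = [Int]

data UnionState
  = Ongoing Int    -- remaining debt of the (possibly shared) union merge
  | Completed Run  -- the union merge has finished
  deriving (Eq, Show)

data UnionLevel = NoUnion | Union UnionState
  deriving (Eq, Show)

data Table = Table
  { regularLevels :: [Run]       -- level 1 first
  , unionLevel    :: UnionLevel
  } deriving (Eq, Show)

-- If the union merge has completed (no matter which table paid for it),
-- append its run as the new last regular level and drop the union level.
migrateUnion :: Table -> Table
migrateUnion t = case unionLevel t of
  Union (Completed r) -> t { regularLevels = regularLevels t ++ [r]
                           , unionLevel    = NoUnion }
  _                   -> t

-- Flushing: migrate first, then prepend the buffer run as a new first
-- level (merging within the regular levels is elided in this sketch).
flushBuffer :: Run -> Table -> Table
flushBuffer buf t =
  let t' = migrateUnion t
  in t' { regularLevels = buf : regularLevels t' }
```

The migration check runs on every flush. A table therefore picks up a union that was completed through a sharing table even if supplyUnionCredits is never called on it again.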

Collaborator

Ah, yes, now I remember. So one such situation would be where we create an incremental union table, duplicate it, and then only supply credits to one of them. Would it work if the other duplicate (which did not supply credits) always has 1 credit remaining for the "moving back into the levels"?

Collaborator Author

Yes, could be due to a duplicate or also using the table as an input to another union.

Would it work if the other duplicate (which did not supply credits) always has 1 credit remaining for the "moving back into the levels"?

That would help make it clear that supplyUnionCredits still should be called, but it still doesn't guarantee it.
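
Here is a rough sketch of the "one extra credit" idea in a toy model. UnionLevel, reportedDebt and supplyCredits are hypothetical names; they are not the prototype's remainingUnionDebt and supplyUnionCredits. The reported debt is one more than the merge still needs, and that last credit pays for the migration. As noted above, this signals that credits should still be supplied, but it cannot force the caller to supply them.

```haskell
-- Hypothetical toy model of reporting one extra unit of union debt.

data UnionLevel
  = NoUnion
  | Union Int  -- remaining debt of the (possibly shared) union merge
  deriving (Eq, Show)

-- Debt as reported to the user: the merge debt plus 1 for the migration.
-- A duplicate whose shared merge was completed elsewhere still reports 1.
reportedDebt :: UnionLevel -> Int
reportedDebt NoUnion   = 0
reportedDebt (Union d) = d + 1

-- Credits first pay down the merge; a credit left over once the merge is
-- done performs the migration (modelled here as dropping the union level).
supplyCredits :: Int -> UnionLevel -> UnionLevel
supplyCredits _ NoUnion = NoUnion
supplyCredits c (Union d)
  | c > d     = NoUnion
  | otherwise = Union (d - c)
```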

Comment on lines +1182 to +1185
-- Our representation doesn't allow for empty levels, so we can only put the
-- run directly after the pre-existing regular levels. If it is too large for
-- that, we don't want to move it yet to avoid violating run size invariants
-- and doing inefficient merges of runs with very different sizes.
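
The placement rule in this comment can be sketched as follows. The sketch assumes, for illustration only, that level l holds runs of up to 4^l entries, and it models the levels as a plain list of runs. tryMigrate and levelCapacity are hypothetical names.

```haskell
-- Hypothetical capacity of level l (assumed 4^l entries).
levelCapacity :: Int -> Int
levelCapacity l = 4 ^ l

-- Empty levels cannot be represented, so the union run can only become
-- level (n + 1), where n is the number of existing regular levels. Move it
-- only if it fits there; otherwise keep the union level for now (Nothing).
tryMigrate :: [[Int]] -> [Int] -> Maybe [[Int]]
tryMigrate levels unionRun
  | length unionRun <= levelCapacity (length levels + 1)
              = Just (levels ++ [unionRun])
  | otherwise = Nothing
```

In this model, growing the regular levels raises the capacity of level n + 1. This is why the test further down inserts extra entries before it expects the union to be migrated.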

Collaborator

Maybe we should add a TODO to allow empty levels? Or would the Single vs. MigratedUnion distinction help with this?

Collaborator Author

I just realised I originally wanted to do something even simpler: when the union is completed, always move it to a new level. If it is much larger than that level should be, the existing code already handles it. Instead of creating a merge with it, the code just pushes the oversized run down the levels over time, until it fits in and becomes part of a new last-level merge. Combined with the MigratedUnion constructor (to avoid watering down the invariant too much), I think that could be a decent solution. It is a bit like allowing empty levels just before a MigratedUnion, without representing them explicitly.
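
Here is a toy sketch of this simpler alternative, again with hypothetical names and an assumed capacity of 4^l entries for level l. Appending the completed run as a new level always succeeds. A run that is too large for the level it reaches is passed further down instead of being merged with a much smaller run. The sketch only models that decision, not the prototype's actual merge scheduling.

```haskell
-- Hypothetical toy model of "always append, then let it sink".

type Run = [Int]

-- Assumed capacity of level l (4^l entries, for illustration only).
levelCapacity :: Int -> Int
levelCapacity l = 4 ^ l

-- Always succeeds: there is no size check, unlike a fit-or-wait rule.
appendUnionRun :: [Run] -> Run -> [Run]
appendUnionRun levels r = levels ++ [r]

-- What happens to an incoming run at level l.
data Action = MergeHere | PassDown
  deriving (Eq, Show)

actionAt :: Int -> Run -> Action
actionAt l r
  | length r > levelCapacity l = PassDown   -- oversized: keep it separate
  | otherwise                  = MergeHere

-- The level at which a run starting at level l would finally be merged.
-- Terminates for any finite run, because capacities keep growing.
settlingLevel :: Int -> Run -> Int
settlingLevel l r = case actionAt l r of
  MergeHere -> l
  PassDown  -> settlingLevel (l + 1) r
```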

Collaborator

If so, maybe we should have a dedicated test that triggers this particular behaviour, so that we can check that it works correctly.

    -- nothing to do
    return (ls, NoUnion)
migrateUnionLevel _tr _sc ls ul@(Union t _) =
    -- TODO: tracing

Collaborator

Is this something you still want to do in this PR?

Comment on lines +1186 to +1192
migrateUnionLevel :: forall s. Tracer (ST s) Event
                  -> Counter -> Levels s -> UnionLevel s
                  -> ST s (Levels s, UnionLevel s)
migrateUnionLevel _ _ ls NoUnion = do
    -- nothing to do
    return (ls, NoUnion)
migrateUnionLevel _tr _sc ls ul@(Union t _) =

Collaborator

It seems the Counter argument is unused in both definitions.

Collaborator Author

Yes, but I still added it, because it will also be used for tracing.

@@ -176,29 +177,72 @@ test_merge_again_with_incoming =
-- properties
--

-- TODO: also generate nested unions?

Collaborator

👍 Nesting at least once could expose some edge-case behaviour.

Comment on lines +196 to +197
[ QC.counterexample "debt" $ debt =/= UnionDebt 0
, QC.counterexample "debt'" $ debt' === UnionDebt 0

Collaborator

Maybe "debt a priori" and "debt a posteriori"?

-- merge is completed and sufficient new entries have been inserted.
prop_union_merge_into_levels :: [[(LSM.Key, LSM.Op)]] -> Property
prop_union_merge_into_levels kopss = length (filter (not . null) kopss) > 1 QC.==>
    QC.forAll arbitrary $ \firstPay ->

Collaborator

It may be intentional, but QC.forAll does not shrink; you'd have to use QC.forAllShrink for that.
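
For reference, here is a minimal illustration of the difference, using forAll and forAllShrink from Test.QuickCheck. The property is a made-up stand-in that fails for inputs of 10 or more. When it is run with quickCheck, only the forAllShrink variant shrinks the failing input, which with the default Int shrinker should end up at 10, before reporting it.

```haskell
import Test.QuickCheck

-- Toy stand-in for a property over a generated 'firstPay' value.
-- forAll: a failing input is reported as generated, without shrinking.
propNoShrink :: Property
propNoShrink = forAll arbitrary $ \firstPay -> (firstPay :: Int) < 10

-- forAllShrink: a failing input is shrunk with 'shrink' before reporting.
propShrink :: Property
propShrink = forAllShrink arbitrary shrink $ \firstPay -> (firstPay :: Int) < 10
```

Applied to the property in this PR, this would presumably become QC.forAllShrink arbitrary QC.shrink $ \firstPay -> ..., provided the type of firstPay has a useful shrink.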

Comment on lines +216 to +221
-- pay off the union and insert enough that it fits into
-- the last level
let payOffDebt = do
      UnionDebt d <- LSM.remainingUnionDebt t
      _ <- LSM.supplyUnionCredits t (UnionCredits d)
      return ()

Collaborator

This is not performing any inserts, but the comment says it does.

Comment on lines +223 to +228
-- insert as many new entries as there are in the completed
-- union level
let fillTable = do
      unionRunSize <- length <$> LSM.logicalValue t
      LSM.inserts tr t
        [(K k, V 0, Nothing) | k <- [1 .. unionRunSize]]

Collaborator

What do we need these inserts for?

Collaborator Author

To make sure the completed union fits into the last level. Otherwise it currently won't be migrated, although that could change with #709 (comment).

2 participants