
Benchmark suite for table unions #694

Open
wants to merge 3 commits into
base: main
from

Conversation

recursion-ninja
Collaborator

@recursion-ninja recursion-ninja commented Apr 28, 2025

Benchmarks for table unions.

Overview

Here is a summary of the table union benchmark. Note that all function calls are made through the Database.LSMTree.Simple API.

Phase 1: Setup

The benchmark sets up an initial set of tables to be unioned together during "Phase 2." The number of tables created is user-specified via the --tableCount command line option, with a default value of 10 tables.

Every generated table has the same size, which is user-specified via the --initial-size command line option, with a default value of 1_000_000 entries. Each created table has a number of insertion and deletion operations performed on it before being written out to disk as a snapshot. There are 2 * $initial-size insertions and $initial-size deletions performed on each table, ensuring that the tables are non-empty and contain approximately $initial-size entries.
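The sizing arithmetic above can be sketched with a small in-memory model. This is an illustration only: it uses Data.Map from the containers package as a stand-in for an LSM table, and the names `populateWith` and `populate` are hypothetical, not part of the benchmark or of lsm-tree. It also assumes distinct, sequential keys, which the PR does not state.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- | Hypothetical stand-in for one benchmark table: an in-memory map,
-- not an lsm-tree table.
type Table = Map.Map Int Int

-- | Apply all insertions, then all deletions, and return the final table.
populateWith :: [Int] -> [Int] -> Table
populateWith insertKeys deleteKeys =
  foldl' (flip Map.delete) inserted deleteKeys
  where
    inserted = foldl' (\t k -> Map.insert k k t) Map.empty insertKeys

-- | Insert 2n distinct keys, then delete n of them (every even key).
-- With distinct keys and deletions that always hit, exactly n entries remain.
populate :: Int -> Table
populate n = populateWith [0 .. 2 * n - 1] [0, 2 .. 2 * n - 2]
```

With distinct keys the final size is exactly n. If the keys were drawn at random instead (an assumption about the generator), duplicate insertions or deletions of absent keys would shift the count, which would explain why the description says "approximately".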

Additionally, the directory in which to isolate the benchmark environment is specified via the --bench-dir command line option, with a default of _union_wp10. The table snapshots are saved here along with the benchmarked measurements from Phase 2.

Phase 2: Measurement

When generating measurements for the table unions, the benchmark will reload the snapshots of the tables generated in Phase 1 from disk. Subsequently, the tables will be "incrementally unioned" together.

Once the tables have been loaded and the union initiated, a series of "lookup batches" will be performed. A lookup batch involves performing a large number of key lookups on the incrementally unioned table and then calculating the number of lookup operations per second. The measurement series consists of 200 batch lookups.
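The per-batch throughput calculation can be sketched as follows. This is a minimal sketch, not the benchmark's own timing code: `timeBatch` and `lookupsPerSecond` are hypothetical names, and the only library call assumed is `getMonotonicTimeNSec` from `GHC.Clock` in base.

```haskell
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)

-- | Run an action and return its result with the elapsed wall-clock
-- time in nanoseconds, measured on the monotonic clock.
timeBatch :: IO a -> IO (a, Word64)
timeBatch action = do
  start  <- getMonotonicTimeNSec
  result <- action
  end    <- getMonotonicTimeNSec
  pure (result, end - start)

-- | Lookups per second for one batch, given the number of lookups in
-- the batch and the elapsed time in nanoseconds.
-- An elapsed time of zero yields Infinity rather than an exception.
lookupsPerSecond :: Int -> Word64 -> Double
lookupsPerSecond batchSize elapsedNs =
  fromIntegral batchSize / (fromIntegral elapsedNs / 1e9)
```

For example, a batch of 256 lookups that takes 500000000 ns corresponds to 512 lookups per second. Note that the action passed to `timeBatch` must force its results, otherwise laziness can move the work outside the timed region.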

First, 50 batch lookups are performed without supplying any credits to the unioned table. This establishes a baseline performance picture. Indices [-50, 0] measure lookups to the unioned table with 100% of the debt remaining.

Subsequently, 100 more batch lookups are performed. Before each of these 100 batch lookups, a fixed number of credits is supplied to the incremental union table. The number of credits supplied remains constant between batch lookups for the entire series of measurements. The series of measurements allows reasoning about table performance over time as the table's debt decreases (at a uniform rate). The number of credits supplied before each lookup batch is 1% of the total starting debt. After 100 steps, 100% of the debt will be paid off. Indices [1, 100] measure lookups to the unioned table as the remaining debt decreases.

Finally, 50 concluding batch lookups are performed. Since no debt is remaining, no credits are supplied. Rather, these measurements create a "post-payoff" performance picture. Indices [101, 150] measure lookups to the unioned table with 0% of the debt remaining.
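The three phases and the uniform payoff schedule can be modelled with plain integers. This is a sketch under stated assumptions: `Phase`, `phaseOf`, `creditsPerStep`, and `remainingDebt` are hypothetical names, the model does not call lsm-tree, and rounding the 1% credit amount up is an assumption of this sketch (the description does not say how a starting debt that is not divisible by 100 is handled).

```haskell
-- | The three measurement phases described above.
data Phase = BeforePayoff | DuringPayoff | AfterPayoff
  deriving (Eq, Show)

-- | Map a measurement index to its phase:
-- indices up to 0 are before payoff, 1 to 100 during, above 100 after.
phaseOf :: Int -> Phase
phaseOf step
  | step <= 0   = BeforePayoff
  | step <= 100 = DuringPayoff
  | otherwise   = AfterPayoff

-- | Credits supplied before each payoff step: 1% of the starting debt,
-- rounded up (an assumption) so that 100 steps always cover the debt.
creditsPerStep :: Int -> Int
creditsPerStep startingDebt = (startingDebt + 99) `div` 100

-- | Modelled debt remaining when the batch at the given index runs.
-- Credits are only counted for steps 1 to 100; the result never drops below 0.
remainingDebt :: Int -> Int -> Int
remainingDebt startingDebt step =
  max 0 (startingDebt - creditsPerStep startingDebt * paidSteps)
  where
    paidSteps = max 0 (min 100 step)
```

With a starting debt of 1000, the model gives 1000 at index 0, 500 at index 50, and 0 from index 100 onward. Rounding down instead would leave a small residual debt after step 100 whenever the starting debt is not a multiple of 100, so the "after payoff" measurements would not run against a fully paid-off union.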

The general benchmark format is as follows:

do
  measurements <- LSM.withSession (rootDir gopts) $ \session ->
    withLatencyHandle $ \h -> do
      -- Reload the Phase 1 snapshots from disk
      tables <- forM [ 1 .. tableCount gopts ] $ \i -> do
        LSM.openTableFromSnapshot <...>

      LSM.withIncrementalUnions tables $ \table -> do
        -- Before payoff picture: no credits supplied
        forM [-50 .. 0] $ \step -> do
          measureLookups $ table <...>

        -- During payoff picture: supply credits before each batch
        forM [1 .. 100] $ \step -> do
          LSM.supplyUnionCredits table credits
          measureLookups $ table <...>

        -- After payoff picture: no debt remains, no credits supplied
        forM [101 .. 150] $ \step -> do
          measureLookups $ table <...>

  outputResults $ analyze measurements

An informative performance plot of the benchmark measurements is generated and placed in the benchmark's directory.

@recursion-ninja
Collaborator Author

recursion-ninja commented Apr 28, 2025

Surprisingly, the performance gets worse when the union table debt reaches 0 (see the red arrow)!

[Figure: unions-benchmark]

Comment on lines 572 to 579
benchmarkIterations
h
(\_ _ -> pure ())
(initialSize gopts)
(batchSize opts)
(batchCount opts)
(seed opts)
table
Collaborator

From what I can see, this is doing both updates and lookups

@recursion-ninja recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch from f5e1248 to 0d8238e Compare April 28, 2025 13:27
@recursion-ninja recursion-ninja marked this pull request as draft April 28, 2025 14:16
output b (V.zip ls (fmap (fmap (const mempty)) results))

-- deletes and inserts
_ <- timeLatency tref $ LSM.updates tbl is
Collaborator Author

@jorisdral I think you are correct: insertions and/or updates are being performed in the benchmark. I'll have to remove this.

@recursion-ninja recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch from dc2cc99 to 6617a65 Compare May 4, 2025 20:05
@recursion-ninja
Collaborator Author

[Figure: benchmark-10x1_000_000-SEED_008802b0b5fc1ca2]

The benchmarks have been reworked. No inserts, deletes, or updates occur during the measurements; only lookups. The output plot has been re-rendered for clarity. The axes are now labeled with units and depict the aggregated lookup time of each batch. Per the suggestion of @dcoutts, a pre- and post-payoff performance picture is generated alongside the performance as credits are supplied and the debt is repaid.

@recursion-ninja recursion-ninja marked this pull request as ready for review May 4, 2025 21:20
@recursion-ninja recursion-ninja changed the title WIP: Benchmark suite for table unions Benchmark suite for table unions May 4, 2025
@recursion-ninja recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch 2 times, most recently from ecaff38 to a121e86 Compare May 5, 2025 18:52
@recursion-ninja recursion-ninja force-pushed the recursion-ninja/benchmark-union-merge branch from 2e40140 to b704902 Compare May 6, 2025 11:40
2 participants