
Final report #773


Open: wants to merge 21 commits into main from dcoutts/final-report

Conversation

@dcoutts (Collaborator) commented Jul 1, 2025

The final write-up for the project.

We should probably also include PDFs of some of the references (the preceding reports), and it'd probably be helpful to include a generated PDF of the final report and the integration notes.

@jorisdral force-pushed the dcoutts/final-report branch from e23d0e6 to 09e573a on July 1, 2025 at 10:45
@jeltsch force-pushed the dcoutts/final-report branch from ba1d4ed to a4b291e on July 1, 2025 at 15:34
@jorisdral (Collaborator) left a comment

Looks very good! I have a bunch of comments, but they're mostly just clarifications.

large amount of noise.

Overall, our judgement is that this mitigation is practically sufficient, but
it merits a securit review from others who may make a different judgement. It

"securit" -> "security"

Comment on lines +186 to +189
Note that a per-run or per-table hash salt would incur non-trivial costs,
because it would reduce the sharing available in bulk Bloom filter lookups
(looking up N keys in M filters). The Bloom filter lookup is a performance
sensitive part of the overall database implementation.

Just to make it extra clear

Suggested change
- Note that a per-run or per-table hash salt would incur non-trivial costs,
- because it would reduce the sharing available in bulk Bloom filter lookups
- (looking up N keys in M filters). The Bloom filter lookup is a performance
- sensitive part of the overall database implementation.
+ Note that a per-run or per-table hash salt would incur non-trivial costs,
+ because it would reduce the sharing available in bulk Bloom filter lookups
+ (looking up N keys in M filters). Hence, we store a hash salt on a per-session basis. The Bloom filter lookup is a performance
+ sensitive part of the overall database implementation.
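For intuition, here is a minimal sketch of the sharing at stake. All names and the hash function are hypothetical stand-ins, not the library's actual API: with one session-wide salt, each of the N keys is hashed once and the result is reused across all M filters, whereas per-run or per-table salts would force a fresh hash per key-filter pair.

```haskell
import Data.Bits (xor)
import Data.Word (Word64)

-- Hypothetical stand-ins; the real lsm-tree types and hash differ.
type Salt = Word64
type Key  = Word64
data Filter = Filter  -- contents elided

hashKey :: Salt -> Key -> Word64
hashKey salt k = (k `xor` salt) * 0x9e3779b97f4a7c15  -- toy mixer

probe :: Filter -> Word64 -> Bool
probe _ _ = True  -- placeholder membership test

-- One session-wide salt: the N keys are hashed once (N hash
-- computations) and each hash is reused across all M filters.
bulkLookupShared :: Salt -> [Filter] -> [Key] -> [[Bool]]
bulkLookupShared salt fs ks =
  [ [ probe f h | f <- fs ] | h <- map (hashKey salt) ks ]

-- Per-filter salts: every (key, filter) pair needs its own hash,
-- i.e. N * M hash computations; this is the lost sharing.
bulkLookupPerFilter :: [(Salt, Filter)] -> [Key] -> [[Bool]]
bulkLookupPerFilter sfs ks =
  [ [ probe f (hashKey s k) | (s, f) <- sfs ] | k <- ks ]
```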

issued: 2021
title: "Storing the Cardano ledger state on disk: analysis and design options"
type: report
URL: https://github.com/IntersectMBO/lsm-tree/blob/master/doc/final-report/references/utxo-db.pdf

Suggested change
- URL: https://github.com/IntersectMBO/lsm-tree/blob/master/doc/final-report/references/utxo-db.pdf
+ URL: https://github.com/IntersectMBO/lsm-tree/blob/main/doc/final-report/references/utxo-db.pdf


Ah, I spotted this as well, independently of you. It's already fixed in the meantime. 🙂

issued: 2021
title: "Storing the Cardano ledger state on disk: API design concepts"
type: report
URL: https://github.com/IntersectMBO/lsm-tree/blob/master/doc/final-report/references/utxo-db-api.pdf

Suggested change
- URL: https://github.com/IntersectMBO/lsm-tree/blob/master/doc/final-report/references/utxo-db-api.pdf
+ URL: https://github.com/IntersectMBO/lsm-tree/blob/main/doc/final-report/references/utxo-db-api.pdf


Ditto.

issued: 2023
title: "Storing the Cardano ledger state on disk: requirements for high performance backend"
type: report
URL: https://github.com/IntersectMBO/lsm-tree/blob/master/doc/final-report/references/utxo-db-lsm.pdf

Suggested change
- URL: https://github.com/IntersectMBO/lsm-tree/blob/master/doc/final-report/references/utxo-db-lsm.pdf
+ URL: https://github.com/IntersectMBO/lsm-tree/blob/main/doc/final-report/references/utxo-db-lsm.pdf


Ditto.

Comment on lines 1388 to 1389
These benchmarks are implemented using `criterion`, which performs multiple
runs and combines the results in a sound statistical manner. The reported

Should we include an invocation for readers to run the benchmark themselves?
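If we do, something as small as the following would suffice. This is only a sketch with made-up names: `sumTo` is a dummy stand-in for the code under test, and the benchmark target name in the invocation is hypothetical; the `criterion` API (`defaultMain`, `bench`, `whnf`) and the `cabal bench` command are standard.

```haskell
import Criterion.Main (bench, defaultMain, whnf)
import Data.List (foldl')

-- Dummy stand-in for the operation under test.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = defaultMain
  [ bench "sumTo/10k"  (whnf sumTo 10000)
  , bench "sumTo/100k" (whnf sumTo 100000)
  ]

-- Hypothetical invocation, assuming a benchmark target named "micro":
--   cabal bench micro --benchmark-options="--output report.html"
```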

Comment on lines +1248 to +1255
* The i7i.xlarge machine has excellent performance with one core, exceeding the
100 k target already in this setting. We know it has higher one-core
performance than i8g.xlarge, with an advantage of approximately 40 % in the
Bloom filter micro-benchmark. Nevertheless, it is probably limited by CPU,
not by SSD, since its one-core IOPS value is so high (350 k). We know its
IOPS value scales negatively when going to two cores, down to 210 k
aggregated across both cores. This is probably the cause of its poor speedup:
the machine goes from being limited by CPU to being limited by SSD.

The last sentence confused me. It's not clear to me from reading the rest of the report that we get worse performance when going from 1 core to 2 cores because of SSD limits. That seems to be a new claim that we didn't really talk about in section 4.3. For reference, there we say:

The IOPS scores scale negatively when adding more cores. This is actually quite bizarre: the
more cores used to submit I/O, the lower the aggregate IOPS that is measured. Apparently,
this is not the result of a measurement artefact but shows a real effect, and it is opposite
to what happens with physical hardware. Running fio on the i8g.xlarge machine with
4 cores results in 175 k IOPS (which is near to the rated 150 k IOPS), showing that the
negative scaling continues beyond two cores.

This only says that IOPS go down if we add more cores, but not why we think that happened.

Comment on lines +1340 to +1343
* During the *run* phase, the actual benchmark is performed based on the
database created during the setup phase. The user can decide on the number of
batches, the disk cache policy and the benchmark mode: serial (the default),
pipelined or lookup only.

Same here: there are more things that are configurable. Maybe we should say: "amongst other options, the user can decide on ..."

BTW, lookup-only and serial vs. pipelined can be set independently, but the code ignores lookup-only in the pipelined case. Maybe we should make these three modes part of the same option instead of separate flags, as in the sketch below.
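Something along these lines, perhaps; a sketch using optparse-applicative with hypothetical flag and type names, not the benchmark's current CLI:

```haskell
import Options.Applicative

-- One closed set of modes instead of independent --pipelined and
-- --lookup-only flags that can contradict each other.
data BenchmarkMode = Serial | Pipelined | LookupOnly
  deriving (Show, Eq)

benchmarkMode :: Parser BenchmarkMode
benchmarkMode =
      flag' Pipelined  (long "pipelined"   <> help "Overlap batch execution")
  <|> flag' LookupOnly (long "lookup-only" <> help "Perform only lookups")
  <|> pure Serial      -- the default, as in the report

main :: IO ()
main = do
  mode <- execParser (info (benchmarkMode <**> helper) fullDesc)
  print mode
```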

ops/sec, much more than the circa 86 k ops/sec we get with the 100 M entry
table. The reason is that for smaller tables there is less merging work to do.

### Reproducing the results
@jorisdral (Collaborator) commented Jul 2, 2025

Do we have a commit / tag that we used for our benchmarks, like the alpha tag for the functional requirements? That would help with reproducibility, and we can tell users to check out that commit/tag if they want to run the benchmarks themselves.

Should we say lsm-tree-0.1.0.0 once it is released?

reasonable running times, we use 10 k batches of operations for the serial mode
and 100 k batches for the pipelined mode.

### Serial and pipelined execution

If I were reading this for the first time without prior knowledge about pipelining, I think I would understand the general idea of overlapping operations, but I fear I wouldn't fully understand the design as it is realised in our benchmark, because the current description is still rather high-level. Maybe we can direct readers to the utxo-db-api document, which has a more formal explanation of pipelining and the dataflow; that might be helpful. Or maybe it's fine if the user has only a general intuition about pipelining, but then it might still be helpful to refer to the more formal description in the utxo-db-api doc.
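For what it's worth, the general intuition does fit in a few lines. The toy sketch below (using async; emphatically not the benchmark's actual dataflow, which the utxo-db-api document specifies precisely) just shows batch n+1's lookups being issued while batch n's results are still being processed:

```haskell
import Control.Concurrent.Async (withAsync, wait)

-- Toy two-stage pipeline: overlap issuing the next batch's lookups
-- with processing the current batch's results. Types and staging are
-- simplified; see the utxo-db-api document for the real design.
pipeline :: (batch -> IO results)   -- issue lookups for one batch
         -> (results -> IO ())      -- process results, apply updates
         -> [batch] -> IO ()
pipeline issue process batches = case batches of
  []     -> pure ()
  b : bs -> issue b >>= go bs
  where
    go []        r = process r
    go (b : bs') r = withAsync (issue b) $ \next -> do
      process r            -- runs concurrently with the next issue
      wait next >>= go bs'
```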


* 64 bit as the size of keys and values
* 80,000 elements (generated using a PRNG)
* Addition as the update operation

"update" is too overloaded a term. Combined or resolved is better
