
Commit 5bd21fc

wenkokke and jorisdral committed
fix(docs): process feedback on #701
Co-authored-by: Joris Dral <[email protected]>
1 parent 3bd84fc commit 5bd21fc

File tree

5 files changed (+36, -33 lines)


README.md

Lines changed: 22 additions & 20 deletions
````diff
@@ -353,24 +353,24 @@ uniformly distributed, e.g., when the keys are hashes.
 
 `confDiskCachePolicy`
 The *disk cache policy* determines if lookup operations use the OS page
-cache. Caching may improve the performance of lookups if database access
-follows certain patterns.
+cache. Caching may improve the performance of lookups and updates if
+database access follows certain patterns.
 
 ##### Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size <span id="fine_tuning_data_layout" class="anchor"></span>
 
 The configuration parameters `confMergePolicy`, `confSizeRatio`, and
 `confWriteBufferAlloc` affect how the table organises its data. To
 understand what effect these parameters have, one must have a basic
-understand of how an LSM-tree stores its data. The physical entries in
-an LSM-tree are key–operation pairs, which pair a key with an operation
-such as an `Insert` with a value or a `Delete`. These key–operation
-pairs are organised into *runs*, which are sequences of key–operation
-pairs sorted by their key. Runs are organised into *levels*, which are
-unordered sequences or runs. Levels are organised hierarchically. Level
-0 is kept in memory, and is referred to as the *write buffer*. All
-subsequent levels are stored on disk, with each run stored in its own
-file. The following shows an example LSM-tree layout, with each run as a
-boxed sequence of keys and each level as a row.
+understanding of how an LSM-tree stores its data. The physical entries
+in an LSM-tree are key–operation pairs, which pair a key with an
+operation such as an `Insert` with a value or a `Delete`. These
+key–operation pairs are organised into *runs*, which are sequences of
+key–operation pairs sorted by their key. Runs are organised into
+*levels*, which are unordered sequences or runs. Levels are organised
+hierarchically. Level 0 is kept in memory, and is referred to as the
+*write buffer*. All subsequent levels are stored on disk, with each run
+stored in its own file. The following shows an example LSM-tree layout,
+with each run as a boxed sequence of keys and each level as a row.
 
 ``` math
@@ -527,15 +527,16 @@ in-memory size of the table.
 
 Tables maintain a [Bloom
 filter](https://en.wikipedia.org/wiki/Bloom_filter "https://en.wikipedia.org/wiki/Bloom_filter")
-in memory for each run on disk. These Bloom filter are probablilistic
+in memory for each run on disk. These Bloom filters are probablilistic
 datastructure that are used to track which keys are present in their
 corresponding run. Querying a Bloom filter returns either "maybe"
 meaning the key is possibly in the run or "no" meaning the key is
 definitely not in the run. When a query returns "maybe" while the key is
 *not* in the run, this is referred to as a *false positive*. While the
 database executes a lookup operation, any Bloom filter query that
-returns a false positive causes the database to unnecessarily read a run
-from disk. The probabliliy of these spurious reads follow a [binomial
+returns a false positive causes the database to unnecessarily read a
+page from disk. The probabliliy of these spurious reads follow a
+[binomial
 distribution](https://en.wikipedia.org/wiki/Binomial_distribution "https://en.wikipedia.org/wiki/Binomial_distribution")
 $`\text{Binomial}(r,\text{FPR})`$ where $`r`$ refers to the number of
 runs and $`\text{FPR}`$ refers to the false-positive rate of the Bloom
@@ -601,9 +602,10 @@ types of fence-pointer indexes:
 `OrdinaryIndex`
 Ordinary indexes are designed for any use case.
 
-Ordinary indexes store one serialised key per page of memory. The total
-in-memory size of all indexes is $`K \cdot \frac{n}{P}`$ bits, where
-$`K`$ refers to the average size of a serialised key in bits.
+Ordinary indexes store one serialised key per page of memory. The
+average total in-memory size of all indexes is $`K \cdot \frac{n}{P}`$
+bits, where $`K`$ refers to the average size of a serialised key in
+bits.
 
 `CompactIndex`
 Compact indexes are designed for the use case where the keys in the
@@ -614,8 +616,8 @@ serialised key of each page of memory. This requires that serialised
 keys are *at least* 64 bits in size. Compact indexes store 1 additional
 bit per page of memory to resolve collisions, 1 additional bit per page
 of memory to mark entries that are larger than one page, and a
-negligible amount of memory for tie breakers. The total in-memory size
-of all indexes is $`66 \cdot \frac{n}{P}`$ bits.
+negligible amount of memory for tie breakers. The average total
+in-memory size of all indexes is $`66 \cdot \frac{n}{P}`$ bits.
 
 ##### Fine-tuning: Disk Cache Policy <span id="fine_tuning_disk_cache_policy" class="anchor"></span>
 
````
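The data layout described in the README hunks above (runs as key-sorted sequences of key–operation pairs, levels as unordered collections of runs, level 0 as the in-memory write buffer) can be sketched as a small Haskell model. All names here (`Op`, `Run`, `Level`, `LSMTree`, `sortedByKey`) are hypothetical illustrations, not the library's internal types.

```haskell
module Main where

-- Illustrative model of the LSM-tree layout described above.
-- These names are hypothetical; they are not the library's internal types.
data Op v
  = Insert v -- ^ insert a value for the key
  | Delete   -- ^ delete the key
  deriving Show

-- A run: key–operation pairs sorted by key.
type Run k v = [(k, Op v)]

-- A level: an unordered collection of runs.
type Level k v = [Run k v]

data LSMTree k v = LSMTree
  { writeBuffer :: Run k v     -- ^ level 0, kept in memory
  , diskLevels  :: [Level k v] -- ^ levels 1 and up; each run is its own file
  }

-- A run is well-formed when its keys are strictly ascending.
sortedByKey :: Ord k => Run k v -> Bool
sortedByKey r = and (zipWith (<) ks (drop 1 ks))
  where ks = map fst r

main :: IO ()
main = do
  let run = [(1, Insert "a"), (2, Delete), (3, Insert "c")] :: Run Int String
  print (sortedByKey run)           -- True: keys strictly ascending
  print (sortedByKey (reverse run)) -- False: keys descending
```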

lsm-tree.cabal

Lines changed: 6 additions & 6 deletions
```diff
@@ -181,12 +181,12 @@ description:
 
 [@confDiskCachePolicy@]
 The /disk cache policy/ determines if lookup operations use the OS page cache.
-Caching may improve the performance of lookups if database access follows certain patterns.
+Caching may improve the performance of lookups and updates if database access follows certain patterns.
 
 ==== Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size #fine_tuning_data_layout#
 
 The configuration parameters @confMergePolicy@, @confSizeRatio@, and @confWriteBufferAlloc@ affect how the table organises its data.
-To understand what effect these parameters have, one must have a basic understand of how an LSM-tree stores its data.
+To understand what effect these parameters have, one must have a basic understanding of how an LSM-tree stores its data.
 The physical entries in an LSM-tree are key–operation pairs, which pair a key with an operation such as an @Insert@ with a value or a @Delete@.
 These key–operation pairs are organised into /runs/, which are sequences of key–operation pairs sorted by their key.
 Runs are organised into /levels/, which are unordered sequences or runs.
@@ -331,10 +331,10 @@ description:
 which balances the performance of lookups against the in-memory size of the table.
 
 Tables maintain a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in memory for each run on disk.
-These Bloom filter are probablilistic datastructure that are used to track which keys are present in their corresponding run.
+These Bloom filters are probablilistic datastructures that are used to track which keys are present in their corresponding run.
 Querying a Bloom filter returns either \"maybe\" meaning the key is possibly in the run or \"no\" meaning the key is definitely not in the run.
 When a query returns \"maybe\" while the key is /not/ in the run, this is referred to as a /false positive/.
-While the database executes a lookup operation, any Bloom filter query that returns a false positive causes the database to unnecessarily read a run from disk.
+While the database executes a lookup operation, any Bloom filter query that returns a false positive causes the database to unnecessarily read a page from disk.
 The probabliliy of these spurious reads follow a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution) \(\text{Binomial}(r,\text{FPR})\)
 where \(r\) refers to the number of runs and \(\text{FPR}\) refers to the false-positive rate of the Bloom filters.
 Hence, the expected number of spurious reads for each lookup operation is \(r\cdot\text{FPR}\).
@@ -401,7 +401,7 @@ description:
 Ordinary indexes are designed for any use case.
 
 Ordinary indexes store one serialised key per page of memory.
-The total in-memory size of all indexes is \(K \cdot \frac{n}{P}\) bits,
+The average total in-memory size of all indexes is \(K \cdot \frac{n}{P}\) bits,
 where \(K\) refers to the average size of a serialised key in bits.
 
 [@CompactIndex@]
@@ -410,7 +410,7 @@ description:
 Compact indexes store the 64 most significant bits of the minimum serialised key of each page of memory.
 This requires that serialised keys are /at least/ 64 bits in size.
 Compact indexes store 1 additional bit per page of memory to resolve collisions, 1 additional bit per page of memory to mark entries that are larger than one page, and a negligible amount of memory for tie breakers.
-The total in-memory size of all indexes is \(66 \cdot \frac{n}{P}\) bits.
+The average total in-memory size of all indexes is \(66 \cdot \frac{n}{P}\) bits.
 
 ==== Fine-tuning: Disk Cache Policy #fine_tuning_disk_cache_policy#
 
```
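The expectation stated in the hunk above, that spurious reads per lookup follow a Binomial(r, FPR) distribution with mean r·FPR, can be checked with a few lines of Haskell. The function name below is ours, not part of the library.

```haskell
module Main where

-- The mean of Binomial(r, fpr) is r * fpr: with r runs on disk, each
-- Bloom filter independently returns a false positive with probability
-- fpr, so a lookup triggers r * fpr spurious page reads on average.
expectedSpuriousReads :: Int -> Double -> Double
expectedSpuriousReads r fpr = fromIntegral r * fpr

main :: IO ()
main = do
  -- e.g. 10 runs at a 2% false-positive rate: about 0.2 spurious reads
  -- per lookup on average
  print (expectedSpuriousReads 10 0.02)
```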

src/Database/LSMTree/Internal/Config.hs

Lines changed: 3 additions & 2 deletions
```diff
@@ -72,10 +72,11 @@ For a detailed discussion of fine-tuning the table configuration, see [Fine-tuni
 
 [@confMergeSchedule :: t'MergeSchedule'@]
 The /merge schedule/ balances the performance of lookups and updates against the consistency of updates.
-The merge schedule does not affect the performance of table unions.
 With the one-shot merge schedule, lookups and updates are more efficient overall, but some updates may take much longer than others.
 With the incremental merge schedule, lookups and updates are less efficient overall, but each update does a similar amount of work.
 This parameter is explicitly referenced in the documentation of those operations it affects.
+The merge schedule does not affect the way that table unions are computed.
+However, any table union must complete all outstanding incremental updates.
 
 [@confBloomFilterAlloc :: t'BloomFilterAlloc'@]
 The Bloom filter size balances the performance of lookups against the in-memory size of the database.
@@ -88,7 +89,7 @@ For a detailed discussion of fine-tuning the table configuration, see [Fine-tuni
 
 [@confDiskCachePolicy :: t'DiskCachePolicy'@]
 The /disk cache policy/ supports caching lookup operations using the OS page cache.
-Caching may improve the performance of lookups if database access follows certain patterns.
+Caching may improve the performance of lookups and updates if database access follows certain patterns.
 -}
 data TableConfig = TableConfig {
     confMergePolicy :: !MergePolicy
```

src/Database/LSMTree/Internal/Range.hs

Lines changed: 2 additions & 2 deletions
```diff
@@ -14,11 +14,11 @@ import Control.DeepSeq (NFData (..))
 -- | A range of keys.
 data Range k =
     {- |
-    @'FromToExcluding' i j@ is the ranges from @i@ (inclusive) to @j@ (exclusive).
+    @'FromToExcluding' i j@ is the range from @i@ (inclusive) to @j@ (exclusive).
     -}
     FromToExcluding k k
     {- |
-    @'FromToIncluding' i j@ is the ranges from @i@ (inclusive) to @j@ (inclusive).
+    @'FromToIncluding' i j@ is the range from @i@ (inclusive) to @j@ (inclusive).
    -}
   | FromToIncluding k k
   deriving stock (Show, Eq, Functor)
```
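The two constructors documented in the hunk above can be exercised with a small membership predicate. The `Range` type below mirrors the one in the diff (with `Functor` deriving dropped to keep the sketch extension-free); `member` is our own illustrative helper, not part of the module.

```haskell
module Main where

-- Mirrors the Range type from the diff above (Functor deriving omitted
-- so the sketch needs no language extensions).
data Range k
  = FromToExcluding k k -- ^ from i (inclusive) to j (exclusive)
  | FromToIncluding k k -- ^ from i (inclusive) to j (inclusive)
  deriving (Show, Eq)

-- Illustrative helper (not part of the module): is a key in the range?
member :: Ord k => k -> Range k -> Bool
member k (FromToExcluding i j) = i <= k && k < j
member k (FromToIncluding i j) = i <= k && k <= j

main :: IO ()
main = do
  print (member (5 :: Int) (FromToExcluding 1 5)) -- False: upper bound excluded
  print (member (5 :: Int) (FromToIncluding 1 5)) -- True: upper bound included
```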

src/Database/LSMTree/Internal/Serialise/Class.hs

Lines changed: 3 additions & 3 deletions
```diff
@@ -411,7 +411,7 @@ instance SerialiseKeyOrderPreserving String
 
 @'deserialiseKey'@: \(O(n)\).
 
-The 'String' is (de)serialiseValue as UTF-8.
+The 'String' is (de)serialised as UTF-8.
 -}
 instance SerialiseValue String where
   -- TODO: Optimise. The performance is \(O(n) + O(n)\) but it could be \(O(n)\).
@@ -513,7 +513,7 @@ instance SerialiseValue P.ByteArray where
 {- |
 This instance is intended for tables without blobs.
 
-The implementation of 'deseriValue' throws an excepValuen.
+The implementation of @'deserialiseValue'@ throws an excepValuen.
 -}
 instance SerialiseValue Void where
   serialiseValue = absurd
@@ -526,7 +526,7 @@ instance SerialiseValue Void where
 {- |
 An instance for 'Sum' which is transparent to the serialisation of the value type.
 
-__NOTE:__ If you want to seriValue @'Sum' a@ differValuely from @a@, you must use another newtype wrapper.
+__NOTE:__ If you want to serialise @'Sum' a@ differently from @a@, you must use another newtype wrapper.
 -}
 instance SerialiseValue a => SerialiseValue (Sum a) where
   serialiseValue (Sum v) = serialiseValue v
```
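The NOTE in the last hunk, that a transparent instance forces a fresh newtype if you want different bytes, can be illustrated with a simplified stand-in class. `Codec` and `Tagged` below are hypothetical; the real `SerialiseValue` class serialises to raw bytes rather than `String`.

```haskell
module Main where

import Data.Monoid (Sum (..))

-- Hypothetical stand-in for the library's SerialiseValue class;
-- encoding to String keeps the sketch simple.
class Codec a where
  encode :: a -> String

instance Codec Int where
  encode = show

-- Transparent instance, mirroring SerialiseValue (Sum a) in the diff:
-- Sum a encodes exactly like a.
instance Codec a => Codec (Sum a) where
  encode (Sum v) = encode v

-- To encode differently, introduce another newtype with its own instance.
newtype Tagged a = Tagged a

instance Codec a => Codec (Tagged a) where
  encode (Tagged v) = "sum:" ++ encode v

main :: IO ()
main = do
  putStrLn (encode (Sum (3 :: Int)))          -- prints "3"
  putStrLn (encode (Tagged (Sum (3 :: Int)))) -- prints "sum:3"
```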
