
Commit 5bd21fc

wenkokke and jorisdral committed
fix(docs): process feedback on #701
Co-authored-by: Joris Dral <[email protected]>
1 parent 3bd84fc commit 5bd21fc

File tree

5 files changed (+36, -33 lines)


README.md

Lines changed: 22 additions & 20 deletions
````diff
@@ -353,24 +353,24 @@ uniformly distributed, e.g., when the keys are hashes.
 
 `confDiskCachePolicy`
 The *disk cache policy* determines if lookup operations use the OS page
-cache. Caching may improve the performance of lookups if database access
-follows certain patterns.
+cache. Caching may improve the performance of lookups and updates if
+database access follows certain patterns.
 
 ##### Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size <span id="fine_tuning_data_layout" class="anchor"></span>
 
 The configuration parameters `confMergePolicy`, `confSizeRatio`, and
 `confWriteBufferAlloc` affect how the table organises its data. To
 understand what effect these parameters have, one must have a basic
-understand of how an LSM-tree stores its data. The physical entries in
-an LSM-tree are key–operation pairs, which pair a key with an operation
-such as an `Insert` with a value or a `Delete`. These key–operation
-pairs are organised into *runs*, which are sequences of key–operation
-pairs sorted by their key. Runs are organised into *levels*, which are
-unordered sequences or runs. Levels are organised hierarchically. Level
-0 is kept in memory, and is referred to as the *write buffer*. All
-subsequent levels are stored on disk, with each run stored in its own
-file. The following shows an example LSM-tree layout, with each run as a
-boxed sequence of keys and each level as a row.
+understanding of how an LSM-tree stores its data. The physical entries
+in an LSM-tree are key–operation pairs, which pair a key with an
+operation such as an `Insert` with a value or a `Delete`. These
+key–operation pairs are organised into *runs*, which are sequences of
+key–operation pairs sorted by their key. Runs are organised into
+*levels*, which are unordered sequences or runs. Levels are organised
+hierarchically. Level 0 is kept in memory, and is referred to as the
+*write buffer*. All subsequent levels are stored on disk, with each run
+stored in its own file. The following shows an example LSM-tree layout,
+with each run as a boxed sequence of keys and each level as a row.
 
 ``` math
@@ -527,15 +527,16 @@ in-memory size of the table.
 
 Tables maintain a [Bloom
 filter](https://en.wikipedia.org/wiki/Bloom_filter "https://en.wikipedia.org/wiki/Bloom_filter")
-in memory for each run on disk. These Bloom filter are probablilistic
+in memory for each run on disk. These Bloom filters are probablilistic
 datastructure that are used to track which keys are present in their
 corresponding run. Querying a Bloom filter returns either "maybe"
 meaning the key is possibly in the run or "no" meaning the key is
 definitely not in the run. When a query returns "maybe" while the key is
 *not* in the run, this is referred to as a *false positive*. While the
 database executes a lookup operation, any Bloom filter query that
-returns a false positive causes the database to unnecessarily read a run
-from disk. The probabliliy of these spurious reads follow a [binomial
+returns a false positive causes the database to unnecessarily read a
+page from disk. The probabliliy of these spurious reads follow a
+[binomial
 distribution](https://en.wikipedia.org/wiki/Binomial_distribution "https://en.wikipedia.org/wiki/Binomial_distribution")
 $`\text{Binomial}(r,\text{FPR})`$ where $`r`$ refers to the number of
 runs and $`\text{FPR}`$ refers to the false-positive rate of the Bloom
@@ -601,9 +602,10 @@ types of fence-pointer indexes:
 `OrdinaryIndex`
 Ordinary indexes are designed for any use case.
 
-Ordinary indexes store one serialised key per page of memory. The total
-in-memory size of all indexes is $`K \cdot \frac{n}{P}`$ bits, where
-$`K`$ refers to the average size of a serialised key in bits.
+Ordinary indexes store one serialised key per page of memory. The
+average total in-memory size of all indexes is $`K \cdot \frac{n}{P}`$
+bits, where $`K`$ refers to the average size of a serialised key in
+bits.
 
 `CompactIndex`
 Compact indexes are designed for the use case where the keys in the
@@ -614,8 +616,8 @@ serialised key of each page of memory. This requires that serialised
 keys are *at least* 64 bits in size. Compact indexes store 1 additional
 bit per page of memory to resolve collisions, 1 additional bit per page
 of memory to mark entries that are larger than one page, and a
-negligible amount of memory for tie breakers. The total in-memory size
-of all indexes is $`66 \cdot \frac{n}{P}`$ bits.
+negligible amount of memory for tie breakers. The average total
+in-memory size of all indexes is $`66 \cdot \frac{n}{P}`$ bits.
 
 ##### Fine-tuning: Disk Cache Policy <span id="fine_tuning_disk_cache_policy" class="anchor"></span>
 
````
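The data layout described in the README hunks above (runs as key-sorted sequences of key–operation pairs, levels as unordered collections of runs, level 0 as the in-memory write buffer) can be sketched as a small Haskell model. All names here (`Op`, `Run`, `Level`, `LSMTree`, `sortedByKey`) are hypothetical illustrations, not the library's internal types.

```haskell
module Main where

-- Illustrative model of the LSM-tree layout described above.
-- These names are hypothetical; they are not the library's internal types.
data Op v
  = Insert v -- ^ insert a value for the key
  | Delete   -- ^ delete the key
  deriving Show

-- A run: key–operation pairs sorted by key.
type Run k v = [(k, Op v)]

-- A level: an unordered collection of runs.
type Level k v = [Run k v]

data LSMTree k v = LSMTree
  { writeBuffer :: Run k v     -- ^ level 0, kept in memory
  , diskLevels  :: [Level k v] -- ^ levels 1 and up; each run is its own file
  }

-- A run is well-formed when its keys are strictly ascending.
sortedByKey :: Ord k => Run k v -> Bool
sortedByKey r = and (zipWith (<) ks (drop 1 ks))
  where ks = map fst r

main :: IO ()
main = do
  let run = [(1, Insert "a"), (2, Delete), (3, Insert "c")] :: Run Int String
  print (sortedByKey run)           -- True: keys strictly ascending
  print (sortedByKey (reverse run)) -- False: keys descending
```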

lsm-tree.cabal

Lines changed: 6 additions & 6 deletions
```diff
@@ -181,12 +181,12 @@ description:
 
 [@confDiskCachePolicy@]
 The /disk cache policy/ determines if lookup operations use the OS page cache.
-Caching may improve the performance of lookups if database access follows certain patterns.
+Caching may improve the performance of lookups and updates if database access follows certain patterns.
 
 ==== Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size #fine_tuning_data_layout#
 
 The configuration parameters @confMergePolicy@, @confSizeRatio@, and @confWriteBufferAlloc@ affect how the table organises its data.
-To understand what effect these parameters have, one must have a basic understand of how an LSM-tree stores its data.
+To understand what effect these parameters have, one must have a basic understanding of how an LSM-tree stores its data.
 The physical entries in an LSM-tree are key–operation pairs, which pair a key with an operation such as an @Insert@ with a value or a @Delete@.
 These key–operation pairs are organised into /runs/, which are sequences of key–operation pairs sorted by their key.
 Runs are organised into /levels/, which are unordered sequences or runs.
@@ -331,10 +331,10 @@ description:
 which balances the performance of lookups against the in-memory size of the table.
 
 Tables maintain a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in memory for each run on disk.
-These Bloom filter are probablilistic datastructure that are used to track which keys are present in their corresponding run.
+These Bloom filters are probablilistic datastructures that are used to track which keys are present in their corresponding run.
 Querying a Bloom filter returns either \"maybe\" meaning the key is possibly in the run or \"no\" meaning the key is definitely not in the run.
 When a query returns \"maybe\" while the key is /not/ in the run, this is referred to as a /false positive/.
-While the database executes a lookup operation, any Bloom filter query that returns a false positive causes the database to unnecessarily read a run from disk.
+While the database executes a lookup operation, any Bloom filter query that returns a false positive causes the database to unnecessarily read a page from disk.
 The probabliliy of these spurious reads follow a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution) \(\text{Binomial}(r,\text{FPR})\)
 where \(r\) refers to the number of runs and \(\text{FPR}\) refers to the false-positive rate of the Bloom filters.
 Hence, the expected number of spurious reads for each lookup operation is \(r\cdot\text{FPR}\).
@@ -401,7 +401,7 @@ description:
 Ordinary indexes are designed for any use case.
 
 Ordinary indexes store one serialised key per page of memory.
-The total in-memory size of all indexes is \(K \cdot \frac{n}{P}\) bits,
+The average total in-memory size of all indexes is \(K \cdot \frac{n}{P}\) bits,
 where \(K\) refers to the average size of a serialised key in bits.
 
 [@CompactIndex@]
@@ -410,7 +410,7 @@ description:
 Compact indexes store the 64 most significant bits of the minimum serialised key of each page of memory.
 This requires that serialised keys are /at least/ 64 bits in size.
 Compact indexes store 1 additional bit per page of memory to resolve collisions, 1 additional bit per page of memory to mark entries that are larger than one page, and a negligible amount of memory for tie breakers.
-The total in-memory size of all indexes is \(66 \cdot \frac{n}{P}\) bits.
+The average total in-memory size of all indexes is \(66 \cdot \frac{n}{P}\) bits.
 
 ==== Fine-tuning: Disk Cache Policy #fine_tuning_disk_cache_policy#
 
```
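The expectation stated in the hunk above, that spurious reads per lookup follow a Binomial(r, FPR) distribution with mean r·FPR, can be checked with a few lines of Haskell. The function name below is ours, not part of the library.

```haskell
module Main where

-- The mean of Binomial(r, fpr) is r * fpr: with r runs on disk, each
-- Bloom filter independently returns a false positive with probability
-- fpr, so a lookup triggers r * fpr spurious page reads on average.
expectedSpuriousReads :: Int -> Double -> Double
expectedSpuriousReads r fpr = fromIntegral r * fpr

main :: IO ()
main = do
  -- e.g. 10 runs at a 2% false-positive rate: about 0.2 spurious reads
  -- per lookup on average
  print (expectedSpuriousReads 10 0.02)
```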

src/Database/LSMTree/Internal/Config.hs

Lines changed: 3 additions & 2 deletions
```diff
@@ -72,10 +72,11 @@ For a detailed discussion of fine-tuning the table configuration, see [Fine-tuni
 
 [@confMergeSchedule :: t'MergeSchedule'@]
 The /merge schedule/ balances the performance of lookups and updates against the consistency of updates.
-The merge schedule does not affect the performance of table unions.
 With the one-shot merge schedule, lookups and updates are more efficient overall, but some updates may take much longer than others.
 With the incremental merge schedule, lookups and updates are less efficient overall, but each update does a similar amount of work.
 This parameter is explicitly referenced in the documentation of those operations it affects.
+The merge schedule does not affect the way that table unions are computed.
+However, any table union must complete all outstanding incremental updates.
 
 [@confBloomFilterAlloc :: t'BloomFilterAlloc'@]
 The Bloom filter size balances the performance of lookups against the in-memory size of the database.
@@ -88,7 +89,7 @@ For a detailed discussion of fine-tuning the table configuration, see [Fine-tuni
 
 [@confDiskCachePolicy :: t'DiskCachePolicy'@]
 The /disk cache policy/ supports caching lookup operations using the OS page cache.
-Caching may improve the performance of lookups if database access follows certain patterns.
+Caching may improve the performance of lookups and updates if database access follows certain patterns.
 -}
 data TableConfig = TableConfig {
     confMergePolicy :: !MergePolicy
```

src/Database/LSMTree/Internal/Range.hs

Lines changed: 2 additions & 2 deletions
```diff
@@ -14,11 +14,11 @@ import Control.DeepSeq (NFData (..))
 -- | A range of keys.
 data Range k =
     {- |
-    @'FromToExcluding' i j@ is the ranges from @i@ (inclusive) to @j@ (exclusive).
+    @'FromToExcluding' i j@ is the range from @i@ (inclusive) to @j@ (exclusive).
     -}
     FromToExcluding k k
     {- |
-    @'FromToIncluding' i j@ is the ranges from @i@ (inclusive) to @j@ (inclusive).
+    @'FromToIncluding' i j@ is the range from @i@ (inclusive) to @j@ (inclusive).
    -}
   | FromToIncluding k k
   deriving stock (Show, Eq, Functor)
```
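The two constructors documented in the hunk above can be exercised with a small membership predicate. The `Range` type below mirrors the one in the diff (with `Functor` deriving dropped to keep the sketch extension-free); `member` is our own illustrative helper, not part of the module.

```haskell
module Main where

-- Mirrors the Range type from the diff above (Functor deriving omitted
-- so the sketch needs no language extensions).
data Range k
  = FromToExcluding k k -- ^ from i (inclusive) to j (exclusive)
  | FromToIncluding k k -- ^ from i (inclusive) to j (inclusive)
  deriving (Show, Eq)

-- Illustrative helper (not part of the module): is a key in the range?
member :: Ord k => k -> Range k -> Bool
member k (FromToExcluding i j) = i <= k && k < j
member k (FromToIncluding i j) = i <= k && k <= j

main :: IO ()
main = do
  print (member (5 :: Int) (FromToExcluding 1 5)) -- False: upper bound excluded
  print (member (5 :: Int) (FromToIncluding 1 5)) -- True: upper bound included
```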

src/Database/LSMTree/Internal/Serialise/Class.hs

Lines changed: 3 additions & 3 deletions
```diff
@@ -411,7 +411,7 @@ instance SerialiseKeyOrderPreserving String
 
 @'deserialiseKey'@: \(O(n)\).
 
-The 'String' is (de)serialiseValue as UTF-8.
+The 'String' is (de)serialised as UTF-8.
 -}
 instance SerialiseValue String where
   -- TODO: Optimise. The performance is \(O(n) + O(n)\) but it could be \(O(n)\).
@@ -513,7 +513,7 @@ instance SerialiseValue P.ByteArray where
 {- |
 This instance is intended for tables without blobs.
 
-The implementation of 'deseriValue' throws an excepValuen.
+The implementation of @'deserialiseValue'@ throws an excepValuen.
 -}
 instance SerialiseValue Void where
   serialiseValue = absurd
@@ -526,7 +526,7 @@ instance SerialiseValue Void where
 {- |
 An instance for 'Sum' which is transparent to the serialisation of the value type.
 
-__NOTE:__ If you want to seriValue @'Sum' a@ differValuely from @a@, you must use another newtype wrapper.
+__NOTE:__ If you want to serialise @'Sum' a@ differently from @a@, you must use another newtype wrapper.
 -}
 instance SerialiseValue a => SerialiseValue (Sum a) where
   serialiseValue (Sum v) = serialiseValue v
```
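The NOTE in the last hunk, that a transparent instance forces a fresh newtype if you want different bytes, can be illustrated with a simplified stand-in class. `Codec` and `Tagged` below are hypothetical; the real `SerialiseValue` class serialises to raw bytes rather than `String`.

```haskell
module Main where

import Data.Monoid (Sum (..))

-- Hypothetical stand-in for the library's SerialiseValue class;
-- encoding to String keeps the sketch simple.
class Codec a where
  encode :: a -> String

instance Codec Int where
  encode = show

-- Transparent instance, mirroring SerialiseValue (Sum a) in the diff:
-- Sum a encodes exactly like a.
instance Codec a => Codec (Sum a) where
  encode (Sum v) = encode v

-- To encode differently, introduce another newtype with its own instance.
newtype Tagged a = Tagged a

instance Codec a => Codec (Tagged a) where
  encode (Tagged v) = "sum:" ++ encode v

main :: IO ()
main = do
  putStrLn (encode (Sum (3 :: Int)))          -- prints "3"
  putStrLn (encode (Tagged (Sum (3 :: Int)))) -- prints "sum:3"
```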
