doc: document table configuration #701

Draft: wants to merge 1 commit into main
28 changes: 15 additions & 13 deletions README.md
@@ -119,11 +119,11 @@ constants:
- The variable *b* usually refers to the size of a batch of
inputs/outputs. Its precise meaning is explained for each occurrence.

- The constant *B* refers to the size of the write buffer, which is a
configuration parameter.
- The constant *B* refers to the size of the write buffer, which is
determined by the `TableConfig` parameter `confWriteBufferAlloc`.

- The constant *T* refers to the size ratio of the table, which is a
configuration parameter.
- The constant *T* refers to the size ratio of the table, which is
determined by the `TableConfig` parameter `confSizeRatio`.

- The constant *P* refers to the average number of key–value pairs
that fit in a page of memory.
@@ -134,7 +134,9 @@ The following table summarises the cost of the operations on LSM-trees
measured in the number of disk I/O operations. If the cost depends on
the merge policy or merge schedule, then the table contains one entry
for each relevant combination. Otherwise, the merge policy and/or merge
schedule is listed as N/A.
schedule is listed as N/A. The merge policy and merge schedule are
determined by the `TableConfig` parameters `confMergePolicy` and
`confMergeSchedule`.

<table>
<thead>
@@ -291,9 +293,9 @@ The worst-case in-memory size of an LSM-tree is *O*(*n*).
- The worst-case in-memory size of the write buffer is *O*(*B*).

The maximum size of the write buffer depends on the write buffer allocation
strategy, which is determined by the `confWriteBufferAlloc` field of
`TableConfig`. Regardless of write buffer allocation strategy, the
size of the write buffer may never exceed 4GiB.
strategy, which is determined by the `TableConfig` parameter
`confWriteBufferAlloc`. Regardless of write buffer allocation
strategy, the size of the write buffer may never exceed 4GiB.

`AllocNumEntries maxEntries`
The maximum size of the write buffer is the maximum number of entries
@@ -304,8 +306,8 @@ The worst-case in-memory size of an LSM-tree is *O*(*n*).
The total in-memory size of all Bloom filters is the number of bits
per physical entry multiplied by the number of physical entries. The
required number of bits per physical entry is determined by the Bloom
filter allocation strategy, which is determined by the
`confBloomFilterAlloc` field of `TableConfig`.
filter allocation strategy, which is determined by the `TableConfig`
parameter `confBloomFilterAlloc`.

`AllocFixed bitsPerPhysicalEntry`
The number of bits per physical entry is specified as
@@ -329,9 +331,9 @@ The worst-case in-memory size of an LSM-tree is *O*(*n*).
- The worst-case in-memory size of the indexes is *O*(*n*).

The total in-memory size of all indexes depends on the index type,
which is determined by the `confFencePointerIndex` field of
`TableConfig`. The in-memory size of the various indexes is described
in reference to the size of the database in [*memory
which is determined by the `TableConfig` parameter
`confFencePointerIndex`. The in-memory size of the various indexes is
described in reference to the size of the database in [*memory
pages*](https://en.wikipedia.org/wiki/Page_%28computer_memory%29 "https://en.wikipedia.org/wiki/Page_%28computer_memory%29").

`OrdinaryIndex`
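
The README hunk above states that the total in-memory size of all Bloom filters is the number of bits per physical entry multiplied by the number of physical entries. A minimal arithmetic sketch of that rule for the `AllocFixed` strategy follows. The helper names `bloomFilterBitsFixed` and `bitsToMiB` are hypothetical and not part of lsm-tree, and the entry count is an arbitrary illustration.

```haskell
module Main (main) where

-- Hypothetical helper (not part of lsm-tree): total Bloom filter size in
-- bits under a fixed allocation, following the rule
--   bits per physical entry * number of physical entries.
bloomFilterBitsFixed :: Integer -> Integer -> Integer
bloomFilterBitsFixed bitsPerPhysicalEntry physicalEntries =
  bitsPerPhysicalEntry * physicalEntries

-- Hypothetical helper: convert a size in bits to mebibytes.
bitsToMiB :: Integer -> Double
bitsToMiB bits = fromIntegral bits / (8 * 1024 * 1024)

main :: IO ()
main = do
  -- 10 bits per entry (the value used by the default configuration in this
  -- PR) and an assumed 100 million physical entries.
  let bits = bloomFilterBitsFixed 10 100000000
  print bits              -- prints 1000000000
  print (bitsToMiB bits)  -- roughly 119 MiB
```

Under these assumptions the filters take about 119 MiB, which shows why this term grows as *O*(*n*).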
2 changes: 1 addition & 1 deletion bench/macro/lsm-tree-bench-wp8.hs
@@ -227,7 +227,7 @@ cmdP = O.subparser $ mconcat

setupOptsP :: O.Parser SetupOpts
setupOptsP = pure SetupOpts
<*> O.option O.auto (O.long "bloom-filter-alloc" <> O.value LSM.defaultBloomFilterAlloc <> O.showDefault <> O.help "Bloom filter allocation method [AllocFixed n | AllocRequestFPR d]")
<*> O.option O.auto (O.long "bloom-filter-alloc" <> O.value (LSM.confBloomFilterAlloc LSM.defaultTableConfig) <> O.showDefault <> O.help "Bloom filter allocation method [AllocFixed n | AllocRequestFPR d]")

runOptsP :: O.Parser RunOpts
runOptsP = pure RunOpts
16 changes: 11 additions & 5 deletions lsm-tree.cabal
@@ -71,15 +71,18 @@ description:
* The variable \(s\) refers to the number of snapshots in the session.
* The variable \(b\) usually refers to the size of a batch of inputs\/outputs.
Its precise meaning is explained for each occurrence.
* The constant \(B\) refers to the size of the write buffer, which is a configuration parameter.
* The constant \(T\) refers to the size ratio of the table, which is a configuration parameter.
* The constant \(B\) refers to the size of the write buffer,
which is determined by the @TableConfig@ parameter @confWriteBufferAlloc@.
* The constant \(T\) refers to the size ratio of the table,
which is determined by the @TableConfig@ parameter @confSizeRatio@.
* The constant \(P\) refers to the average number of key–value pairs that fit in a page of memory.

=== Disk I\/O cost of operations #performance_time#

The following table summarises the cost of the operations on LSM-trees measured in the number of disk I\/O operations.
If the cost depends on the merge policy or merge schedule, then the table contains one entry for each relevant combination.
Otherwise, the merge policy and\/or merge schedule is listed as N\/A.
The merge policy and merge schedule are determined by the @TableConfig@ parameters @confMergePolicy@ and @confMergeSchedule@.

+----------+------------------------+-----------------+-----------------+------------------------------------------------+
| Resource | Operation | Merge policy | Merge schedule | Cost in disk I\/O operations |
@@ -132,7 +135,8 @@ description:

* The worst-case in-memory size of the write buffer is \(O(B)\).

The maximum size of the write buffer depends on the write buffer allocation strategy, which is determined by the @confWriteBufferAlloc@ field of @TableConfig@.
The maximum size of the write buffer depends on the write buffer allocation strategy,
which is determined by the @TableConfig@ parameter @confWriteBufferAlloc@.
Regardless of write buffer allocation strategy, the size of the write buffer may never exceed 4GiB.

[@AllocNumEntries maxEntries@]:
@@ -141,7 +145,8 @@ description:
* The worst-case in-memory size of the Bloom filters is \(O(n)\).

The total in-memory size of all Bloom filters is the number of bits per physical entry multiplied by the number of physical entries.
The required number of bits per physical entry is determined by the Bloom filter allocation strategy, which is determined by the @confBloomFilterAlloc@ field of @TableConfig@.
The required number of bits per physical entry is determined by the Bloom filter allocation strategy,
which is determined by the @TableConfig@ parameter @confBloomFilterAlloc@.

[@AllocFixed bitsPerPhysicalEntry@]:
The number of bits per physical entry is specified as @bitsPerPhysicalEntry@.
@@ -166,7 +171,8 @@ description:

* The worst-case in-memory size of the indexes is \(O(n)\).

The total in-memory size of all indexes depends on the index type, which is determined by the @confFencePointerIndex@ field of @TableConfig@.
The total in-memory size of all indexes depends on the index type,
which is determined by the @TableConfig@ parameter @confFencePointerIndex@.
The in-memory size of the various indexes is described in reference to the size of the database in [/memory pages/](https://en.wikipedia.org/wiki/Page_%28computer_memory%29).

[@OrdinaryIndex@]:
6 changes: 2 additions & 4 deletions src/Database/LSMTree.hs
@@ -113,13 +113,12 @@ module Database.LSMTree (
),
defaultTableConfig,
MergePolicy (LazyLevelling),
MergeSchedule (..),
SizeRatio (Four),
WriteBufferAlloc (AllocNumEntries),
BloomFilterAlloc (AllocFixed, AllocRequestFPR),
defaultBloomFilterAlloc,
FencePointerIndexType (OrdinaryIndex, CompactIndex),
DiskCachePolicy (..),
MergeSchedule (..),

-- ** Table Configuration Overrides #table_configuration_overrides#
OverrideDiskCachePolicy (..),
@@ -205,8 +204,7 @@ import Database.LSMTree.Internal.Config
DiskCachePolicy (..), FencePointerIndexType (..),
MergePolicy (..), MergeSchedule (..), SizeRatio (..),
TableConfig (..), WriteBufferAlloc (..),
defaultBloomFilterAlloc, defaultTableConfig,
serialiseKeyMinimalSize)
defaultTableConfig, serialiseKeyMinimalSize)
import Database.LSMTree.Internal.Config.Override
(OverrideDiskCachePolicy (..))
import qualified Database.LSMTree.Internal.Entry as Entry
77 changes: 47 additions & 30 deletions src/Database/LSMTree/Internal/Config.hs
@@ -16,7 +16,6 @@ module Database.LSMTree.Internal.Config (
, WriteBufferAlloc (..)
-- * Bloom filter allocation
, BloomFilterAlloc (..)
, defaultBloomFilterAlloc
, bloomFilterAllocForLevel
-- * Fence pointer index
, FencePointerIndexType (..)
@@ -27,7 +26,6 @@ module Database.LSMTree.Internal.Config (
, diskCachePolicyForLevel
-- * Merge schedule
, MergeSchedule (..)
, defaultMergeSchedule
) where

import Control.DeepSeq (NFData (..))
@@ -49,26 +47,43 @@ newtype LevelNo = LevelNo Int
Table configuration
-------------------------------------------------------------------------------}

-- | Table configuration parameters, including LSM tree tuning parameters.
--
-- Some config options are fixed (for now):
--
-- * Merge policy: Tiering
--
-- * Size ratio: 4
{- |
A collection of configuration parameters for tables, which can be used to tune the performance of the table.
To construct a 'TableConfig', modify the 'defaultTableConfig', which defines reasonable defaults for all parameters.
For an overview of the performance implications of the table configuration, see the [Performance](../#performance) section in the package description.

Each configuration parameter is associated with its own type.
Detailed discussion of the use of each parameter can be found in the documentation for its associated type.

[@'confMergePolicy' :: 'MergePolicy'@]
The merge policy determines how the table manages its data,
which affects the disk I\/O cost of some operations.
[@'confMergeSchedule' :: 'MergeSchedule'@]
The merge schedule determines how the table manages its data,
which affects the disk I\/O cost of some operations.
[@'confSizeRatio' :: 'SizeRatio'@]
The size ratio determines how the table manages its data,
and is the parameter \(T\) in the disk I\/O cost of operations.
[@'confWriteBufferAlloc' :: 'WriteBufferAlloc'@]
The write buffer allocation strategy determines the maximum size of the in-memory write buffer,
and is the parameter \(B\) in the disk I\/O cost of operations.
Irrespective of this parameter, the write buffer size cannot exceed 4GiB.
[@'confBloomFilterAlloc' :: 'BloomFilterAlloc'@]
The Bloom filter allocation strategy determines the number of bits per physical entry allocated for the Bloom filters.
[@'confFencePointerIndex' :: 'FencePointerIndexType'@]
The fence pointer index type determines the type of indexes,
which affects the in-memory size of the table and may constrain the table keys.
[@'confDiskCachePolicy' :: 'DiskCachePolicy'@]
The disk cache policy determines the policy for caching data from disk in memory,
which may affect the performance of lookup operations.
-}
data TableConfig = TableConfig {
confMergePolicy :: !MergePolicy
, confMergeSchedule :: !MergeSchedule
-- Size ratio between the capacities of adjacent levels.
, confSizeRatio :: !SizeRatio
-- | Total number of bytes that the write buffer can use.
--
-- The maximum is 4GiB, which should be more than enough for realistic
-- applications.
, confWriteBufferAlloc :: !WriteBufferAlloc
, confBloomFilterAlloc :: !BloomFilterAlloc
, confFencePointerIndex :: !FencePointerIndexType
-- | The policy for caching key\/value data from disk in memory.
, confDiskCachePolicy :: !DiskCachePolicy
}
deriving stock (Show, Eq)
@@ -77,19 +92,31 @@ instance NFData TableConfig where
rnf (TableConfig a b c d e f g) =
rnf a `seq` rnf b `seq` rnf c `seq` rnf d `seq` rnf e `seq` rnf f `seq` rnf g

-- | A reasonable default 'TableConfig'.
-- | The 'defaultTableConfig' defines reasonable defaults for all 'TableConfig' parameters.
--
-- This uses a write buffer with up to 20,000 elements and a generous amount of
-- memory for Bloom filters (FPR of 1%).
-- >>> confMergePolicy defaultTableConfig
-- LazyLevelling
-- >>> confMergeSchedule defaultTableConfig
-- Incremental
-- >>> confSizeRatio defaultTableConfig
-- Four
-- >>> confWriteBufferAlloc defaultTableConfig
-- AllocNumEntries 20000
-- >>> confBloomFilterAlloc defaultTableConfig
-- AllocFixed 10
-- >>> confFencePointerIndex defaultTableConfig
-- OrdinaryIndex
-- >>> confDiskCachePolicy defaultTableConfig
-- DiskCacheAll
--
defaultTableConfig :: TableConfig
defaultTableConfig =
TableConfig
{ confMergePolicy = LazyLevelling
, confMergeSchedule = defaultMergeSchedule
, confMergeSchedule = Incremental
, confSizeRatio = Four
, confWriteBufferAlloc = AllocNumEntries 20_000
, confBloomFilterAlloc = defaultBloomFilterAlloc
, confBloomFilterAlloc = AllocFixed 10
, confFencePointerIndex = OrdinaryIndex
, confDiskCachePolicy = DiskCacheAll
}
@@ -173,9 +200,6 @@ instance NFData BloomFilterAlloc where
rnf (AllocFixed n) = rnf n
rnf (AllocRequestFPR fpr) = rnf fpr

defaultBloomFilterAlloc :: BloomFilterAlloc
defaultBloomFilterAlloc = AllocFixed 10

bloomFilterAllocForLevel :: TableConfig -> RunLevelNo -> RunBloomFilterAlloc
bloomFilterAllocForLevel conf _levelNo =
case confBloomFilterAlloc conf of
@@ -317,10 +341,3 @@ data MergeSchedule =
instance NFData MergeSchedule where
rnf OneShot = ()
rnf Incremental = ()

-- | The default 'MergeSchedule'.
--
-- >>> defaultMergeSchedule
-- Incremental
defaultMergeSchedule :: MergeSchedule
defaultMergeSchedule = Incremental
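
The new `TableConfig` documentation in this diff advises constructing a configuration by modifying `defaultTableConfig`. A minimal sketch of that pattern with record update syntax follows. So that it compiles without the library, the types are simplified stand-ins that mirror only the constructors and field names visible in this diff. The payload types of `AllocFixed` and `AllocRequestFPR`, and the name `myConfig`, are assumptions.

```haskell
module Main (main) where

-- Simplified stand-ins for the lsm-tree types, limited to the constructors
-- visible in this diff. The payload types are assumptions.
data MergePolicy = LazyLevelling deriving (Show, Eq)
data MergeSchedule = OneShot | Incremental deriving (Show, Eq)
data SizeRatio = Four deriving (Show, Eq)
data WriteBufferAlloc = AllocNumEntries Int deriving (Show, Eq)
data BloomFilterAlloc = AllocFixed Int | AllocRequestFPR Double
  deriving (Show, Eq)
data FencePointerIndexType = OrdinaryIndex | CompactIndex deriving (Show, Eq)
data DiskCachePolicy = DiskCacheAll deriving (Show, Eq)

data TableConfig = TableConfig
  { confMergePolicy       :: !MergePolicy
  , confMergeSchedule     :: !MergeSchedule
  , confSizeRatio         :: !SizeRatio
  , confWriteBufferAlloc  :: !WriteBufferAlloc
  , confBloomFilterAlloc  :: !BloomFilterAlloc
  , confFencePointerIndex :: !FencePointerIndexType
  , confDiskCachePolicy   :: !DiskCachePolicy
  } deriving (Show, Eq)

-- Same default values as the defaultTableConfig shown in the diff.
defaultTableConfig :: TableConfig
defaultTableConfig = TableConfig
  { confMergePolicy       = LazyLevelling
  , confMergeSchedule     = Incremental
  , confSizeRatio         = Four
  , confWriteBufferAlloc  = AllocNumEntries 20000
  , confBloomFilterAlloc  = AllocFixed 10
  , confFencePointerIndex = OrdinaryIndex
  , confDiskCachePolicy   = DiskCacheAll
  }

-- Hypothetical custom configuration: override two fields and inherit the
-- remaining five from the defaults.
myConfig :: TableConfig
myConfig = defaultTableConfig
  { confWriteBufferAlloc = AllocNumEntries 50000
  , confBloomFilterAlloc = AllocRequestFPR 0.01
  }

main :: IO ()
main = do
  print (confWriteBufferAlloc myConfig)  -- prints AllocNumEntries 50000
  print (confMergeSchedule myConfig)     -- prints Incremental
```

Reading a default through its field accessor, as in `confMergeSchedule defaultTableConfig`, is the same pattern the benchmark change in this PR uses in place of the removed `defaultBloomFilterAlloc`.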