Skip to content

doc: document table configuration #701

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 26, 2025
Merged

Conversation

wenkokke
Copy link
Collaborator

@wenkokke wenkokke commented May 1, 2025

No description provided.

@wenkokke wenkokke marked this pull request as draft May 1, 2025 16:52
@wenkokke wenkokke force-pushed the wenkokke/doc-table-config branch from d7cceeb to 0e91e43 Compare May 1, 2025 16:53
@wenkokke wenkokke force-pushed the wenkokke/doc-table-config branch 17 times, most recently from f05d7cb to fa504ed Compare May 13, 2025 17:36
@wenkokke wenkokke force-pushed the wenkokke/doc-table-config branch 5 times, most recently from 54eda63 to dba3c07 Compare May 16, 2025 19:26
@wenkokke wenkokke changed the title doc: document table configuration doc: finalise documentation May 16, 2025
@wenkokke wenkokke marked this pull request as ready for review May 20, 2025 16:11
@wenkokke wenkokke force-pushed the wenkokke/doc-table-config branch from 737998c to bef1638 Compare May 20, 2025 16:12
@dcoutts
Copy link
Collaborator

dcoutts commented May 21, 2025

Ahh, the docspec error is because I changed the AllocFixed from Int to Double. So that needs updating.

@dcoutts
Copy link
Collaborator

dcoutts commented May 21, 2025

Ah, and you did that just as I was writing 😄

Copy link
Collaborator

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got part way through this.

@wenkokke wenkokke force-pushed the wenkokke/doc-table-config branch 2 times, most recently from 5812cc9 to 92c3a9c Compare May 22, 2025 12:05
@wenkokke wenkokke force-pushed the wenkokke/doc-table-config branch from e06e26c to 76cfba2 Compare May 23, 2025 10:04
@wenkokke wenkokke force-pushed the wenkokke/doc-table-config branch from 76cfba2 to 4f9132b Compare May 26, 2025 13:12
Copy link
Collaborator

@recursion-ninja recursion-ninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, what extensive changes! I really like the type-class laws and complexity annotations that were added. This is appears ready to merge to me, and need to delay and risk bit-rot.

@recursion-ninja recursion-ninja enabled auto-merge May 26, 2025 13:31
@recursion-ninja recursion-ninja added this pull request to the merge queue May 26, 2025
Merged via the queue into main with commit 4711ca0 May 26, 2025
28 checks passed
@recursion-ninja recursion-ninja deleted the wenkokke/doc-table-config branch May 26, 2025 14:30
Copy link
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming together nicely 💃

What's your take on having duplicated text? See my comment on the haddocks for config types

Comment on lines +49 to +92
{- |
A collection of configuration parameters for tables, which can be used to tune the performance of the table.
To construct a 'TableConfig', modify the 'defaultTableConfig', which defines reasonable defaults for all parameters.

For a detailed discussion of fine-tuning the table configuration, see [Fine-tuning Table Configuration](../#fine_tuning).

[@confMergePolicy :: t'MergePolicy'@]
The /merge policy/ balances the performance of lookups against the performance of updates.
Levelling favours lookups.
Tiering favours updates.
Lazy levelling strikes a middle ground between levelling and tiering, and moderately favours updates.
This parameter is explicitly referenced in the documentation of those operations it affects.

[@confSizeRatio :: t'SizeRatio'@]
The /size ratio/ pushes the effects of the merge policy to the extreme.
If the size ratio is higher, levelling favours lookups more, and tiering and lazy levelling favour updates more.
This parameter is referred to as \(T\) in the disk I\/O cost of operations.

[@confWriteBufferAlloc :: t'WriteBufferAlloc'@]
The /write buffer capacity/ balances the performance of lookups and updates against the in-memory size of the database.
If the write buffer is larger, it takes up more memory, but lookups and updates are more efficient.
This parameter is referred to as \(B\) in the disk I\/O cost of operations.
Irrespective of this parameter, the write buffer size cannot exceed 4GiB.

[@confMergeSchedule :: t'MergeSchedule'@]
The /merge schedule/ balances the performance of lookups and updates against the consistency of updates.
The merge schedule does not affect the performance of table unions.
With the one-shot merge schedule, lookups and updates are more efficient overall, but some updates may take much longer than others.
With the incremental merge schedule, lookups and updates are less efficient overall, but each update does a similar amount of work.
This parameter is explicitly referenced in the documentation of those operations it affects.

[@confBloomFilterAlloc :: t'BloomFilterAlloc'@]
The Bloom filter size balances the performance of lookups against the in-memory size of the database.
If the Bloom filters are larger, they take up more memory, but lookup operations are more efficient.

[@confFencePointerIndex :: t'FencePointerIndexType'@]
The /fence-pointer index type/ supports two types of indexes.
The /ordinary/ indexes are designed to work with any key.
The /compact/ indexes are optimised for the case where the keys in the database are uniformly distributed, e.g., when the keys are hashes.

[@confDiskCachePolicy :: t'DiskCachePolicy'@]
The /disk cache policy/ supports caching lookup operations using the OS page cache.
Caching may improve the performance of lookups if database access follows certain patterns.
-}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is almost an identical copy of the #fine-tuning section of the package description. This runs the risk of having two out-of-sync texts, and in coincidentally it is already out of sync in this PR (#701), presumably after resolving some review comments from @dcoutts

Without some mechanised method of keeping the texts in sync, I think it's better to have a single of source of truth, and link to that text from other places rather than copying the text. For example, since we're linking to #fine_tuning in the text here anyway, we could remove the copied parts from the haddock entirely. Or conversely, we could link to the TableConfig from the package description

Comment on lines +151 to 164
{- |
The /merge policy/ balances the performance of lookups against the performance of updates.
Levelling favours lookups.
Tiering favours updates.
Lazy levelling strikes a middle ground between levelling and tiering, and moderately favours updates.
This parameter is explicitly referenced in the documentation of those operations it affects.

__NOTE:__ This package only supports lazy levelling.

For a detailed discussion of the merge policy, see [Fine-tuning: Merge Policy, Size Ratio, and Write Buffer Size](../#fine_tuning_data_layout).
-}
data MergePolicy =
-- | Use tiering on intermediate levels, and levelling on the last level.
-- This makes it easier for delete operations to disappear on the last
-- level.
LazyLevelling
-- TODO: add other merge policies, like tiering and levelling.
deriving stock (Eq, Show)
Copy link
Collaborator

@jorisdral jorisdral Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similary, the haddock comment here is also a copy of the text in the package description and the haddock comment on the TableConfig type. That means more risk of out-of-sync documentation.

Maybe my final suggestion would be: just keep the haddock on the individual config types like MergePolicy, SizeRatio, etc., and remove the copied parts in the overviews on the TableConfig type and the package description


[@confDiskCachePolicy@]
The /disk cache policy/ determines if lookup operations use the OS page cache.
Caching may improve the performance of lookups if database access follows certain patterns.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching may also improve the performance of updates, because merges perform reads, and if those disk pages are already memory then the update is cheaper

Copy link
Collaborator Author

@wenkokke wenkokke Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want this discussed with any nuance, or would the following change suffice?

- Caching may improve the performance of lookups if database access follows certain patterns.
+ Caching may improve the performance of lookups and updates if database access follows certain patterns.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just mentioning it "tussen neus en lippen door" sounds good to me

@wenkokke
Copy link
Collaborator Author

wenkokke commented Jun 2, 2025

This is coming together nicely 💃

What's your take on having duplicated text? See my comment on the haddocks for config types

My high level take is that people access docs non-linearly and therefore text duplication is necessary. It isn't easily doable to include these bits of text from a single source, so the best we can do is test consistency.

@jorisdral
Copy link
Collaborator

This is coming together nicely 💃
What's your take on having duplicated text? See my comment on the haddocks for config types

My high level take is that people access docs non-linearly and therefore text duplication is necessary. It isn't easily doable to include these bits of text from a single source, so the best we can do is test consistency.

I'm not sure I follow why duplication would be necessary. Would hyperlinking to a single source be a bad thing?

@wenkokke
Copy link
Collaborator Author

wenkokke commented Jun 2, 2025

This is coming together nicely 💃

What's your take on having duplicated text? See my comment on the haddocks for config types

My high level take is that people access docs non-linearly and therefore text duplication is necessary. It isn't easily doable to include these bits of text from a single source, so the best we can do is test consistency.

I'm not sure I follow why duplication would be necessary. Would hyperlinking to a single source be a bad thing?

Especially for short snippets, following hyperlinks disrupts the flow of reading. It also requires you have a link anchor, which I believe in Haddock means having a section header.

@jorisdral
Copy link
Collaborator

jorisdral commented Jun 3, 2025

This is coming together nicely 💃

What's your take on having duplicated text? See my comment on the haddocks for config types

My high level take is that people access docs non-linearly and therefore text duplication is necessary. It isn't easily doable to include these bits of text from a single source, so the best we can do is test consistency.

I'm not sure I follow why duplication would be necessary. Would hyperlinking to a single source be a bad thing?

Especially for short snippets, following hyperlinks disrupts the flow of reading. It also requires you have a link anchor, which I believe in Haddock means having a section header.

Hmm okay, but would it still make sense to remove the overview from TableConfig? If you follow the reading order of the exports in the public API, the TableConfig type is followed by a list individual config option types anyway. So it seems less useful there to duplicate information

@wenkokke
Copy link
Collaborator Author

wenkokke commented Jun 3, 2025

They're such short snippets that are unlikely to be invalidated in the future, given that they only really give a high level intuition. As such, I think linking would be detrimental to the reading experience, which is why they're duplicated.

wenkokke added a commit that referenced this pull request Jun 3, 2025
wenkokke added a commit that referenced this pull request Jun 3, 2025
wenkokke added a commit that referenced this pull request Jun 4, 2025
wenkokke added a commit that referenced this pull request Jun 5, 2025
github-merge-queue bot pushed a commit that referenced this pull request Jun 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants