docs: rfc for table compaction #939
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,92 @@ | ||
| --- | ||
| Feature Name: "table-compaction" | ||
| Tracking Issue: https://github.com/GreptimeTeam/greptimedb/issues/930 | ||
| Date: 2023-02-01 | ||
| Author: "Lei, HUANG <mrsatangel@gmail.com>" | ||
| --- | ||
|
|
||
| # Table Compaction | ||
|
|
||
| --- | ||
|
|
||
| ## Background | ||
|
|
||
| GreptimeDB uses an LSM-tree based storage engine that flushes memtables to SSTs for persistence. | ||
| Currently, however, it only supports level 0, and SST files in level 0 are not guaranteed to contain rows with disjoint time ranges. | ||
| That is, different SST files in level 0 may contain overlapping timestamps. | ||
| As a consequence, retrieving rows in a given time range requires scanning all files, which incurs significant IO overhead. | ||
|
|
||
| Also, as in other LSM-tree engines, deletes and updates to existing primary keys are converted to new rows carrying a delete/update mark and appended to SSTs on flush. | ||
| We need to merge the operations on the same primary key so that we don't have to go through all SST files to find the final state of that key. | ||
|
|
||
| ## Goal | ||
|
|
||
| Implement a compaction framework to: | ||
| - maintain SSTs in timestamp order to accelerate queries with timestamp conditions; | ||
| - merge rows with the same primary key; | ||
| - purge expired SSTs; | ||
| - accommodate other tasks like data rollup/indexing. | ||
|
|
||
|
|
||
| ## Overview | ||
|
|
||
| Table compaction involves the following components: | ||
| - Compaction scheduler: runs compaction tasks and limits the resources they consume; | ||
| - Compaction strategy: finds the SSTs to compact and determines the output files of compaction; | ||
| - Compaction task: reads the rows from the input SSTs and writes them to the output files. | ||
|
|
||
| ## Implementation | ||
|
|
||
| ### Compaction scheduler | ||
|
|
||
| `CompactionScheduler` is an executor that continuously polls and executes compaction requests from a task queue. | ||
|
|
||
| ```rust | ||
| #[async_trait] | ||
| pub trait CompactionScheduler { | ||
| /// Schedules a compaction task. | ||
| async fn schedule(&self, task: CompactionRequest) -> Result<()>; | ||
|
|
||
| /// Stops compaction scheduler. | ||
| async fn stop(&self) -> Result<()>; | ||
| } | ||
| ``` | ||
|
|
||
|
|
||
|
|
||
| ### Compaction triggering | ||
|
|
||
| Currently, we can check whether a table needs compaction each time a memtable is flushed to an SST. | ||
|
|
||
| https://github.com/GreptimeTeam/greptimedb/blob/4015dd80752e1e6aaa3d7cacc3203cb67ed9be6d/src/storage/src/flush.rs#L245 | ||
|
|
||
|
|
||
| ### Compaction strategy | ||
|
|
||
| `CompactionStrategy` defines how to pick SSTs in all levels for compaction. | ||
|
|
||
| ```rust | ||
| pub trait CompactionStrategy { | ||
| fn pick( | ||
| &self, | ||
| ctx: CompactionContext, | ||
| levels: &LevelMetas, | ||
| ) -> Result<CompactionTask>; | ||
| } | ||
| ``` | ||
|
|
||
| The most suitable compaction strategy for time-series scenarios would be | ||
| a hybrid strategy that combines time window compaction with size-tiered compaction, just like [Cassandra](https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html) and [ScyllaDB](https://docs.scylladb.com/stable/architecture/compaction/compaction-strategies.html#time-window-compaction-strategy-twcs) do. | ||
|
|
||
| We can first group SSTs in level n into buckets according to some predefined time window. Within each window, | ||
| SSTs are compacted in a size-tiered manner (find SSTs of similar size and compact them to level n+1). | ||
| SSTs from different time windows are never compacted together. | ||
| This strategy guarantees that SSTs in each level are mostly sorted in timestamp order, which speeds up queries with | ||
| explicit timestamp conditions, while size-tiered compaction minimizes the impact on foreground writes. | ||
|
|
||
| ### Alternatives | ||
|
|
||
| Currently, GreptimeDB's storage engine [only supports two levels](https://github.com/GreptimeTeam/greptimedb/blob/43aefc5d74dfa73b7819cae77b7eb546d8534a41/src/storage/src/sst.rs#L32). | ||
| For level 0, we can start with a simple time-window based leveled compaction, which reads from all SSTs in level 0, | ||
| aligns them to time windows with a fixed duration, and merges them with the SSTs in level 1 within the same time window | ||
| so that there is only one sorted run in level 1. | ||
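
The window-alignment step of the alternative above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the RFC's implementation: `SstMeta`, `window_start`, and `bucket_by_window` are hypothetical names, timestamps are assumed to be plain `i64` seconds with `start_ts <= end_ts`, and a file spanning several windows is simply listed under each window it overlaps.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified SST metadata (not GreptimeDB's actual file handle type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SstMeta {
    pub name: String,
    /// Inclusive timestamp range covered by the file, in seconds (assumed unit).
    pub start_ts: i64,
    pub end_ts: i64,
}

/// Returns the start of the fixed-duration window that contains `ts`.
/// `div_euclid` rounds towards negative infinity, so negative timestamps
/// are aligned to the window below them rather than towards zero.
pub fn window_start(ts: i64, window_secs: i64) -> i64 {
    ts.div_euclid(window_secs) * window_secs
}

/// Groups level 0 files by time window. A file whose range overlaps several
/// windows appears in each of them, so a later merge step can split its rows.
/// Keys of the returned map are window start timestamps, in ascending order.
pub fn bucket_by_window(files: &[SstMeta], window_secs: i64) -> BTreeMap<i64, Vec<SstMeta>> {
    assert!(window_secs > 0, "window size must be positive");
    let mut buckets: BTreeMap<i64, Vec<SstMeta>> = BTreeMap::new();
    for file in files {
        let mut window = window_start(file.start_ts, window_secs);
        let last = window_start(file.end_ts, window_secs);
        while window <= last {
            buckets.entry(window).or_default().push(file.clone());
            window += window_secs;
        }
    }
    buckets
}
```

Each bucket would then be merged with the level 1 SST of the same window, if any, to produce a single sorted run for that window. Using a `BTreeMap` keeps the windows in timestamp order, which matches the goal of keeping level 1 sorted by time.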