What type of enhancement is this?
Performance
What does the enhancement do?
Currently, the mito storage engine writes SST files directly to the remote object store.
greptimedb/src/mito2/src/sst/parquet/writer.rs, lines 35 to 43 in 5f8c175:

    pub struct ParquetWriter {
        /// SST output file path.
        file_path: String,
        /// Input data source.
        source: Source,
        /// Region metadata of the source and the target SST.
        metadata: RegionMetadataRef,
        object_store: ObjectStore,
    }

As a result, we have to fetch the object from the object store again whenever we want to access a file we just wrote. If we implement a write-through cache for Parquet files, the local copy can serve those reads and we don't need to download the object again.
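To illustrate the idea, here is a minimal sketch of a write-through cache. `RemoteStore` and `WriteThroughCache` are hypothetical names, and in-memory maps stand in for the local disk and the real `ObjectStore`; this is not the proposed implementation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the remote object store; counts downloads.
#[derive(Default)]
struct RemoteStore {
    objects: HashMap<String, Vec<u8>>,
    downloads: usize,
}

impl RemoteStore {
    fn put(&mut self, path: &str, data: Vec<u8>) {
        self.objects.insert(path.to_string(), data);
    }

    fn get(&mut self, path: &str) -> Option<Vec<u8>> {
        // Every call counts as one round trip to the remote store.
        self.downloads += 1;
        self.objects.get(path).cloned()
    }
}

/// Hypothetical write-through cache: each write lands in the local copy
/// and in the remote store.
#[derive(Default)]
struct WriteThroughCache {
    local: HashMap<String, Vec<u8>>,
    remote: RemoteStore,
}

impl WriteThroughCache {
    fn write(&mut self, path: &str, data: Vec<u8>) {
        self.local.insert(path.to_string(), data.clone());
        self.remote.put(path, data);
    }

    fn read(&mut self, path: &str) -> Option<Vec<u8>> {
        // Serve from the local copy when it exists.
        if let Some(data) = self.local.get(path) {
            return Some(data.clone());
        }
        // Cache miss: download the object and keep a local copy.
        let data = self.remote.get(path)?;
        self.local.insert(path.to_string(), data.clone());
        Some(data)
    }
}
```

Reading a file right after writing it never touches `RemoteStore::get`, which is the saving this enhancement is after.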
Implementation challenges
Writing through the cache adds a step before the upload, which might increase the cost of uploading an object and the memory pressure of memtables.
- A better approach is to release the memtable once we flush the file to the write cache.
- We update the manifest only after the object is fully uploaded to the remote object store.
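The two points above could be sketched roughly as follows. `Region`, `flush_to_cache`, and `finish_upload` are hypothetical names used only for illustration, and plain vectors stand in for the memtable list, the manifest, and the upload queue.

```rust
/// Hypothetical, simplified view of a region during flush.
struct Region {
    /// Ids of memtables still held in memory.
    memtables: Vec<u64>,
    /// Files recorded in the manifest (visible after upload).
    manifest: Vec<String>,
    /// Files written to the write cache but not yet uploaded.
    pending_uploads: Vec<String>,
}

impl Region {
    /// Phase 1: the file is in the write cache, so the memtable can be released.
    fn flush_to_cache(&mut self, memtable_id: u64, file: &str) {
        self.pending_uploads.push(file.to_string());
        self.memtables.retain(|id| *id != memtable_id);
    }

    /// Phase 2: the upload finished, so the manifest can be updated.
    /// Returns false if the file was not pending.
    fn finish_upload(&mut self, file: &str) -> bool {
        match self.pending_uploads.iter().position(|f| f.as_str() == file) {
            Some(pos) => {
                let uploaded = self.pending_uploads.remove(pos);
                self.manifest.push(uploaded);
                true
            }
            None => false,
        }
    }
}
```

Between the two phases the memtable is already gone but the manifest does not yet list the file, which is why the extra metadata discussed next is needed.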
To implement async upload, we need to store additional metadata, such as the flushed sequence and the region id, for files in the write cache. The region edit also requires memtable ids to remove flushed memtables. Because the memtable id is incremented globally, we should switch to identifying a memtable by its minimum sequence instead.
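A possible shape for that metadata is sketched below. `PendingUpload`, `Memtable`, and `remove_flushed` are hypothetical names, and the sketch assumes that the minimum sequence uniquely identifies a memtable within a region.

```rust
/// Hypothetical record kept for each file in the write cache
/// that has not been uploaded yet.
#[derive(Debug, Clone, PartialEq)]
struct PendingUpload {
    region_id: u64,
    file_path: String,
    flushed_sequence: u64,
    /// Minimum sequence of the memtable that produced the file,
    /// used in place of the globally incremented memtable id.
    memtable_min_sequence: u64,
}

/// Hypothetical, simplified memtable that only tracks its minimum sequence.
struct Memtable {
    min_sequence: u64,
}

/// Removes every memtable whose data is already covered by a pending upload.
/// Assumption: a minimum sequence identifies at most one memtable per region.
fn remove_flushed(memtables: &mut Vec<Memtable>, uploads: &[PendingUpload]) {
    memtables.retain(|m| {
        !uploads
            .iter()
            .any(|u| u.memtable_min_sequence == m.min_sequence)
    });
}
```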
For simplicity, we can implement the sync version first, which returns after files are uploaded.
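Steps
- `WriteCache` skeleton: feat(mito): Add WriteCache struct and write SSTs to write cache #2999
- `FileHandle` for files in the write cache to `Version`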
Further discussions
- If the engine opens a region in a fresh environment without the write cache, replaying the WAL might cause OOM.
- We can trigger a flush during replay to avoid OOM and upload the file once we have write permission.
- Delay uploading level 0 files, since we might compact them later.
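The replay idea above might look like this sketch. `ReplayState`, `replay`, `upload_pending`, and the byte budget are all hypothetical; WAL entries are reduced to their sizes to keep the example small.

```rust
/// Hypothetical replay state: bytes buffered in the memtable and the sizes
/// of files flushed to local storage that still await upload.
struct ReplayState {
    memtable_bytes: usize,
    local_files: Vec<usize>,
}

/// Replays WAL entries (given as sizes in bytes) and flushes to a local file
/// whenever the memtable reaches `budget`, so memory stays bounded.
fn replay(entries: &[usize], budget: usize) -> ReplayState {
    let mut state = ReplayState {
        memtable_bytes: 0,
        local_files: Vec::new(),
    };
    for &size in entries {
        state.memtable_bytes += size;
        if state.memtable_bytes >= budget {
            // Flush locally; the upload is deferred.
            state.local_files.push(state.memtable_bytes);
            state.memtable_bytes = 0;
        }
    }
    state
}

/// Hands over the deferred files for upload once the region is writable.
/// Returns an empty list while write permission is missing.
fn upload_pending(state: &mut ReplayState, writable: bool) -> Vec<usize> {
    if !writable {
        return Vec::new();
    }
    std::mem::take(&mut state.local_files)
}
```

The same deferral could in principle cover level 0 files that are likely to be compacted soon, although the sketch does not model compaction.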
Related Issues
It should be part of #2516