feat(mito): enable inverted index#3158

Merged
zhongzc merged 11 commits into GreptimeTeam:main from zhongzc:zhongzc/inverted-index-enable
Jan 15, 2024
Conversation

@zhongzc zhongzc commented Jan 12, 2024

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

This PR enables the inverted index for the mito engine.

Main changes:

  • Introduced IntermediateManager, with two objectives:
    1. To ensure intermediate files are read and written only on the local file system, so that index creation never accesses object storage services.
    2. To clean up residual intermediate files left behind when the greptimedb service exits abnormally. This requires IntermediateManager to be a singleton, since repeated deletions could cause errors.
  • Modified the path of IntermediateLocation. Previously the path sat next to the data files; now that IntermediateManager completely isolates intermediate files from data files, a dedicated path is used instead.
  • Introduced Indexer, embedded within ParquetWriter. Indexer creates the index and handles errors internally, exposing three methods to ParquetWriter, none of which return errors: update, finish, and abort.
  • Added InvertedIndexConfig to MitoConfig, which includes the following parameters:
    • Toggles: create_on_flush, create_on_compaction, apply_on_query
    • intermediate_path: The file system path for intermediate files
    • mem_threshold_on_create: Controls memory usage when creating the index
  • Modified MitoConfig::sanitize to take data_home as an input, because the default paths of both intermediate_path and experimental_write_cache_path depend on data_home.
  • ScanRegion skips applying the index during queries when the apply_on_query parameter disables it.
  • SstWriteRequest introduces create_inverted_index and mem_threshold_index_create. Whether an index is created during flush and compaction is controlled by passing the corresponding MitoConfig settings into these two fields.
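
The Indexer item above describes a wrapper whose update, finish, and abort methods do not return errors. A minimal sketch of that pattern follows. It is an illustration only, not the code in this PR: `SketchCreator`, its in-memory postings map, the `&[&str]` row format, and the `Option<usize>` return value of `finish` are all hypothetical stand-ins chosen to keep the example self-contained.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a fallible index creator (not the mito2 type).
#[derive(Default)]
struct SketchCreator {
    /// Maps each term to the ids of the rows that contain it.
    postings: BTreeMap<String, Vec<u64>>,
    rows_seen: u64,
}

impl SketchCreator {
    /// Indexes one row; fails on an empty term to simulate an error.
    fn update(&mut self, terms: &[&str]) -> Result<(), String> {
        for term in terms {
            if term.is_empty() {
                return Err("empty term".to_string());
            }
            self.postings
                .entry(term.to_string())
                .or_default()
                .push(self.rows_seen);
        }
        self.rows_seen += 1;
        Ok(())
    }

    /// Finalizes the index and reports the number of distinct terms.
    fn finish(&mut self) -> Result<usize, String> {
        Ok(self.postings.len())
    }

    /// Discards everything collected so far.
    fn abort(&mut self) -> Result<(), String> {
        self.postings.clear();
        Ok(())
    }
}

/// Wrapper whose methods never return errors. On any failure it logs,
/// drops the creator, and turns all later calls into no-ops, so the
/// caller (the writer) can keep writing data without an index.
struct Indexer {
    creator: Option<SketchCreator>,
}

impl Indexer {
    fn new(enabled: bool) -> Self {
        Indexer {
            creator: enabled.then(SketchCreator::default),
        }
    }

    fn update(&mut self, terms: &[&str]) {
        let result = match self.creator.as_mut() {
            Some(creator) => creator.update(terms),
            None => return, // disabled or already aborted
        };
        if let Err(err) = result {
            eprintln!("index update failed, giving up on the index: {err}");
            self.abort();
        }
    }

    /// Returns `Some(distinct_terms)` if an index was produced, else `None`.
    fn finish(&mut self) -> Option<usize> {
        let mut creator = self.creator.take()?;
        match creator.finish() {
            Ok(terms) => Some(terms),
            Err(err) => {
                eprintln!("index finish failed: {err}");
                let _ = creator.abort();
                None
            }
        }
    }

    fn abort(&mut self) {
        if let Some(mut creator) = self.creator.take() {
            if let Err(err) = creator.abort() {
                eprintln!("index abort failed: {err}");
            }
        }
    }
}
```

In this sketch a failed index never fails the surrounding write: the caller only learns from `finish` whether an index exists. Whether the real Indexer reports its outcome the same way is an assumption.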

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

#2705

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@zhongzc zhongzc self-assigned this Jan 12, 2024
… Engine

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
codecov Bot commented Jan 12, 2024

Codecov Report

Attention: 47 lines in your changes are missing coverage. Please review.

Comparison is base (bf88b3b) 85.43% compared to head (448150d) 85.09%.
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3158      +/-   ##
==========================================
- Coverage   85.43%   85.09%   -0.34%     
==========================================
  Files         823      829       +6     
  Lines      134922   135714     +792     
==========================================
+ Hits       115268   115492     +224     
- Misses      19654    20222     +568     

Comment thread src/mito2/src/access_layer.rs Outdated
Comment thread src/mito2/src/test_util/scheduler_util.rs Outdated
Comment thread src/mito2/src/cache/write_cache.rs Outdated
Comment thread src/mito2/src/sst/index.rs Outdated
Comment thread src/mito2/src/sst/index/creator.rs Outdated
Comment thread src/mito2/src/sst/file_purger.rs Outdated
Comment thread src/mito2/src/read/scan_region.rs
Comment thread src/mito2/src/config.rs
zhongzc and others added 2 commits January 15, 2024 14:00
Co-authored-by: Yingwen <realevenyag@gmail.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@zhongzc zhongzc force-pushed the zhongzc/inverted-index-enable branch from 82322c2 to 6403d01 Compare January 15, 2024 06:18
…to field of WriteCache

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@github-actions github-actions Bot added Size: XL and removed Size: L labels Jan 15, 2024
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@zhongzc zhongzc requested a review from evenyag January 15, 2024 08:08
Comment thread config/datanode.example.toml
@fengjiachun fengjiachun left a comment

LGTM

@zhongzc zhongzc added this pull request to the merge queue Jan 15, 2024
Merged via the queue into GreptimeTeam:main with commit 6f07d69 Jan 15, 2024
@zhongzc zhongzc deleted the zhongzc/inverted-index-enable branch January 15, 2024 09:18
Labels

docs-required This change requires docs update.

Projects

Status: Done

3 participants