feat(parquet): introduce inverted index applier to reader #3130

Merged
waynexia merged 16 commits into GreptimeTeam:main from zhongzc:zhongzc/inverted-index-sst-reader-intro
Jan 11, 2024

Collaborator

@zhongzc zhongzc commented Jan 10, 2024

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

Add the index applier in the Parquet reader to filter row groups:

  • Add the inverted_index_available property to SstInfo and FileMeta
  • Introduce the row_groups_to_read method for ParquetReaderBuilder, which returns row groups that still need to be read after being filtered through the inverted index and min-max index
  • Add metrics to observe the selectivity of the index
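A minimal sketch of the pruning flow described in the list above, not the code from this PR. The `MinMax` and `PruneStats` types, the function signature, and the order of the two filters are assumptions made for illustration; the real `row_groups_to_read` lives on `ParquetReaderBuilder` and works on Parquet metadata and the index applier.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-row-group min/max statistics for one column.
#[derive(Debug, Clone, Copy)]
pub struct MinMax {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical counters for observing how selective each pruning step was.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct PruneStats {
    pub total: usize,
    pub after_inverted_index: usize,
    pub after_min_max: usize,
}

impl PruneStats {
    /// Fraction of row groups that still have to be read (1.0 means nothing was pruned).
    pub fn remaining_ratio(&self) -> f64 {
        if self.total == 0 {
            return 1.0;
        }
        self.after_min_max as f64 / self.total as f64
    }
}

/// Returns the row groups that still need to be read for a predicate
/// `lo <= value <= hi`, together with the pruning counters.
///
/// `index_hits` is the set of row groups the inverted index reports as
/// possibly matching; `None` means no index is available for this file.
pub fn row_groups_to_read(
    stats: &[MinMax],
    index_hits: Option<&BTreeSet<usize>>,
    range: (i64, i64),
) -> (Vec<usize>, PruneStats) {
    let (lo, hi) = range;
    let mut groups: Vec<usize> = (0..stats.len()).collect();

    // Step 1 (assumed order): keep only the row groups the inverted index selected.
    if let Some(hits) = index_hits {
        groups.retain(|g| hits.contains(g));
    }
    let after_inverted_index = groups.len();

    // Step 2: keep only the row groups whose [min, max] overlaps [lo, hi].
    groups.retain(|&g| stats[g].min <= hi && stats[g].max >= lo);

    let prune_stats = PruneStats {
        total: stats.len(),
        after_inverted_index,
        after_min_max: groups.len(),
    };
    (groups, prune_stats)
}
```

Recording the count after each step separately is what lets a metric show how much the inverted index contributes on top of min-max pruning.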

Moreover, once inverted_index_available is a property of FileMeta, a FileMeta no longer describes only a single SST file; it also covers the associated index files. When an SST file is deleted, its index files should therefore be deleted together with it.
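The deletion rule can be sketched as follows. This is an illustration under stated assumptions, not the access-layer code from this PR: the trimmed-down `FileMeta`, the `paths_to_delete` helper, and the path layout (`<region_dir>/<file_id>.parquet` and `<region_dir>/index/<file_id>.puffin`) are all hypothetical.

```rust
/// Hypothetical, trimmed-down file metadata with only the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct FileMeta {
    pub file_id: String,
    pub inverted_index_available: bool,
}

/// Lists every object that has to be removed when the SST described by `meta`
/// is purged. The path layout is an assumption made for this example.
pub fn paths_to_delete(region_dir: &str, meta: &FileMeta) -> Vec<String> {
    // The SST file itself is always removed.
    let mut paths = vec![format!("{}/{}.parquet", region_dir, meta.file_id)];

    // The index file is removed in the same pass so that it is never orphaned.
    if meta.inverted_index_available {
        paths.push(format!("{}/index/{}.puffin", region_dir, meta.file_id));
    }
    paths
}
```

Deriving both paths from the same `FileMeta` keeps the SST file and its index file from drifting apart on deletion.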

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR does not require documentation updates.

Refer to a related PR or issue link (optional)

#2705

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@github-actions github-actions Bot added the docs-not-required and Size: M labels Jan 10, 2024
codecov Bot commented Jan 10, 2024

Codecov Report

Attention: 24 lines in your changes are missing coverage. Please review.

Comparison is base (29a7f30) 85.48% compared to head (20f97c1) 85.04%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3130      +/-   ##
==========================================
- Coverage   85.48%   85.04%   -0.45%     
==========================================
  Files         822      822              
  Lines      134403   134560     +157     
==========================================
- Hits       114899   114431     -468     
- Misses      19504    20129     +625     

Member

@killme2008 killme2008 left a comment

LGTM

Comment thread src/mito2/src/access_layer.rs Outdated
Comment thread src/mito2/src/sst/parquet/reader.rs
Comment thread src/mito2/src/sst/parquet/reader.rs
zhongzc and others added 3 commits January 10, 2024 17:56
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Collaborator Author

zhongzc commented Jan 10, 2024

@waynexia @evenyag PTAL

Member

@waynexia waynexia left a comment

LGTM 🚀

Comment thread src/mito2/src/sst/file.rs Outdated
Contributor

@evenyag evenyag left a comment

LGTM

@zhongzc zhongzc requested a review from waynexia January 11, 2024 05:21
@zhongzc zhongzc added this pull request to the merge queue Jan 11, 2024
Comment thread src/mito2/src/sst/file.rs Outdated
@zhongzc zhongzc removed this pull request from the merge queue due to a manual request Jan 11, 2024
@zhongzc zhongzc requested a review from evenyag January 11, 2024 07:19
Collaborator Author

zhongzc commented Jan 11, 2024

@evenyag @waynexia PTAL

@waynexia waynexia added this pull request to the merge queue Jan 11, 2024
Merged via the queue into GreptimeTeam:main with commit fd8fb64 Jan 11, 2024
@zhongzc zhongzc self-assigned this Jan 12, 2024
Labels

docs-not-required This change does not impact docs.

Projects

Status: Done

4 participants