Skip to content

feat(inverted_index.format): add writer#2900

Merged
zhongzc merged 3 commits intoGreptimeTeam:developfrom
zhongzc:zhongzc/inverted-index-format-writer
Dec 11, 2023
Merged

feat(inverted_index.format): add writer#2900
zhongzc merged 3 commits intoGreptimeTeam:developfrom
zhongzc:zhongzc/inverted-index-format-writer

Conversation

@zhongzc
Copy link
Copy Markdown
Collaborator

@zhongzc zhongzc commented Dec 11, 2023

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

Add format writer, accepts stream of values.

Checklist

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

#2705

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@zhongzc zhongzc force-pushed the zhongzc/inverted-index-format-writer branch from b7cdde4 to a6c792e Compare December 11, 2023 07:21
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 11, 2023

Codecov Report

Merging #2900 (c214012) into develop (bfb4794) will decrease coverage by 0.48%.
Report is 6 commits behind head on develop.
The diff coverage is 96.93%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2900      +/-   ##
===========================================
- Coverage    85.00%   84.53%   -0.48%     
===========================================
  Files          753      756       +3     
  Lines       118597   119511     +914     
===========================================
+ Hits        100818   101025     +207     
- Misses       17779    18486     +707     

Copy link
Copy Markdown
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Others LGTM

Comment thread src/index/src/inverted_index/error.rs Outdated
Copy link
Copy Markdown
Member

@killme2008 killme2008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

//! An inverted index comprises a collection of bitmaps, a null bitmap, and a finite state transducer (FST) indicating tag values' positions:
//!
//! `bitmap₀ bitmap₁ bitmap₂ ... bitmapₙ null_bitmap fst`
//! `null_bitmap bitmap₀ bitmap₁ bitmap₂ ... bitmapₙ fst`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we move forward the null_bitmap before bitmaps, can you explain why?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. For Read, bitmap reading relies on offset and size. Whether null_bitmap comes first or not does not affect the format and will not destroy the implementation.
  2. For Write, some implementations of index building may directly use BTreeMap<Option<Bytes>, BitVec>>. In this case, it is more natural to write null_bitmap first. (Of course, this is not the case in my implementation, but I'm happy to handle nulls first)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the info.

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
@zhongzc zhongzc enabled auto-merge December 11, 2023 09:52
@zhongzc zhongzc added this pull request to the merge queue Dec 11, 2023
Merged via the queue into GreptimeTeam:develop with commit 1e22f1c Dec 11, 2023
@zhongzc zhongzc deleted the zhongzc/inverted-index-format-writer branch December 11, 2023 10:06
@zhongzc zhongzc self-assigned this Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants