perf(remote-write): optimize decode prom #7761
Conversation
Add `decode_label_name` and `validate_label_name` to skip redundant UTF-8 validation for Prometheus label names, which are guaranteed ASCII (`[a-zA-Z_][a-zA-Z0-9_]*`). Rename `validate_bytes` to `validate_utf8` for clarity and add benchmarks for label name validation and UTF-8 validation (std vs simdutf8). Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the performance and memory efficiency of Prometheus Remote Write request processing. The core changes involve refining how Prometheus label names are validated and decoded, moving from a general UTF-8 check to a highly optimized byte-level validation. Additionally, memory usage is reduced by ensuring that decoded label names can borrow directly from the original request buffer, avoiding unnecessary data duplication. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Pull request overview
This PR optimizes the decoding of Prometheus Remote Write requests in GreptimeDB by:
- Introducing a table-lookup-based label name validator (
validate_label_name) with loop unrolling instead of UTF-8 validation, conforming to the Prometheus label spec[a-zA-Z_][a-zA-Z0-9_]*. - Replacing
Bytesclone with&'static [u8](RawBytes) forcol_indexesinTableBuilderto avoid heap allocations, relying on a newraw_data: Bytesfield inPromWriteRequestto keep the buffer alive. - Renaming
PromWriteRequest::mergetodecodeand adding a newdecode_label_namemethod that always validates label names regardless ofPromValidationMode.
Changes:
- Added
validate_label_namewith a 256-entry lookup table and 8-byte unrolled loop, plusdecode_label_namethat returns&strwithout allocation - Changed
TableBuilder::col_indexesfromHashMap<Vec<u8>, usize>toHashMap<RawBytes, usize>and addedraw_data: BytestoPromWriteRequestto hold the buffer - Added benchmarks for label name validation and UTF-8 validation comparisons
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/servers/src/http.rs |
Added validate_label_name function, IS_VALID_LABEL_REST lookup table, decode_label_name method, removed validate_bytes, added tests |
src/servers/src/proto.rs |
Added raw_data: Bytes field to PromWriteRequest, renamed merge to decode, updated Clear impl |
src/servers/src/prom_row_builder.rs |
Changed col_indexes to use RawBytes, updated validation to use validate_label_name and decode_label_name |
src/servers/src/http/prom_store.rs |
Updated merge call to decode |
src/servers/benches/prom_decode.rs |
Added benchmarks for label name and UTF-8 validation, updated merge to decode |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…p unrolling Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
- **Refactor UTF-8 Validation and Label Decoding**: - Removed `validate_utf8` method and integrated label name validation directly in `decode_label_name` in `http.rs`. - Updated `decode_label_name` to always enforce Prometheus label name validation across all modes. - Adjusted test cases in `http.rs` to reflect the new validation logic. - **Enhance Label Validation in `prom_row_builder.rs`**: - Replaced UTF-8 validation with direct label name validation using `validate_label_name`. - Updated `decode_label_name` usage to return `&str` and adjusted related logic. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
**Refactor `TableBuilder` to Use `RawBytes` for Column Indexes** - Updated `TableBuilder` in `prom_row_builder.rs` to use `RawBytes` instead of `Vec<u8>` for `col_indexes`. - Modified `with_capacity` method to directly insert `RawBytes` for timestamp and value columns. - Adjusted schema handling to use `to_owned` for `tag_name` and directly insert `raw_tag_name` into `col_indexes`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
### Commit Message Refactor `PromWriteRequest` Method and Enhance Data Handling - **Refactor Method**: Renamed the `merge` method to `decode` in `PromWriteRequest` to better reflect its functionality. Updated references in `prom_decode.rs`, `prom_store.rs`, and `prom_row_builder.rs`. - **Enhance Data Handling**: Introduced `raw_data` field in `PromWriteRequest` to store a clone of the buffer for potential future use. Updated the `clear` method to reset `raw_data`. Files affected: `prom_decode.rs`, `prom_store.rs`, `prom_row_builder.rs`, `proto.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
**Commit Summary:** - **Enhancement in `prom_row_builder.rs`:** - Added a new field `raw_data` of type `Bytes` to `TablesBuilder`. - Implemented `set_raw_data` method to update `raw_data`. - Modified `clear` method to reset `raw_data`. - **Refactor in `proto.rs`:** - Removed `raw_data` field from `PromWriteRequest`. - Updated `decode_and_process` method to use `set_raw_data` from `TablesBuilder` for handling raw data. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2beb1ff to
d1d5110
Compare
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request introduces significant performance optimizations for decoding Prometheus remote write requests, primarily through more efficient label name validation and by avoiding memory allocations using &'static [u8] slices. However, a critical security concern has been identified regarding the use of a RawBytes type aliased to &'static [u8]. This approach deliberately bypasses Rust's borrow checker and memory safety guarantees, and using 'static for non-static data is highly dangerous, potentially leading to use-after-free vulnerabilities. It is strongly recommended to use safe alternatives such as Bytes or proper lifetime parameters to tie the decoded data to the source buffer's lifetime. Additionally, there is one suggestion to improve code consistency.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
### Commit Message Refactor `TablesBuilder` and `TableBuilder` to Use Lifetime Annotations - Updated `prom_store.rs`: - Modified `PROM_WRITE_REQUEST_POOL` and `decode_remote_write_request` to use lifetime annotations for `PromWriteRequest` and `TablesBuilder`. - Updated `prom_row_builder.rs`: - Refactored `TablesBuilder` and `TableBuilder` structs to include lifetime annotations. - Adjusted methods in `TablesBuilder` and `TableBuilder` to accommodate lifetime changes. - Updated `proto.rs`: - Added lifetime annotations to `PromWriteRequest` and its methods. - Modified `add_to_table_data` to use lifetime annotations for `TablesBuilder`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request aims to significantly optimize Prometheus remote write request decoding by improving label name validation and reducing memory allocations. However, a critical security vulnerability has been identified: the unsafe casting of references to 'static is unsound and poses a risk of Use-After-Free. Furthermore, the reuse of PromWriteRequest objects from a pool without proper clearing could lead to data leakage. These memory safety and data leakage issues must be addressed, and a suggestion has been provided to improve the readability of the new validation function.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Optimize decoding of Prometheus Remote Write requests:
&'static [u8]since we can ensure the original buffer outlives the decoded dataImprove strict validation by ~10%
PR Checklist
Please convert it to a draft if some of the following conditions are not met.