
perf(remote-write): optimize decode prom #7761

Merged
v0y4g3r merged 9 commits into GreptimeTeam:main from v0y4g3r:perf/decode-prom-2
Mar 6, 2026


@v0y4g3r v0y4g3r commented Mar 5, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Optimize decoding of Prometheus Remote Write requests:

  • Label name validation: replace UTF-8 validation with a table lookup and loop unrolling, conforming to the Prometheus label name spec
  • Replace Bytes clone with &'static [u8], since we can ensure the original buffer outlives the decoded data
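As a rough illustration of the first point, the following is a minimal sketch of a table-lookup validator with an 8-byte unrolled loop. The names `IS_VALID_LABEL_REST` and `validate_label_name` mirror identifiers mentioned in this PR, but the bodies here are assumptions written for illustration, not the merged implementation.

```rust
/// Lookup table: `true` for bytes allowed after the first character,
/// i.e. `[a-zA-Z0-9_]`. Built at compile time (illustrative, not the PR's code).
const IS_VALID_LABEL_REST: [bool; 256] = {
    let mut table = [false; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        table[i] = b.is_ascii_alphanumeric() || b == b'_';
        i += 1;
    }
    table
};

/// Returns `true` if `name` matches `[a-zA-Z_][a-zA-Z0-9_]*`.
pub fn validate_label_name(name: &[u8]) -> bool {
    // An empty name cannot match the pattern.
    let Some((&first, rest)) = name.split_first() else {
        return false;
    };
    if !(first.is_ascii_alphabetic() || first == b'_') {
        return false;
    }

    // Check 8 bytes per iteration. The lookups are combined with `&`
    // (no short-circuit), so each chunk costs a single branch.
    let mut chunks = rest.chunks_exact(8);
    for c in &mut chunks {
        let ok = IS_VALID_LABEL_REST[c[0] as usize]
            & IS_VALID_LABEL_REST[c[1] as usize]
            & IS_VALID_LABEL_REST[c[2] as usize]
            & IS_VALID_LABEL_REST[c[3] as usize]
            & IS_VALID_LABEL_REST[c[4] as usize]
            & IS_VALID_LABEL_REST[c[5] as usize]
            & IS_VALID_LABEL_REST[c[6] as usize]
            & IS_VALID_LABEL_REST[c[7] as usize];
        if !ok {
            return false;
        }
    }

    // Fewer than 8 bytes remain; check them one at a time.
    chunks
        .remainder()
        .iter()
        .all(|&b| IS_VALID_LABEL_REST[b as usize])
}
```

Because any byte outside `[a-zA-Z0-9_]` maps to `false` in the table, non-ASCII input is rejected without a separate UTF-8 pass.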

Strict validation improves by ~10%:

decode_prom_request/validation_mode/Strict
                        time:   [604.45 µs 607.40 µs 611.08 µs]
                        change: [-10.818% -10.502% -10.102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  11 (11.00%) high severe
Benchmarking decode_prom_request/validation_mode/Lossy: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 3.0s. You may wish to increase target time to 3.2s, enable flat sampling, or reduce sample count to 60.
decode_prom_request/validation_mode/Lossy
                        time:   [621.12 µs 621.57 µs 622.05 µs]
                        change: [-5.5518% -5.4335% -5.3087%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
decode_prom_request/validation_mode/Unchecked
                        time:   [575.40 µs 576.41 µs 577.60 µs]
                        change: [-5.2507% -5.0604% -4.8747%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

Add `decode_label_name` and `validate_label_name` to skip redundant
UTF-8 validation for Prometheus label names, which are guaranteed ASCII
(`[a-zA-Z_][a-zA-Z0-9_]*`). Rename `validate_bytes` to `validate_utf8`
for clarity and add benchmarks for label name validation and UTF-8
validation (std vs simdutf8).

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
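The commit message above relies on the fact that a valid Prometheus label name is pure ASCII, so a second UTF-8 pass is redundant. A minimal sketch of that idea follows; the function names match those in the commit message, but the signatures (in particular the `Option<&str>` return type) and the simplified validator are assumptions for illustration only.

```rust
/// Simplified stand-in for the validator (plain loop, no lookup table).
/// Accepts exactly `[a-zA-Z_][a-zA-Z0-9_]*`.
fn validate_label_name(name: &[u8]) -> bool {
    match name.split_first() {
        Some((&first, rest)) => {
            (first.is_ascii_alphabetic() || first == b'_')
                && rest.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'_')
        }
        None => false,
    }
}

/// Decodes a label name without copying and without a UTF-8 scan.
/// Returns `None` when the bytes are not a valid label name.
/// (Hypothetical signature; the real method may return a `Result`.)
pub fn decode_label_name(bytes: &[u8]) -> Option<&str> {
    if !validate_label_name(bytes) {
        return None;
    }
    // SAFETY: every byte passed the check above, so the slice is pure
    // ASCII, and ASCII is always valid UTF-8.
    Some(unsafe { std::str::from_utf8_unchecked(bytes) })
}
```

The returned `&str` borrows from the input slice, so no allocation takes place on the success path.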
@v0y4g3r v0y4g3r requested a review from a team as a code owner March 5, 2026 12:26
Copilot AI review requested due to automatic review settings March 5, 2026 12:26
@github-actions github-actions bot added size/S docs-not-required This change does not impact docs. labels Mar 5, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance and memory efficiency of Prometheus Remote Write request processing. The core changes involve refining how Prometheus label names are validated and decoded, moving from a general UTF-8 check to a highly optimized byte-level validation. Additionally, memory usage is reduced by ensuring that decoded label names can borrow directly from the original request buffer, avoiding unnecessary data duplication.

Highlights

  • Performance Optimization: Optimized Prometheus Remote Write request decoding, resulting in approximately a 10% performance improvement for strict validation mode.
  • Label Name Validation: Replaced generic UTF-8 validation for Prometheus label names with a specialized byte-level lookup table and loop unrolling, adhering strictly to the Prometheus specification.
  • Memory Efficiency: Reduced memory allocations by replacing Bytes cloning with &'static [u8] for label names, leveraging the fact that the original buffer outlives the decoded data.

Changelog

  • src/servers/benches/prom_decode.rs
    • Imported black_box and validate_label_name for new benchmarks
    • Updated prom_request.merge to prom_request.decode in existing benchmark
    • Added bench_label_name_validation to compare different label name validation strategies
    • Added bench_utf8_validation to compare std::str::from_utf8 and simdutf8::basic::from_utf8
    • Included new benchmarks in the criterion_group! macro
  • src/servers/src/http.rs
    • Defined IS_VALID_LABEL_REST constant array for efficient character lookup
    • Implemented validate_label_name function for byte-level Prometheus label name validation with loop unrolling
    • Added decode_label_name method to PromValidationMode for specialized label name decoding
    • Removed the validate_bytes method from PromValidationMode
    • Added comprehensive unit tests for decode_label_name and validate_label_name
  • src/servers/src/http/prom_store.rs
    • Updated the call from request.merge to request.decode for PromWriteRequest
  • src/servers/src/prom_row_builder.rs
    • Imported validate_label_name and RawBytes
    • Changed col_indexes type from HashMap<Vec<u8>, usize> to HashMap<RawBytes, usize>
    • Updated col_indexes.insert calls to use as_bytes() instead of to_owned()
    • Replaced prom_validation_mode.validate_bytes(name) with a direct call to validate_label_name(name)
    • Changed prom_validation_mode.decode_string(raw_tag_name) to prom_validation_mode.decode_label_name(raw_tag_name)
    • Adjusted column_name assignment and col_indexes.insert to work with RawBytes
  • src/servers/src/proto.rs
    • Added raw_data: Bytes field to PromWriteRequest to store the original buffer
    • Modified clear method to also clear the raw_data field
    • Renamed the merge method to decode in PromWriteRequest
    • Updated the decode method to store a clone of the input buffer in raw_data
    • Updated PromSeriesProcessor to use prom_validation_mode.decode_label_name for label name processing
    • Updated benchmark call from merge to decode

Activity

  • The pull request was created by v0y4g3r with an initial set of changes and a detailed description including performance benchmarks.

gemini-code-assist[bot]

This comment was marked as outdated.


Copilot AI left a comment


Pull request overview

This PR optimizes the decoding of Prometheus Remote Write requests in GreptimeDB by:

  1. Introducing a table-lookup-based label name validator (validate_label_name) with loop unrolling instead of UTF-8 validation, conforming to the Prometheus label spec [a-zA-Z_][a-zA-Z0-9_]*.
  2. Replacing Bytes clone with &'static [u8] (RawBytes) for col_indexes in TableBuilder to avoid heap allocations, relying on a new raw_data: Bytes field in PromWriteRequest to keep the buffer alive.
  3. Renaming PromWriteRequest::merge to decode and adding a new decode_label_name method that always validates label names regardless of PromValidationMode.

Changes:

  • Added validate_label_name with a 256-entry lookup table and 8-byte unrolled loop, plus decode_label_name that returns &str without allocation
  • Changed TableBuilder::col_indexes from HashMap<Vec<u8>, usize> to HashMap<RawBytes, usize> and added raw_data: Bytes to PromWriteRequest to hold the buffer
  • Added benchmarks for label name validation and UTF-8 validation comparisons

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/servers/src/http.rs | Added validate_label_name function, IS_VALID_LABEL_REST lookup table, decode_label_name method, removed validate_bytes, added tests |
| src/servers/src/proto.rs | Added raw_data: Bytes field to PromWriteRequest, renamed merge to decode, updated Clear impl |
| src/servers/src/prom_row_builder.rs | Changed col_indexes to use RawBytes, updated validation to use validate_label_name and decode_label_name |
| src/servers/src/http/prom_store.rs | Updated merge call to decode |
| src/servers/benches/prom_decode.rs | Added benchmarks for label name and UTF-8 validation, updated merge to decode |


@github-actions github-actions bot added size/M and removed size/S labels Mar 6, 2026
v0y4g3r added 6 commits March 6, 2026 11:44
…p unrolling

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
 - **Refactor UTF-8 Validation and Label Decoding**:
   - Removed `validate_utf8` method and integrated label name validation directly in `decode_label_name` in `http.rs`.
   - Updated `decode_label_name` to always enforce Prometheus label name validation across all modes.
   - Adjusted test cases in `http.rs` to reflect the new validation logic.

 - **Enhance Label Validation in `prom_row_builder.rs`**:
   - Replaced UTF-8 validation with direct label name validation using `validate_label_name`.
   - Updated `decode_label_name` usage to return `&str` and adjusted related logic.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
 **Refactor `TableBuilder` to Use `RawBytes` for Column Indexes**

 - Updated `TableBuilder` in `prom_row_builder.rs` to use `RawBytes` instead of `Vec<u8>` for `col_indexes`.
 - Modified `with_capacity` method to directly insert `RawBytes` for timestamp and value columns.
 - Adjusted schema handling to use `to_owned` for `tag_name` and directly insert `raw_tag_name` into `col_indexes`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
 ### Commit Message

 Refactor `PromWriteRequest` Method and Enhance Data Handling

 - **Refactor Method**: Renamed the `merge` method to `decode` in `PromWriteRequest` to better reflect its functionality. Updated references in `prom_decode.rs`, `prom_store.rs`, and `prom_row_builder.rs`.
 - **Enhance Data Handling**: Introduced `raw_data` field in `PromWriteRequest` to store a clone of the buffer for potential future use. Updated the `clear` method to reset `raw_data`.

 Files affected: `prom_decode.rs`, `prom_store.rs`, `prom_row_builder.rs`, `proto.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
 **Commit Summary:**

 - **Enhancement in `prom_row_builder.rs`:**
   - Added a new field `raw_data` of type `Bytes` to `TablesBuilder`.
   - Implemented `set_raw_data` method to update `raw_data`.
   - Modified `clear` method to reset `raw_data`.

 - **Refactor in `proto.rs`:**
   - Removed `raw_data` field from `PromWriteRequest`.
   - Updated `decode_and_process` method to use `set_raw_data` from `TablesBuilder` for handling raw data.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
@v0y4g3r v0y4g3r force-pushed the perf/decode-prom-2 branch from 2beb1ff to d1d5110 Compare March 6, 2026 03:44
@v0y4g3r v0y4g3r requested a review from Copilot March 6, 2026 08:16

v0y4g3r commented Mar 6, 2026

@gemini-code-assist review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant performance optimizations for decoding Prometheus remote write requests, primarily through more efficient label name validation and by avoiding memory allocations using &'static [u8] slices. However, a critical security concern has been identified regarding the use of a RawBytes type aliased to &'static [u8]. This approach deliberately bypasses Rust's borrow checker and memory safety guarantees, and using 'static for non-static data is highly dangerous, potentially leading to use-after-free vulnerabilities. It is strongly recommended to use safe alternatives such as Bytes or proper lifetime parameters to tie the decoded data to the source buffer's lifetime. Additionally, there is one suggestion to improve code consistency.


Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.



 ### Commit Message

 Refactor `TablesBuilder` and `TableBuilder` to Use Lifetime Annotations

 - Updated `prom_store.rs`:
   - Modified `PROM_WRITE_REQUEST_POOL` and `decode_remote_write_request` to use lifetime annotations for `PromWriteRequest` and `TablesBuilder`.

 - Updated `prom_row_builder.rs`:
   - Refactored `TablesBuilder` and `TableBuilder` structs to include lifetime annotations.
   - Adjusted methods in `TablesBuilder` and `TableBuilder` to accommodate lifetime changes.

 - Updated `proto.rs`:
   - Added lifetime annotations to `PromWriteRequest` and its methods.
   - Modified `add_to_table_data` to use lifetime annotations for `TablesBuilder`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
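The lifetime-annotation refactor described in the commit above can be pictured roughly as follows. This is a minimal sketch assuming a builder whose column index map borrows its keys from the request buffer; the struct layout, the `column_index` method, and the placeholder column names `ts` and `val` are hypothetical and not taken from the repository.

```rust
use std::collections::HashMap;

/// Builder whose map keys borrow from the request buffer for `'a`,
/// instead of owning `Vec<u8>` copies or pretending to be `'static`.
pub struct TableBuilder<'a> {
    /// Column name bytes -> column index.
    col_indexes: HashMap<&'a [u8], usize>,
}

impl<'a> TableBuilder<'a> {
    pub fn new() -> Self {
        let mut col_indexes: HashMap<&'a [u8], usize> = HashMap::new();
        // Placeholder names for the timestamp and value columns.
        // Static byte literals coerce to any shorter lifetime `'a`.
        col_indexes.insert(&b"ts"[..], 0);
        col_indexes.insert(&b"val"[..], 1);
        Self { col_indexes }
    }

    /// Returns the index for `name`, assigning the next free index on
    /// first sight. No bytes are copied; the key is the borrowed slice.
    pub fn column_index(&mut self, name: &'a [u8]) -> usize {
        let next = self.col_indexes.len();
        *self.col_indexes.entry(name).or_insert(next)
    }
}
```

With this shape the borrow checker rejects any attempt to drop or mutate the buffer while the builder still holds slices into it, which is the guarantee the `&'static [u8]` version had to assert by hand.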

v0y4g3r commented Mar 6, 2026

@gemini-code-assist review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to significantly optimize Prometheus remote write request decoding by improving label name validation and reducing memory allocations. However, a critical security vulnerability has been identified: the unsafe casting of references to 'static is unsound and poses a risk of Use-After-Free. Furthermore, the reuse of PromWriteRequest objects from a pool without proper clearing could lead to data leakage. These memory safety and data leakage issues must be addressed, and a suggestion has been provided to improve the readability of the new validation function.
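The pool-reuse concern raised in this review can be illustrated with a small, self-contained sketch. Everything below (`PooledRequest`, `Pool`, and their methods) is hypothetical and uses owned strings for simplicity; it only shows the general pattern of clearing an object before it goes back into a pool so that one request never observes another's data.

```rust
/// Hypothetical reusable request object.
#[derive(Default)]
pub struct PooledRequest {
    series: Vec<String>,
}

impl PooledRequest {
    /// Parses a comma-separated list of series names.
    /// Clears first, so stale entries from an earlier use never survive.
    pub fn decode(&mut self, input: &str) {
        self.clear();
        self.series.extend(input.split(',').map(str::to_owned));
    }

    /// Drops the contents but keeps the allocation for reuse.
    pub fn clear(&mut self) {
        self.series.clear();
    }

    pub fn series(&self) -> &[String] {
        &self.series
    }
}

/// Hypothetical object pool backed by a simple free list.
#[derive(Default)]
pub struct Pool {
    free: Vec<PooledRequest>,
}

impl Pool {
    /// Hands out a pooled object, or a fresh one if the pool is empty.
    pub fn take(&mut self) -> PooledRequest {
        self.free.pop().unwrap_or_default()
    }

    /// Clears the object before storing it, so the next user starts empty.
    pub fn give_back(&mut self, mut req: PooledRequest) {
        req.clear();
        self.free.push(req);
    }
}
```

Clearing on both return and decode is deliberately redundant here: either one alone prevents leakage, and having both keeps the invariant local to each method.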

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
@v0y4g3r v0y4g3r requested a review from MichaelScofield March 6, 2026 09:35
@v0y4g3r v0y4g3r enabled auto-merge March 6, 2026 09:45
@v0y4g3r v0y4g3r added this pull request to the merge queue Mar 6, 2026
Merged via the queue into GreptimeTeam:main with commit 2000611 Mar 6, 2026
45 checks passed
@v0y4g3r v0y4g3r deleted the perf/decode-prom-2 branch March 6, 2026 10:58

Labels

docs-not-required This change does not impact docs. size/M


4 participants