POC: Parquet predicate results cache #7760

alamb · 2025-06-24T11:56:26Z

Which issue does this PR close?

Next iteration of POC: Sketch out parquet cached filter result API #7513
Part of [EPIC] Faster performance for parquet predicate evaluation for non selective filters #7456

TODO:

Avoid double buffering of intermediate results (via coalesce)
File ticket to optimize coalesce kernel more: [EPIC] Optimize the coalesece kernel (BatchCoalescer) #7761
Add memory limit for results cache
Review actual predicates that are pushed down in DataFusion (are they multiple single column predicates or is it a single set of cached results....) -- aka do we have to handle multiple ArrowPredicates (probably...)

Rationale for this change

I am working on not decoding predicate columns twice when evaluating filters in the reader

In #7513 we prototyped several APIs that we have now started making real (like BatchCoalescer) so I made a new PR that used those APIs which doesn't have hundreds of comments.

I am pleased with how it is looking now, and as before I don't really plan to merge this PR as is, I am using it as a design vehicle

What changes are included in this PR?

Add code to cache columns which are reused in filter and scan

Are there any user-facing changes?

Not yet,

alamb · 2025-06-24T12:00:47Z

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/cache_filter_result2 (025d411) to 2b40d1d diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_cache_filter_result2
Results will be posted here when complete

alamb · 2025-06-24T12:26:00Z

🤖: Benchmark completed

Details

group                                alamb_cache_filter_result2             main
-----                                --------------------------             ----
arrow_reader_clickbench/async/Q1     1.00      2.1±0.02ms        ? ?/sec    1.14      2.4±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     13.7±0.13ms        ? ?/sec    1.06     14.6±0.15ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     15.7±0.17ms        ? ?/sec    1.05     16.5±0.15ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     27.2±0.26ms        ? ?/sec    1.46     39.5±0.31ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     40.1±0.30ms        ? ?/sec    1.34     53.8±0.50ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     38.2±0.31ms        ? ?/sec    1.36     51.8±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.03      5.3±0.06ms        ? ?/sec    1.00      5.1±0.06ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    109.4±0.67ms        ? ?/sec    1.66    181.8±1.02ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.00    126.9±0.70ms        ? ?/sec    1.88    239.3±0.97ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00    194.8±1.08ms        ? ?/sec    2.63    512.7±1.91ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.00    436.2±3.40ms        ? ?/sec    1.17   509.6±13.94ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     44.7±0.34ms        ? ?/sec    1.32     59.1±0.67ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    131.9±2.08ms        ? ?/sec    1.27    167.4±1.00ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00    125.1±7.17ms        ? ?/sec    1.33    165.9±0.71ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     65.6±0.49ms        ? ?/sec    1.00     65.8±0.46ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    129.1±0.68ms        ? ?/sec    1.33    171.2±1.34ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00    101.1±0.66ms        ? ?/sec    1.03    104.4±0.43ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.02     40.4±0.33ms        ? ?/sec    1.00     39.7±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     49.8±0.35ms        ? ?/sec    1.00     49.7±0.33ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     49.6±0.40ms        ? ?/sec    1.09     54.3±0.47ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.01     41.3±0.43ms        ? ?/sec    1.00     41.0±0.34ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.01     14.5±0.13ms        ? ?/sec    1.00     14.3±0.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00   1883.0±7.17µs        ? ?/sec    1.17      2.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     12.4±0.13ms        ? ?/sec    1.07     13.2±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     14.3±0.18ms        ? ?/sec    1.05     15.0±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     32.5±2.51ms        ? ?/sec    1.17     38.0±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     37.0±0.23ms        ? ?/sec    1.37     50.9±0.46ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     36.0±0.27ms        ? ?/sec    1.37     49.5±0.33ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.04      4.5±0.03ms        ? ?/sec    1.00      4.3±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00     97.6±0.51ms        ? ?/sec    1.84    179.6±0.76ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    113.9±0.68ms        ? ?/sec    2.11    240.4±1.42ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    174.7±1.12ms        ? ?/sec    2.83    495.4±2.41ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00    362.5±2.28ms        ? ?/sec    1.26   458.3±12.59ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     41.7±0.34ms        ? ?/sec    1.36     56.6±0.61ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    121.9±2.14ms        ? ?/sec    1.30    158.1±0.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    112.1±1.13ms        ? ?/sec    1.40    156.5±1.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     63.8±0.40ms        ? ?/sec    1.00     63.8±0.44ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    117.6±1.00ms        ? ?/sec    1.37    161.6±0.99ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     94.9±0.57ms        ? ?/sec    1.02     97.0±0.53ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.01     32.6±0.24ms        ? ?/sec    1.00     32.4±0.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.01     35.7±0.30ms        ? ?/sec    1.00     35.2±0.41ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     46.2±0.48ms        ? ?/sec    1.09     50.3±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.02     38.5±0.43ms        ? ?/sec    1.00     37.6±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.01     13.8±0.14ms        ? ?/sec    1.00     13.7±0.14ms        ? ?/sec

Dandandan · 2025-06-24T12:33:15Z

Nice, no regressions?

alamb · 2025-06-24T12:41:33Z

Nice, no regressions?

Seems like the coalesce batches work is paying off !

alamb · 2025-06-24T13:16:46Z

I have been thinking about memory usage as well as that will be a major factor in if we can cache the predicate results.

I annotated the Q22 with code to calculate memory usage of the cached results:

Cached predicate memory usage: 426344 bytes
That looks strong

Q22 (some of the best performance gains):

 arrow_reader_clickbench/async/Q22    1.00    194.8±1.08ms        ? ?/sec    2.63    512.7±1.91ms        ? ?/sec

        // Q22: SELECT "SearchPhrase", MIN("URL"), MIN("Title"), COUNT(*) AS c, COUNT(DISTINCT "UserID") FROM hits WHERE "Title" LIKE '%Google%' AND "URL" NOT LIKE '%.google.%' AND "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY c DESC LIMIT 10;
        Query {
            name: "Q22",
            filter_columns: vec!["Title", "URL", "SearchPhrase"],
            projection_columns: vec!["SearchPhrase", "URL", "Title", "UserID"],
            predicates: vec![
                ClickBenchPredicate::like_Google(0),
                ClickBenchPredicate::nlike_google(1),
                ClickBenchPredicate::not_empty(2),
            ],
            expected_row_count: 46,
        },

for hits_1.parquet, the data sizes are:

Title: 0.46MB (row group 0), 1.0MB (row group 1), 71.97MB (row group 2)
URL: 4.93MB (row group 0), 43.63MB (row group 1), 40.19MB (row group 2)
SearchPhrase: 0.03MB (row group 0), 0.11MB (row group 1), 8.6MB (row group 2)

(I totally used @XiangpengHao 's https://parquet-viewer.xiangpeng.systems/ for this analysis)

I will try and add some additional debugging / annotation code to see what the peak memory usage was (and if I limit it to 1MB if that will get triggered for any query)

alamb · 2025-06-24T14:11:52Z

Q22: Cached predicate result size: 827752
Q22: Cached predicate result size: 442728

Seems reasonable to me. I need to think about memory handling a bit more carefully now

POC: Parquet predicate results cache

025d411

github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate labels Jun 24, 2025

alamb mentioned this pull request Jun 24, 2025

POC: Sketch out parquet cached filter result API #7513

Closed

3 tasks

alamb force-pushed the alamb/cache_filter_result2 branch from d236925 to 025d411 Compare June 24, 2025 12:42

fmt

81db293

alamb mentioned this pull request Jun 26, 2025

POC: Test DataFusion with experimental Parquet Filter Pushdown (try 2) apache/datafusion#16562

Draft

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

POC: Parquet predicate results cache #7760

POC: Parquet predicate results cache #7760

Uh oh!

alamb commented Jun 24, 2025 •

edited

Loading

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

Dandandan commented Jun 24, 2025

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

Uh oh!

POC: Parquet predicate results cache #7760

Are you sure you want to change the base?

POC: Parquet predicate results cache #7760

Uh oh!

Conversation

alamb commented Jun 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

Dandandan commented Jun 24, 2025

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

alamb commented Jun 24, 2025

Uh oh!

Uh oh!

alamb commented Jun 24, 2025 •

edited

Loading