Optimize regex_replace for scalar patterns
#3614
Merged
Which issue does this PR close?
Closes #3613.
Rationale for this change
@Dandandan noticed that regex_replace with a known pattern seems to take an extremely long time during the ClickBench suite in #3518. There appear to be several causes, but the main one is how generic the regex_replace implementation is (it can handle 2⁴ combinations of scalar and array arguments). Having a generic version ready is good for compatibility, but it also makes us pay the overhead in common cases (like the example in #3518, where the pattern is static).
What changes are included in this PR?
This PR adds a scalarity-based (not sure if this is a real word) specialization system in which, at runtime, the best regex_replace variation is picked and executed for the given set of inputs. The system here is just the start, and if the gains are large enough we might add a third case where the replacement is also known.
Are there any user-facing changes?
This is mainly an optimization, and there shouldn't be any user-facing changes.
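To make the specialization idea from "What changes are included in this PR?" concrete, here is a minimal, std-only sketch. It is not the code in this PR: `Arg`, `Compiled`, `replace_generic`, `replace_static_pattern` and `regex_replace_sketch` are hypothetical names, the "regex" is a toy literal-substring matcher, and the generic path here compiles once per row purely to illustrate the overhead (the real generic implementation may behave differently, for example by caching compiled patterns). The `compilations` counter exists only so the difference between the two paths can be observed.

```rust
/// An argument is either one value shared by all rows or one value per row.
/// Hypothetical stand-in for a scalar-or-array function argument.
#[derive(Debug, Clone)]
enum Arg {
    Scalar(String),
    Array(Vec<String>),
}

/// Toy stand-in for a compiled regex: it only matches a literal substring.
/// `compile` represents the expensive step we want to run as rarely as possible.
struct Compiled {
    needle: String,
}

impl Compiled {
    fn compile(pattern: &str, compilations: &mut usize) -> Self {
        *compilations += 1; // count how often the expensive step runs
        Compiled { needle: pattern.to_string() }
    }

    fn replace_all(&self, source: &str, replacement: &str) -> String {
        source.replace(&self.needle, replacement)
    }
}

/// Read the value of an argument for a given row.
fn value_at(arg: &Arg, row: usize) -> &str {
    match arg {
        Arg::Scalar(s) => s.as_str(),
        Arg::Array(values) => values[row].as_str(),
    }
}

/// Generic path: the pattern may differ per row, so this sketch compiles per row.
fn replace_generic(
    source: &[String],
    pattern: &Arg,
    replacement: &Arg,
    compilations: &mut usize,
) -> Vec<String> {
    let mut out = Vec::with_capacity(source.len());
    for (row, s) in source.iter().enumerate() {
        let compiled = Compiled::compile(value_at(pattern, row), compilations);
        out.push(compiled.replace_all(s, value_at(replacement, row)));
    }
    out
}

/// Specialized path: the pattern is a scalar, so it is compiled exactly once.
fn replace_static_pattern(
    source: &[String],
    pattern: &str,
    replacement: &Arg,
    compilations: &mut usize,
) -> Vec<String> {
    let compiled = Compiled::compile(pattern, compilations);
    let mut out = Vec::with_capacity(source.len());
    for (row, s) in source.iter().enumerate() {
        out.push(compiled.replace_all(s, value_at(replacement, row)));
    }
    out
}

/// Runtime dispatch on the "scalarity" of the pattern argument.
fn regex_replace_sketch(
    source: &[String],
    pattern: &Arg,
    replacement: &Arg,
    compilations: &mut usize,
) -> Vec<String> {
    match pattern {
        Arg::Scalar(p) => replace_static_pattern(source, p, replacement, compilations),
        Arg::Array(_) => replace_generic(source, pattern, replacement, compilations),
    }
}
```

Calling `regex_replace_sketch` with `Arg::Scalar` as the pattern takes the specialized path and compiles once regardless of the number of rows, while an `Arg::Array` pattern falls back to the generic path. A further case where the replacement is also scalar could be added as another match arm in the same way.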
Benchmarks
New benchmarks are here #3614 (comment), and overall they show a speed-up in the range of 20-35X depending on the query and input.
Old benchmarks
Running all benchmarks in --release mode (using the datafusion-cli crate with the -f option).
The initial benchmark is Query 28 from ClickHouse
(Note: I don't have the full ClickBench data, only a partition of it [1/100 scale], so this might not be very representative)
A second benchmark is the one where both the source and the replacements are arrays, which shows a speed-up factor of 1.7X.