[cuDF] Improve coverage and performance of expression evaluation #14149

@GregoryKimball

Description

Is your feature request related to a problem? Please describe.

The cuDF backend for Velox uses an ExpressionEvaluator to process query operators like FilterProject.

Internally, ExpressionEvaluator maps the predicate in a core::FilterNode to a cudf::ast::expression. However, cudf::ast::expression implements only a subset of the expression operators supported by core::FilterNode. For instance, string manipulation and the ternary operator are not supported in cudf::ast::expression.
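
To make the limitation concrete, here is a minimal C++ sketch of the compatibility question: a predicate can be handed to a single AST evaluation only if every node in its tree has an AST counterpart. The `Expr` struct, the `node` helper, and the operator names in `isAstOperator` are hypothetical simplifications, not the actual Velox or cuDF types or the real operator mapping.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified expression node; not the actual Velox or cuDF types.
struct Expr {
  std::string op;  // operator name, or "col"/"lit" for leaves
  std::vector<std::shared_ptr<Expr>> children;
};
using ExprPtr = std::shared_ptr<Expr>;

// Small helper for building example trees.
ExprPtr node(std::string op, std::vector<ExprPtr> children = {}) {
  return std::make_shared<Expr>(Expr{std::move(op), std::move(children)});
}

// Illustrative subset of operator names assumed to have a cudf::ast
// counterpart. The real mapping in ExpressionEvaluator differs.
bool isAstOperator(const std::string& op) {
  static const std::set<std::string> kAstOps = {
      "col", "lit", "add", "sub", "mul", "div", "eq", "lt", "gt", "and", "or", "not"};
  return kAstOps.count(op) > 0;
}

// True only if every node in the tree maps to an AST operator.
bool isAstCompatible(const ExprPtr& e) {
  assert(e != nullptr);
  if (!isAstOperator(e->op)) {
    return false;
  }
  for (const auto& child : e->children) {
    if (!isAstCompatible(child)) {
      return false;
    }
  }
  return true;
}
```

With this check, `gt(col, lit)` is accepted, while a tree containing an `if` (ternary) node or a `substr` call anywhere is rejected as a whole.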

As of July 2025, ExpressionEvaluator can precompute columns when an unsupported operator occurs in a leaf node of the expression tree. However, there are gaps: some unsupported operators cannot yet be precomputed, and an AST-incompatible operator that occurs in a branch (non-leaf) node is not handled.
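
A minimal sketch of the precompute idea, assuming that a "leaf node" here means an unsupported operator whose inputs are all columns or literals. Such a node is set aside for evaluation with a non-AST API, its result is assumed to be appended after the input columns, and the node is replaced by a column reference. An unsupported operator over a subexpression (for example, an `if` whose condition is itself a comparison) is reported as the branch-node gap. All names (`PrecomputePlan`, `planPrecompute`, `isAstOperator`, `node`) are hypothetical and not taken from ExpressionEvaluator.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified expression node; not the actual Velox or cuDF types.
struct Expr {
  std::string op;  // "col", "lit", or an operator name
  std::vector<std::shared_ptr<Expr>> children;
  int columnIndex = -1;  // input column index when op == "col"
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr node(std::string op, std::vector<ExprPtr> children = {}, int columnIndex = -1) {
  auto e = std::make_shared<Expr>();
  e->op = std::move(op);
  e->children = std::move(children);
  e->columnIndex = columnIndex;
  return e;
}

bool isLeaf(const ExprPtr& e) { return e->op == "col" || e->op == "lit"; }

// Illustrative subset of AST-supported operator names (an assumption).
bool isAstOperator(const std::string& op) {
  static const std::set<std::string> kAstOps = {"col", "lit", "add", "eq", "gt", "and", "or"};
  return kAstOps.count(op) > 0;
}

// AST-only tree plus the subtrees to evaluate first with non-AST APIs.
struct PrecomputePlan {
  ExprPtr astTree;
  std::vector<ExprPtr> precompute;
  bool ok = true;  // false when an unsupported operator sits in a branch node
};

ExprPtr rewrite(const ExprPtr& e, int numInputColumns, PrecomputePlan& plan) {
  if (isLeaf(e)) {
    return e;
  }
  if (!isAstOperator(e->op)) {
    for (const auto& child : e->children) {
      if (!isLeaf(child)) {
        plan.ok = false;  // the gap: unsupported operator over a subexpression
        return e;
      }
    }
    // Leaf-level unsupported operator: precompute it, then reference its column.
    plan.precompute.push_back(e);
    int newIndex = numInputColumns + static_cast<int>(plan.precompute.size()) - 1;
    return node("col", {}, newIndex);
  }
  // AST-supported operator: copy the node and rewrite its children.
  auto copy = std::make_shared<Expr>(*e);
  for (auto& child : copy->children) {
    child = rewrite(child, numInputColumns, plan);
  }
  return copy;
}

PrecomputePlan planPrecompute(const ExprPtr& root, int numInputColumns) {
  assert(root != nullptr && numInputColumns >= 0);
  PrecomputePlan plan;
  plan.astTree = rewrite(root, numInputColumns, plan);
  return plan;
}
```

For a two-column input, `gt(length(col0), lit)` becomes `gt(col2, lit)` with `length(col0)` queued for precomputation, whereas `if(gt(col0, lit), col0, col1)` fails because the unsupported `if` sits above a comparison.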

Describe the solution you'd like

| Project | Summary | Status |
|---|---|---|
| Add support for precomputing for non-leaf nodes | We can expand the precompute+AST path to cover non-leaf nodes to grow the subset of predicates we can process. Multiple passes of precompute+AST are not supported. | 🔄 #14462 |
| Add expression evaluator that uses cuDF APIs without AST | Implement an expression evaluator that traverses the expression tree and calls one or more cuDF APIs per operator. This approach will be simpler and more feature-complete than combining AST and non-AST APIs, as in the July 2025 design. For AST-compatible expressions, we should expect the cuDF API-per-operator path to be slower than the AST-based cudf::compute_column. | |
| Add option to use cuDF JIT transform for evaluating compatible expressions | JIT transform in cuDF supports arithmetic, comparisons and several string manipulation functions (link). We should provide tools in ExpressionEvaluator to detect whether an expression is compatible with JIT transform, generate code that converts the expression into a CUDA C++ UDF, and then execute the transform. | |
| Explore design for Velox's Expression Evaluator to use cuDF operators and operands | Depending on the design of Velox's expression evaluator and the details of its "expression compilation" step, we may be able to plug cuDF operators into Velox expressions. This approach would let us benefit from Velox's expression compilation process, including common subexpression elimination, flattening of adjacent expressions, and constant folding. | |
| Add a function registry pattern | Evaluator functions like pushExprToTree (link) contain large switch statements. We should prefer a design that improves how we map between expression names and cuDF functions. Potentially, we could use a cuDF function registry to build interoperability between Velox UDFs and cuDF UDFs. | |
| Improve dispatch capability in ExpressionEvaluator | Once we have working evaluators based on (1) cuDF AST with precompute, (2) cuDF API per operator and (3) JIT transform, we will need a dispatch layer to choose which evaluator to use. The dispatch would be based on compatibility, performance and configuration choices. | |
| Make JIT Filters available in Velox-cuDF | In the FilterProject operator, we should make the cuDF JIT filter available as an option (rapidsai/cudf#19070). We will also need a dispatch check in FilterProject to assess whether the predicate is JIT-compatible and whether the application has JIT enabled. | |
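
The registry and dispatch items could fit together roughly as follows. This is a minimal sketch, assuming a hypothetical `FunctionRegistry` that records per-function capability flags and a hypothetical `chooseBackend` that prefers JIT (when enabled), then AST, then the per-operator path. The preference order and the flags in `makeExampleRegistry` are assumptions, and the sketch ignores the precompute path for brevity.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// The three evaluation strategies discussed in this issue.
enum class Backend { Jit, Ast, PerOperatorApi };

// Hypothetical per-function capability flags.
struct FunctionInfo {
  bool astSupported = false;
  bool jitSupported = false;
};

// Name -> capability registry: adding a function is one registerFunction()
// call instead of another case in a large switch statement.
class FunctionRegistry {
 public:
  void registerFunction(const std::string& name, FunctionInfo info) {
    assert(!name.empty());
    entries_[name] = info;
  }
  const FunctionInfo* find(const std::string& name) const {
    auto it = entries_.find(name);
    return it == entries_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, FunctionInfo> entries_;
};

// Illustrative flags loosely following this issue's capability notes;
// not a statement of actual cuDF support.
FunctionRegistry makeExampleRegistry() {
  FunctionRegistry reg;
  reg.registerFunction("add", {true, true});
  reg.registerFunction("gt", {true, true});
  reg.registerFunction("substr", {false, true});
  reg.registerFunction("regexp_like", {false, false});
  return reg;
}

// Prefers JIT when enabled and all functions are JIT-compatible, then AST,
// else the per-operator path. Unknown names count as incompatible.
Backend chooseBackend(const FunctionRegistry& reg,
                      const std::vector<std::string>& functions,
                      bool jitEnabled) {
  bool allAst = true;
  bool allJit = true;
  for (const auto& name : functions) {
    const FunctionInfo* info = reg.find(name);
    allAst = allAst && info != nullptr && info->astSupported;
    allJit = allJit && info != nullptr && info->jitSupported;
  }
  if (jitEnabled && allJit) {
    return Backend::Jit;
  }
  if (allAst) {
    return Backend::Ast;
  }
  return Backend::PerOperatorApi;
}
```

For example, a predicate using `substr` and `gt` goes to JIT when JIT is enabled and to the per-operator path otherwise, since `substr` has no AST counterpart in this example registry.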

Additional context
cuDF has three options for expression evaluation:

  • cuDF API per operator, using precompiled cuDF public APIs. Broad feature coverage, including string LIKE, regex, and nested types.
  • AST execution in a single kernel: precompiled, but no string manipulation and no nested types.
  • JIT-compiled transform: supports string manipulation such as contains and slicing. No nested types or advanced string processing (JSON, regex) yet.
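
One likely reason single-kernel evaluation tends to outperform one API call per operator is intermediate materialization. The host-only C++ sketch below evaluates `(a + b) > c` both ways: the per-operator version writes the sum out as a full temporary column and reads it back, while the fused version keeps the sum per row. `Column`, `addColumns`, `evalFused`, and the other names are illustrative stand-ins, not cuDF APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-ins for device columns; illustrative only.
using Column = std::vector<long>;
using BoolColumn = std::vector<bool>;

// Per-operator style: each call makes a full pass and materializes its result,
// analogous to issuing one cuDF API call per expression node.
Column addColumns(const Column& a, const Column& b) {
  assert(a.size() == b.size());
  Column out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}

BoolColumn greaterThan(const Column& a, const Column& b) {
  assert(a.size() == b.size());
  BoolColumn out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] > b[i];
  return out;
}

// (a + b) > c with two operator calls: the sum is written out as a temporary
// column and then read back by the comparison.
BoolColumn evalPerOperator(const Column& a, const Column& b, const Column& c) {
  Column sum = addColumns(a, b);
  return greaterThan(sum, c);
}

// (a + b) > c in one pass: the sum stays in a local per row, which is the
// kind of fusion single-kernel AST or JIT evaluation aims for.
BoolColumn evalFused(const Column& a, const Column& b, const Column& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  BoolColumn out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    long sum = a[i] + b[i];
    out[i] = sum > c[i];
  }
  return out;
}
```

Both functions return the same result; the difference is the extra temporary column and the second pass in the per-operator version.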

Spark-RAPIDS uses only the cuDF API-per-operator approach for expression evaluation. cuDF AST execution tends to be much faster than the API-per-operator approach, and JIT transform shows the highest throughput when JIT compilation time is managed.
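
The JIT items (generating a CUDA C++ UDF from an expression, and keeping compilation time in check) could be sketched as below, assuming that caching compiled kernels by their generated source is an acceptable way to amortize compilation across batches. `emit`, `generateUdf`, `JitCache`, and the `udf` signature are hypothetical; the generated string only illustrates the idea and is not guaranteed to match the UDF format that cuDF's JIT transform expects.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical arithmetic expression over named float inputs.
struct Expr {
  std::string op;    // "+", "-", "*", "/"; empty for an input variable
  std::string name;  // input name when op is empty
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr var(std::string name) {
  auto e = std::make_shared<Expr>();
  e->name = std::move(name);
  return e;
}

ExprPtr bin(std::string op, ExprPtr lhs, ExprPtr rhs) {
  auto e = std::make_shared<Expr>();
  e->op = std::move(op);
  e->lhs = std::move(lhs);
  e->rhs = std::move(rhs);
  return e;
}

// Emits a fully parenthesized C++ expression for the tree.
std::string emit(const ExprPtr& e) {
  assert(e != nullptr);
  if (e->op.empty()) return e->name;
  return "(" + emit(e->lhs) + " " + e->op + " " + emit(e->rhs) + ")";
}

// Wraps the expression in a device function; the signature is a placeholder.
std::string generateUdf(const ExprPtr& e, const std::vector<std::string>& inputs) {
  std::string src = "__device__ inline void udf(float* out";
  for (const auto& in : inputs) src += ", float " + in;
  src += ") { *out = " + emit(e) + "; }";
  return src;
}

// Compiles each distinct UDF source once and reuses the handle. compileFn
// stands in for the expensive JIT step and returns an opaque kernel handle.
class JitCache {
 public:
  explicit JitCache(std::function<int(const std::string&)> compileFn)
      : compile_(std::move(compileFn)) {}

  int getOrCompile(const std::string& src) {
    auto it = cache_.find(src);
    if (it != cache_.end()) return it->second;
    ++compilations_;
    int handle = compile_(src);
    cache_.emplace(src, handle);
    return handle;
  }

  std::size_t compilations() const { return compilations_; }

 private:
  std::function<int(const std::string&)> compile_;
  std::unordered_map<std::string, int> cache_;
  std::size_t compilations_ = 0;
};
```

Under this assumption, repeated batches that reuse the same predicate pay the compilation cost only once.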

Labels: enhancement (New feature or request)