feat(sparksql): Add transform with index parameter support by yaooqinn · Pull Request #16211 · facebookincubator/velox

yaooqinn · 2026-02-03T04:12:13Z

Summary

Adds Spark-specific transform function that supports the 2-arg lambda signature transform(array, (x, i) -> expr) with element + index.

Background

Spark's transform function supports two signatures:

transform(array, x -> expr) - element only
transform(array, (x, i) -> expr) - element + index

Velox's Presto implementation only supports signature #1. This PR adds a Spark-specific implementation that supports both.

Changes

Created velox/functions/sparksql/Transform.cpp with SparkTransformFunction
Uses runtime arity detection to provide index vector when needed
Uses 32-bit integer for index (matching Spark's IntegerType)
Added 6 test cases covering both signatures and null handling

Test Plan

-- Spark behavior this enables:
spark-sql> SELECT transform(array(1, 2, 3), (x, i) -> x + i);
[1, 3, 5]

All 6 new tests pass:

elementOnly - basic transform without index
elementWithIndex - transform using element + index
withNulls - null handling in arrays
nestedArrays - nested array transformation
indexOnly - using only index (ignore element)
emptyArray - empty array edge case

netlify · 2026-02-03T04:12:20Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`c28d2c1`
🔍 Latest deploy log	https://app.netlify.com/projects/meta-velox/deploys/6981ce81070e0e0008342e56

Spark's transform function supports two signatures: 1. transform(array, x -> expr) - element only 2. transform(array, (x, i) -> expr) - element + index Velox's Presto implementation only supports signature #1. This adds a Spark-specific implementation that supports both signatures. Key changes: - Created velox/functions/sparksql/Transform.cpp with SparkTransformFunction that detects lambda arity at runtime and provides index vector when needed - Uses 32-bit integer for index (matching Spark's IntegerType) - Added comprehensive tests for both signatures including null handling Tested with 6 new test cases, all passing.

jinchengchenghh

Thanks for your feature. Please also update the document.

jinchengchenghh · 2026-02-03T06:50:39Z

velox/functions/sparksql/tests/TransformTest.cpp

+  }
+};
+
+// Test transform with element-only lambda: transform(array, x -> x * 2)


End with .

jinchengchenghh · 2026-02-03T06:52:04Z

velox/functions/sparksql/Transform.cpp

+namespace facebook::velox::functions::sparksql {
+namespace {
+
+/// Spark's transform function supporting both signatures:


supporting -> supports

jinchengchenghh · 2026-02-03T06:52:35Z

velox/functions/sparksql/Transform.cpp

+///
+/// See Spark documentation:
+/// https://spark.apache.org/docs/latest/api/sql/index.html#transform
+class SparkTransformFunction : public exec::VectorFunction {


Please remove Spark prefix since it has been in sparksql namespace

jinchengchenghh · 2026-02-03T06:54:57Z

velox/functions/sparksql/Transform.cpp

+/// https://spark.apache.org/docs/latest/api/sql/index.html#transform
+class SparkTransformFunction : public exec::VectorFunction {
+ public:
+  void apply(


Could we reuse some code of udf_transform, move some of them to common path velox/functions/lib?

- Rename SparkTransformFunction to TransformFunction (in sparksql namespace) - Fix grammar: 'supporting' -> 'supports' - Add period at end of test comment

yaooqinn · 2026-02-03T08:13:36Z

I analyzed the code reuse opportunity. The common utilities are already shared in velox/functions/lib/:

flattenArray() from LambdaFunctionUtil.h
toElementRows(), getElementToTopLevelRows() from RowsTranslationUtil.h
toWrapCapture(), addNullsForUnselectedRows() from LambdaFunctionUtil.h

The Spark-specific additions are:

Index detection via functionType->size() == 3
createIndexVector() helper for the (x, i) signature

Moving the remaining lambda iteration loop to lib would add complexity without significant benefit since it's ~15 lines of straightforward code. The current structure follows the same pattern as other Spark-specific lambda functions like filter.

If you'd prefer a different approach (e.g., a templated base class), I'm happy to refactor.

jinchengchenghh · 2026-02-03T08:40:26Z

velox/functions/sparksql/Transform.cpp

+    std::vector<VectorPtr> lambdaArgs = {flatArray->elements()};
+
+    // If lambda expects index, create index vector.
+    if (withIndex) {


Looks like the only change with presto TransformFunction is here, am I right? So I would prefer a base TransformFunction and add a virtual addIndexVectors(lambdaArgs) with default empty implementation,

jinchengchenghh · 2026-02-03T08:41:16Z

velox/functions/sparksql/Transform.cpp

+    context.moveOrCopyResult(localResult, rows, result);
+  }
+
+  static std::vector<std::shared_ptr<exec::FunctionSignature>> signatures() {


And the signatures, call the super function signatures() and add the spark specified function signature

Address reviewer feedback to share code between Presto and Spark transform: - Create TransformFunctionBase in velox/functions/lib/ with virtual addIndexVector() - Presto TransformFunction inherits base directly (no override needed) - Spark TransformFunction overrides addIndexVector() to add index parameter - Spark signatures() calls base and adds the (T, integer, U) signature All 13 transform tests pass (6 Spark + 7 Presto).

yaooqinn requested review from jinchengchenghh, rui-mo and zhli1142015 as code owners February 3, 2026 04:12

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 3, 2026

yaooqinn changed the title ~~Add Spark-specific transform with index parameter support~~ feat(sparksql): Add transform with index parameter support Feb 3, 2026

yaooqinn force-pushed the spark-transform-with-index branch from 878130c to 7254aa4 Compare February 3, 2026 04:34

yaooqinn force-pushed the spark-transform-with-index branch from 7254aa4 to 0b50680 Compare February 3, 2026 06:31

jinchengchenghh reviewed Feb 3, 2026

View reviewed changes

Address review comments: rename class, fix grammar

7656e89

- Rename SparkTransformFunction to TransformFunction (in sparksql namespace) - Fix grammar: 'supporting' -> 'supports' - Add period at end of test comment

jinchengchenghh reviewed Feb 3, 2026

View reviewed changes

yaooqinn force-pushed the spark-transform-with-index branch from cc7afe8 to c28d2c1 Compare February 3, 2026 10:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sparksql): Add transform with index parameter support#16211

feat(sparksql): Add transform with index parameter support#16211
yaooqinn wants to merge 3 commits intofacebookincubator:mainfrom
yaooqinn:spark-transform-with-index

yaooqinn commented Feb 3, 2026

Uh oh!

netlify bot commented Feb 3, 2026 •

edited

Loading

Uh oh!

jinchengchenghh left a comment

Uh oh!

jinchengchenghh Feb 3, 2026

Uh oh!

jinchengchenghh Feb 3, 2026

Uh oh!

jinchengchenghh Feb 3, 2026

Uh oh!

jinchengchenghh Feb 3, 2026

Uh oh!

yaooqinn commented Feb 3, 2026

Uh oh!

jinchengchenghh Feb 3, 2026

Uh oh!

jinchengchenghh Feb 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yaooqinn commented Feb 3, 2026

Summary

Background

Changes

Test Plan

Uh oh!

netlify bot commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for meta-velox canceled.

Uh oh!

jinchengchenghh left a comment

Choose a reason for hiding this comment

Uh oh!

jinchengchenghh Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

jinchengchenghh Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

jinchengchenghh Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

jinchengchenghh Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

yaooqinn commented Feb 3, 2026

Uh oh!

jinchengchenghh Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

jinchengchenghh Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

netlify bot commented Feb 3, 2026 •

edited

Loading