[SPARK-50762][SQL][TESTS] Add more scalar SQL UDF SQL query tests #50898

Closed

allisonwang-db
Contributor

@allisonwang-db allisonwang-db commented May 15, 2025

What changes were proposed in this pull request?

This PR adds more SQL query tests for scalar SQL UDFs.

Why are the changes needed?

To make sure SQL UDFs work with different operators and to prevent regressions.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Test only

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label May 15, 2025
-- !query schema
struct<>
-- !query output
org.apache.spark.sql.catalyst.analysis.FunctionAlreadyExistsException
Contributor Author

This needs investigation. I created a follow-up ticket: SPARK-52148

@allisonwang-db
Contributor Author

cc @cloud-fan

@cloud-fan
Contributor

@allisonwang-db can you rebase your branch and regenerate the golden files? The test fails:

[info] - sql-udf.sql *** FAILED *** (6 seconds, 861 milliseconds)
[info]   sql-udf.sql
[info]   Expected "...s" : "\"sum(c2) AS `[outer(sum(]c2))`\""
[info]     },
[info]     "que...", but got "...s" : "\"sum(c2) AS `[sum(outer(spark_catalog.default.t1.]c2))`\""
[info]     },
[info]     "que..." Result did not match for query #159
[info]   SELECT c1, SUM(c2) + foo3_1a(MIN(c2), MAX(c2)) + (SELECT SUM(c2)) FROM t1 GROUP BY c1 (SQLQueryTestSuite.scala:683)
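
For readers unfamiliar with this failure format: the square brackets in the `Expected ... but got ...` message appear to mark the part of each string that differs, with the shared prefix and suffix left outside them. The sketch below reproduces that notation for the alias in the log above. It is only an illustration; `bracket_diff` is a hypothetical helper and not the test framework's actual implementation.

```python
def bracket_diff(expected: str, actual: str) -> tuple[str, str]:
    """Wrap the differing middle of two strings in square brackets."""
    limit = min(len(expected), len(actual))

    # Length of the longest common prefix.
    prefix = 0
    while prefix < limit and expected[prefix] == actual[prefix]:
        prefix += 1

    # Length of the longest common suffix that does not overlap the prefix.
    suffix = 0
    while suffix < limit - prefix and expected[-1 - suffix] == actual[-1 - suffix]:
        suffix += 1

    def mark(text: str) -> str:
        end = len(text) - suffix
        return f"{text[:prefix]}[{text[prefix:end]}]{text[end:]}"

    return mark(expected), mark(actual)


# Alias strings reconstructed from the failure message above.
golden = "`outer(sum(c2))`"
produced = "`sum(outer(spark_catalog.default.t1.c2))`"
marked_golden, marked_produced = bracket_diff(golden, produced)
print(marked_golden)
print(marked_produced)
```

Read this way, the golden file still records the alias `outer(sum(c2))` while the rebased code produces `sum(outer(spark_catalog.default.t1.c2))`, which is why regenerating the golden files was requested.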

@xinrong-meng
Member

The failed test org.apache.spark.sql.kafka010.KafkaContinuousSourceTopicDeletionSuite seems unrelated. Would you please retrigger?

@cloud-fan
Contributor

Thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in 458cf70 May 16, 2025
cloud-fan pushed a commit that referenced this pull request May 16, 2025
Closes #50898 from allisonwang-db/spark-50762-tests.

Authored-by: Allison Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 458cf70)
Signed-off-by: Wenchen Fan <[email protected]>
@dongjoon-hyun
Member

dongjoon-hyun commented May 16, 2025

Hi, @allisonwang-db and @cloud-fan .

This seems to break branch-4.0.

- sql-udf.sql_analyzer_test *** FAILED ***
  sql-udf.sql_analyzer_test
  Expected "...g.default.foo3_1b(c2[))#xL > cast(0 as bigint))
     +- Project [c1#x, count(1)#xL, spark_catalog.default.foo3_1b(x#x) AS spark_catalog.default.foo3_1b(sum(c2))#x, sum(spark_catalog.default.foo3_1b(c2))#xL]
        +- Project [c1#x, count(1)#xL, sum(c2)#xL, sum(spark_catalog.default.foo3_1b(c2))#xL, cast(sum(c2)#xL as int) AS x#x]
           +- Aggregate [c1#x], [c1#x, count(1) AS count(1)#xL, sum(c2#x) AS sum(c2)#xL, sum(spark_catalog.default.foo3_1b(x#x)) AS sum(spark_catalog.default.foo3_1b(c2]))#xL]
              +...", but got "...g.default.foo3_1b(c2[#x))#xL > cast(0 as bigint))
     +- Project [c1#x, count(1)#xL, spark_catalog.default.foo3_1b(x#x) AS spark_catalog.default.foo3_1b(sum(c2))#x, sum(spark_catalog.default.foo3_1b(c2#x))#xL]
        +- Project [c1#x, count(1)#xL, sum(c2)#xL, sum(spark_catalog.default.foo3_1b(c2#x))#xL, cast(sum(c2)#xL as int) AS x#x]
           +- Aggregate [c1#x], [c1#x, count(1) AS count(1)#xL, sum(c2#x) AS sum(c2)#xL, sum(spark_catalog.default.foo3_1b(x#x)) AS sum(spark_catalog.default.foo3_1b(c2#x]))#xL]
              +..." Result did not match for query #152
  SELECT c1, COUNT(*), foo3_1b(SUM(c2)) FROM t1 GROUP BY c1 HAVING SUM(foo3_1b(c2)) > 0 (SQLQueryTestSuite.scala:683)
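
A note on reading the plan above: the `#x` and `#xL` suffixes look like expression IDs that the test harness normalizes so golden files stay stable across runs. The sketch below shows that kind of normalization under the assumption that it is a plain regex replacement of `#<digits>` with `#x`; `normalize_expr_ids` is a hypothetical name, and the sample IDs are made up for illustration.

```python
import re


def normalize_expr_ids(plan: str) -> str:
    """Replace numeric expression IDs such as #12 or #34L with #x / #xL."""
    # Only the digits are replaced, so a trailing type marker like L survives.
    return re.sub(r"#\d+", "#x", plan)


# Made-up raw plan fragment; real IDs vary from run to run.
raw = "Project [c1#12, count(1)#34L, sum(c2)#35L]"
normalized = normalize_expr_ids(raw)
print(normalized)
```

With the IDs normalized away, the remaining mismatch in this failure is the generated column name itself: the golden file has `foo3_1b(c2)` where branch-4.0 produces `foo3_1b(c2#x)`.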

@dongjoon-hyun
Member

I created a follow-up for branch-4.0.

dongjoon-hyun added a commit that referenced this pull request May 17, 2025
### What changes were proposed in this pull request?

This is a follow-up of #50898 for branch-4.0.
- #50898

### Why are the changes needed?

#50898 broke `branch-4.0` CIs.
- https://github.com/apache/spark/actions/runs/15070364465/job/42364916342
- https://github.com/apache/spark/actions/runs/15070303045/job/42364700177

```
- sql-udf.sql_analyzer_test *** FAILED ***
  sql-udf.sql_analyzer_test
  Expected "...g.default.foo3_1b(c2[))#xL > cast(0 as bigint))
     +- Project [c1#x, count(1)#xL, spark_catalog.default.foo3_1b(x#x) AS spark_catalog.default.foo3_1b(sum(c2))#x, sum(spark_catalog.default.foo3_1b(c2))#xL]
        +- Project [c1#x, count(1)#xL, sum(c2)#xL, sum(spark_catalog.default.foo3_1b(c2))#xL, cast(sum(c2)#xL as int) AS x#x]
           +- Aggregate [c1#x], [c1#x, count(1) AS count(1)#xL, sum(c2#x) AS sum(c2)#xL, sum(spark_catalog.default.foo3_1b(x#x)) AS sum(spark_catalog.default.foo3_1b(c2]))#xL]
              +...", but got "...g.default.foo3_1b(c2[#x))#xL > cast(0 as bigint))
     +- Project [c1#x, count(1)#xL, spark_catalog.default.foo3_1b(x#x) AS spark_catalog.default.foo3_1b(sum(c2))#x, sum(spark_catalog.default.foo3_1b(c2#x))#xL]
        +- Project [c1#x, count(1)#xL, sum(c2)#xL, sum(spark_catalog.default.foo3_1b(c2#x))#xL, cast(sum(c2)#xL as int) AS x#x]
           +- Aggregate [c1#x], [c1#x, count(1) AS count(1)#xL, sum(c2#x) AS sum(c2)#xL, sum(spark_catalog.default.foo3_1b(x#x)) AS sum(spark_catalog.default.foo3_1b(c2#x]))#xL]
              +..." Result did not match for query #152
  SELECT c1, COUNT(*), foo3_1b(SUM(c2)) FROM t1 GROUP BY c1 HAVING SUM(foo3_1b(c2)) > 0 (SQLQueryTestSuite.scala:683)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50928 from dongjoon-hyun/SPARK-50762.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025