
[SPARK-52451][CONNECT][SQL] Make WriteOperation in SparkConnectPlanner side effect free #51727


Open · wants to merge 2 commits into master

Conversation

@heyihong (Contributor) commented Jul 30, 2025

What changes were proposed in this pull request?

This PR refactors the Spark Connect execution flow to make WriteOperation handling side-effect free by separating the transformation and execution phases. The key changes include:

  1. Unified execution flow: Consolidated ROOT and COMMAND operations through SparkConnectPlanExecution.handlePlan() instead of separate handlers
  2. Pure transformation phase: Introduced transformCommand() that converts WriteOperation to LogicalPlan without side effects. It leverages the new DataFrameWriter methods (saveCommand(), saveAsTableCommand(), insertIntoCommand()), which return logical plans instead of executing immediately.
  3. DataFrameWriter refactoring: Added saveCommand(), saveAsTableCommand(), and insertIntoCommand() to DataFrameWriter, each returning a logical plan, and introduced a new SaveAsV1TableCommand. A sketch of the intended shape follows this list.
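
A minimal sketch of the intended shape, assuming only the method names listed above; the proto accessors and surrounding plumbing shown here are illustrative rather than the exact code in this PR:

```scala
// Sketch only: saveCommand()/saveAsTableCommand() are the new methods named in
// this description; the proto accessors and argument shapes are assumptions.
private def transformCommand(writeOp: proto.WriteOperation): LogicalPlan = {
  // Build the input DataFrame from the relation carried in the proto message.
  val df = Dataset.ofRows(session, transformRelation(writeOp.getInput))
  val writer = df.write
  // ... apply source, mode, options, and partitioning from the proto here ...

  // Pure transformation: each branch returns a LogicalPlan describing the
  // write instead of performing it, so the planner itself has no side effects.
  if (writeOp.hasPath) {
    writer.saveCommand(writeOp.getPath)
  } else {
    writer.saveAsTableCommand(writeOp.getTable.getTableName)
  }
}
// The returned plan is then executed exactly once, through the unified
// SparkConnectPlanExecution.handlePlan() path.
```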

Why are the changes needed?

The current implementation has several issues:

  1. Side effects in transformation: The handleWriteOperation method both transforms and executes write operations, making it difficult to reason about the transformation logic independently.

  2. Code duplication: Separate handling paths for ROOT and COMMAND operations in ExecuteThreadRunner create unnecessary complexity and potential inconsistencies.

Does this PR introduce any user-facing change?

No. This is a purely internal refactoring that maintains the same external behavior and API. All existing Spark Connect client code will continue to work without any changes.

How was this patch tested?

build/sbt "connect-client-jvm/testOnly *ClientE2ETestSuite"

Was this patch authored or co-authored using generative AI tooling?

Cursor 1.3.5

@heyihong force-pushed the SPARK-52451 branch 2 times, most recently from 73aacd0 to b510901 (July 30, 2025 16:28)
@heyihong changed the title from [SPARK-52451][CONNECT] Make WriteOperation in SparkConnectPlanner side effect free to [SPARK-52451][CONNECT][SQL] Make WriteOperation in SparkConnectPlanner side effect free (Jul 30, 2025)

@heyihong force-pushed the SPARK-52451 branch 2 times, most recently from 82c9d71 to 648ff88 (July 30, 2025 21:01)

Review thread on the following diff excerpt:

val refreshTablePlan = RefreshTableCommand(qualifiedIdent)

CompoundBody(
Contributor:
Does this really work with runCommand?

@heyihong (Contributor, Author), Jul 31, 2025:

Yes, CompoundBody is actually executed during the analysis phase, but the tracker doesn't seem to get updated correctly, so I made a small fix in this PR.

Contributor:

I'm a bit worried about using CompoundBody outside of SQL script execution, and the fix in https://github.com/apache/spark/pull/51727/files#diff-a3fb0a56a7d2d08dc87434eff5b43aba0b006b59ed2f25e29bfb6fb4f81ec0c4R164 is a bit suspicious.

Can we create a new command SaveAsTableCommand to do these operations?

@cloud-fan (Contributor):

The idea LGTM; we can simplify it further in the future by using a simple logical plan for each DataFrameWriter API. Then Spark Connect can just use that logical plan, instead of calling DataFrameWriter to generate the logical plan.
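
For illustration only, that follow-up idea might look roughly like the hypothetical node below; none of these names exist in this PR, and the sketch only shows the shape of a dedicated logical plan per DataFrameWriter API:

```scala
// Hypothetical sketch of the follow-up suggestion, not code from this PR:
// one small logical plan per DataFrameWriter API, which Spark Connect could
// build directly instead of going through DataFrameWriter.
case class SaveIntoPathCommand(
    child: LogicalPlan,
    source: String,
    path: String,
    mode: SaveMode,
    options: Map[String, String]) extends UnaryCommand {
  override protected def withNewChildInternal(newChild: LogicalPlan): SaveIntoPathCommand =
    copy(child = newChild)
}
```

Spark Connect could then construct such a node directly from the WriteOperation proto, bypassing DataFrameWriter entirely.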

@heyihong force-pushed the SPARK-52451 branch 3 times, most recently from 563d5d6 to d20692d (August 1, 2025 14:23)