[SPARK-52451][CONNECT][SQL] Make WriteOperation in SparkConnectPlanner side effect free #51727
base: master
Conversation
Force-pushed from 73aacd0 to b510901
Force-pushed from 82c9d71 to 648ff88
val refreshTablePlan = RefreshTableCommand(qualifiedIdent)
...
CompoundBody(
Does this really work with runCommand?
Yes, CompoundBody is actually executed during the analysis phase. But the tracker doesn't seem to get updated correctly, so I made a small fix in this PR.
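For background on the eager-execution point above: in Spark, command plans run when the DataFrame is created, not when an action is called. The snippet below is a general, standalone illustration of that behavior (plain SQL commands, not the PR's CompoundBody path); the table name and local master are arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Standalone illustration (not this PR's code): command plans such as CREATE TABLE or
// REFRESH TABLE execute eagerly, i.e. by the time spark.sql(...) returns, without any
// collect()/show() action being called.
object EagerCommandDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("eager-command-demo")
      .getOrCreate()

    spark.sql("CREATE TABLE IF NOT EXISTS demo_t(id INT) USING parquet")
    // This REFRESH has already run once sql() returns; commands do not wait for an action.
    spark.sql("REFRESH TABLE demo_t")

    spark.stop()
  }
}
```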
I'm a bit worried about using CompoundBody outside of SQL script execution, and the fix in https://github.com/apache/spark/pull/51727/files#diff-a3fb0a56a7d2d08dc87434eff5b43aba0b006b59ed2f25e29bfb6fb4f81ec0c4R164 is a bit suspicious. Can we create a new command SaveAsTableCommand to do these operations?
The idea LGTM, we can simplify it further in the future by using a simple logical plan for each.
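As a rough illustration of the SaveAsTableCommand idea floated above, a dedicated command would typically take the shape below. This is a hypothetical sketch: the class name, fields, and body are stand-ins rather than what the PR implements; only LeafRunnableCommand and spark.catalog.refreshTable are existing Spark APIs.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.execution.command.LeafRunnableCommand

// Hypothetical sketch only: the real command's name, fields, and behavior would be
// decided in the PR. It shows the usual shape of a runnable command that could bundle
// the write and the follow-up refresh without wrapping them in a CompoundBody.
case class SaveAsTableCommandSketch(
    tableName: String,
    mode: String) extends LeafRunnableCommand {

  override def run(spark: SparkSession): Seq[Row] = {
    // 1. Perform the actual saveAsTable work here (omitted in this sketch).
    // 2. Refresh the table afterwards, replacing the separate RefreshTableCommand step.
    spark.catalog.refreshTable(tableName)
    Seq.empty[Row]
  }
}
```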
Force-pushed from 563d5d6 to d20692d
What changes were proposed in this pull request?
This PR refactors the Spark Connect execution flow to make WriteOperation handling side-effect free by separating the transformation and execution phases. The key changes include:
- Handling both ROOT and COMMAND operations through SparkConnectPlanExecution.handlePlan() instead of separate handlers.
- Adding transformCommand(), which converts WriteOperation to LogicalPlan without side effects. It leverages the new DataFrameWriter methods (saveCommand(), saveAsTableCommand(), insertIntoCommand()), which return logical plans instead of executing immediately (see the sketch after this list).
Why are the changes needed?
The current implementation has several issues:
- Side effects in transformation: the handleWriteOperation method both transforms and executes write operations, making it difficult to reason about the transformation logic independently.
- Code duplication: separate handling paths for ROOT and COMMAND operations in ExecuteThreadRunner create unnecessary complexity and potential inconsistencies.
Does this PR introduce any user-facing change?
No. This is a purely internal refactoring that maintains the same external behavior and API. All existing Spark Connect client code will continue to work without any changes.
How was this patch tested?
build/sbt "connect-client-jvm/testOnly *ClientE2ETestSuite"
Was this patch authored or co-authored using generative AI tooling?
Cursor 1.3.5