Add support for Spark Overwrite save mode#21

Open
MaxKsyunz wants to merge 8 commits into integ/overwrite_mode from dev/overwrite_mode_v2

@MaxKsyunz

Done in two steps:

  • Truncate the destination table using DELETE FROM <TABLE> WHERE TRUE, executed as a partitioned DML statement.
  • Insert the new rows into the table.
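
The two steps above can be sketched as follows. This is a minimal illustration of the ordering only, not the code in this PR: `OverwriteFlow`, `overwrite`, and the two callbacks are hypothetical names, and the callbacks stand in for whatever actually runs the partitioned DML and performs the inserts. The table name is assumed to be validated and quoted by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OverwriteFlow {

    /**
     * Runs an overwrite in two separate steps: clear the table, then write the new rows.
     * Both callbacks are hypothetical stand-ins for the real Spanner calls.
     */
    static <R> void overwrite(String table,
                              List<R> rows,
                              Consumer<String> runPartitionedDml,
                              Consumer<List<R>> insertRows) {
        // Step 1: truncate. The table name is assumed to be already validated and quoted.
        runPartitionedDml.accept("DELETE FROM " + table + " WHERE TRUE");
        // Step 2: insert the new rows.
        insertRows.accept(rows);
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        overwrite("Orders", List.of("row1", "row2"),
                sql -> calls.add("dml: " + sql),
                rows -> calls.add("insert: " + rows.size() + " rows"));
        // Prints:
        // dml: DELETE FROM Orders WHERE TRUE
        // insert: 2 rows
        calls.forEach(System.out::println);
    }
}
```

Because the delete and the insert are separate operations, a failure between them would leave the destination table empty; callers relying on Overwrite mode may want to account for that.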

…itioned DML.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
spotless:apply was not fixing error that spotless:check was reporting.
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
@MaxKsyunz
Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for Spark's "Overwrite" save mode by implementing "SupportsTruncate", utilizing a partitioned DML "DELETE" statement for efficient table truncation in Spanner. The refactoring in SpannerUtils to expose buildSpannerOptions also improves code reuse. However, a high-severity SQL injection vulnerability exists in the truncateTable method within SpannerWriteBuilder due to unsafe string concatenation of the table name. Additionally, the use of System.out and System.err for logging in SpannerWriteBuilder should be replaced with the standard SLF4J framework for better logging control.

…nner/SpannerWriteBuilder.java

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@MaxKsyunz
Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for Spark's Overwrite save mode by truncating the table before writing. This is implemented by adding SupportsTruncate to SpannerWriteBuilder and using a partitioned DML statement (DELETE FROM ... WHERE true) to clear the table. The changes are logical and well-implemented. My main feedback is to replace System.out and System.err with a proper logging framework like SLF4J, which is crucial for applications running on a distributed platform like Spark.

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
… "recreate"

Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
@MaxKsyunz
Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for Spark's Overwrite save mode, with both truncate and recreate strategies. However, the implementation of table recreation and truncation is vulnerable to SQL injection, as SpannerSchemaConverter and SpannerWriteBuilder construct DDL and DML statements by concatenating user-provided table and column names without proper escaping of identifier quotes. This could allow an attacker to execute arbitrary SQL commands. Additionally, there are critical issues related to database dialect handling, where DDL and DML statements use hardcoded backticks (`) for quoting, which is incorrect for PostgreSQL. A logic flaw in SpannerSchemaConverter could also lead to incorrect type mapping for PostgreSQL. It is also recommended to add a test case for the default truncate overwrite mode.
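
One way to address both the injection and the dialect concerns is sketched below. It is an illustration under stated assumptions, not the fix applied in this PR: `IdentifierQuoting`, `quoteIdentifier`, and `buildTruncateSql` are hypothetical names, the local `Dialect` enum is a stand-in for the connector's dialect setting, and the allow-list pattern is a deliberately conservative assumption that may be stricter than what either dialect accepts (for example, it rejects schema-qualified names).

```java
import java.util.regex.Pattern;

public class IdentifierQuoting {

    /** Hypothetical stand-in for the connector's dialect setting. */
    enum Dialect { GOOGLE_STANDARD_SQL, POSTGRESQL }

    // Conservative allow-list (an assumption for this sketch): a letter or underscore
    // followed by letters, digits, or underscores. Anything else is rejected outright,
    // so no quote character can ever appear inside the quoted name.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Validates the name, then wraps it in the quote character for the dialect. */
    static String quoteIdentifier(String name, Dialect dialect) {
        if (name == null || !SAFE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Unsupported identifier: " + name);
        }
        // PostgreSQL quotes identifiers with double quotes; GoogleSQL uses backticks.
        String quote = dialect == Dialect.POSTGRESQL ? "\"" : "`";
        return quote + name + quote;
    }

    /** Builds the truncate statement text with a dialect-appropriate quoted table name. */
    static String buildTruncateSql(String table, Dialect dialect) {
        return "DELETE FROM " + quoteIdentifier(table, dialect) + " WHERE TRUE";
    }

    public static void main(String[] args) {
        // Prints: DELETE FROM `Orders` WHERE TRUE
        System.out.println(buildTruncateSql("Orders", Dialect.GOOGLE_STANDARD_SQL));
        // Prints: DELETE FROM "orders" WHERE TRUE
        System.out.println(buildTruncateSql("orders", Dialect.POSTGRESQL));
    }
}
```

Rejecting unexpected names rather than escaping them keeps the sketch small; an implementation that must accept arbitrary identifiers would need dialect-specific escaping instead.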

assertEquals(2L, finalRow.getLong(0));
assertEquals("two", finalRow.getString(1));
assertEquals(2.2, finalRow.getDouble(2), 0.0001);
}
medium

The new test testOverwriteRecreateMode is great for the recreate mode. However, a test case for the default Overwrite mode (which uses truncate) is missing. Adding a test for this scenario would improve test coverage and could have helped catch the dialect-specific quoting issues in the truncateTable implementation. Please consider adding a testOverwriteTruncateMode test that runs for both dialects.

…nner/SpannerWriteBuilder.java


Temporary code; will be merged with the catalog branch.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>