Add support for Spark Overwrite save mode#21
Add support for Spark Overwrite save mode#21MaxKsyunz wants to merge 8 commits intointeg/overwrite_modefrom
Conversation
…itioned DML. Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
spotless:apply was not fixing error that spotless:check was reporting. Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces support for Spark's "Overwrite" save mode by implementing "SupportsTruncate", utilizing a partitioned DML "DELETE" statement for efficient table truncation in Spanner. The refactoring in SpannerUtils to expose buildSpannerOptions also improves code reuse. However, a high-severity SQL injection vulnerability exists in the truncateTable method within SpannerWriteBuilder due to unsafe string concatenation of the table name. Additionally, the use of System.out and System.err for logging in SpannerWriteBuilder should be replaced with the standard SLF4J framework for better logging control.
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerWriteBuilder.java
Outdated
Show resolved
Hide resolved
…nner/SpannerWriteBuilder.java Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request adds support for Spark's Overwrite save mode by truncating the table before writing. This is implemented by adding SupportsTruncate to SpannerWriteBuilder and using a partitioned DML statement (DELETE FROM ... WHERE true) to clear the table. The changes are logical and well-implemented. My main feedback is to replace System.out and System.err with a proper logging framework like SLF4J, which is crucial for applications running on a distributed platform like Spark.
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerWriteBuilder.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
… "recreate" Signed-off-by: Max Ksyunz <max.ksyunz@improving.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request adds support for Spark's Overwrite save mode, with both truncate and recreate strategies. However, the implementation of table recreation and truncation is vulnerable to SQL injection, as SpannerSchemaConverter and SpannerWriteBuilder construct DDL and DML statements by concatenating user-provided table and column names without proper escaping of identifier quotes. This could allow an attacker to execute arbitrary SQL commands. Additionally, there are critical issues related to database dialect handling, where DDL and DML statements use hardcoded backticks () for quoting, which is incorrect for PostgreSQL. A logic flaw in SpannerSchemaConvertercould also lead to incorrect type mapping for PostgreSQL. It is also recommended to add a test case for the defaulttruncate` overwrite mode.
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerWriteBuilder.java
Show resolved
Hide resolved
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerSchemaConverter.java
Show resolved
Hide resolved
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerSchemaConverter.java
Show resolved
Hide resolved
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerWriteBuilder.java
Show resolved
Hide resolved
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerWriteBuilder.java
Show resolved
Hide resolved
| assertEquals(2L, finalRow.getLong(0)); | ||
| assertEquals("two", finalRow.getString(1)); | ||
| assertEquals(2.2, finalRow.getDouble(2), 0.0001); | ||
| } |
There was a problem hiding this comment.
The new test testOverwriteRecreateMode is great for the recreate mode. However, a test case for the default Overwrite mode (which uses truncate) is missing. Adding a test for this scenario would improve test coverage and could have helped catch the dialect-specific quoting issues in the truncateTable implementation. Please consider adding a testOverwriteTruncateMode test that runs for both dialects.
…nner/SpannerWriteBuilder.java Temporary code. will be merged with catalog branch. Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Done in two steps:
DELETE FROM <TABLE> WHERE TRUEexecuted as a partitioned DML statment