Skip to content
This repository was archived by the owner on May 9, 2024. It is now read-only.

[CSV parsing] Empty columns support. #173

Merged
merged 2 commits into from
Feb 14, 2023
Merged

Conversation

Devjiu
Copy link
Contributor

@Devjiu Devjiu commented Jan 23, 2023

This commit support empty column from csv parsing.

Signed-off-by: Dmitrii Makarenko [email protected]

@Devjiu Devjiu added the Draft label Jan 23, 2023
@Devjiu Devjiu self-assigned this Jan 23, 2023
@Devjiu Devjiu requested a review from ienkovich January 23, 2023 18:59
@alexbaden
Copy link
Contributor

What happens if the schema is explicitly not null, e.g.:

create table t (x int not null, y int, z int);

and then import your test CSV file.

@Devjiu Devjiu marked this pull request as draft January 23, 2023 21:17
@Devjiu Devjiu removed the Draft label Jan 23, 2023
@ienkovich
Copy link
Contributor

What happens if the schema is explicitly not null, e.g.:

create table t (x int not null, y int, z int);

and then import your test CSV file.

I'm not sure we really check for NULL values when importing into a non-nullable column. Worth testing.

@Devjiu Devjiu force-pushed the dmitriim/support_empty_column branch 6 times, most recently from 0280fb3 to 5c49e20 Compare January 30, 2023 19:16
@Devjiu Devjiu force-pushed the dmitriim/support_empty_column branch 4 times, most recently from 4da8746 to 6d89706 Compare February 3, 2023 16:40
@Devjiu
Copy link
Contributor Author

Devjiu commented Feb 3, 2023

What happens if the schema is explicitly not null, e.g.:

create table t (x int not null, y int, z int);

and then import your test CSV file.

Added check, that should cover this case, also test "ImportCsv_KnownSchema_NullsColumn_NullableTypeHeader" describes similar scenario.

@Devjiu Devjiu marked this pull request as ready for review February 3, 2023 16:43
Copy link
Contributor

@ienkovich ienkovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the patch looks good. I have several minor comments about unrelated changes, styling, etc.

@Devjiu Devjiu force-pushed the dmitriim/support_empty_column branch 2 times, most recently from b835219 to 3099194 Compare February 7, 2023 15:52
@Devjiu Devjiu requested a review from kurapov-peter February 7, 2023 15:53
Copy link
Contributor

@ienkovich ienkovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks good to go! You might want to rebase it first, CI should be green if rebased on the current main branch.

@Devjiu
Copy link
Contributor Author

Devjiu commented Feb 8, 2023

PR looks good to go! You might want to rebase it first, CI should be green if rebased on the current main branch.

Hm, looks like there was Null on non-nullable type.

[  FAILED  ] Select.GroupEmptyBlank
C++ exception with description "Null values used in non-nullable type: DICT32(TEXT[NN])[16777272]" thrown in the test body

@ienkovich
Copy link
Contributor

Hm, looks like there was Null on non-nullable type.

That's probably a sign to leave the null-check part for another PR. I don't think that in this test we actually have imported NULL value. The test is for an empty string and the reference value has no NULLs. Also, changing the type to allow NULLS in the test table causes the test to fail. So, in this case, NULL is probably interpreted as an empty string on import/execution/conversion.

This commit support empty column from csv parsing.

Signed-off-by: Dmitrii Makarenko <[email protected]>
@Devjiu Devjiu force-pushed the dmitriim/support_empty_column branch 2 times, most recently from 4046f8f to c280c3a Compare February 13, 2023 10:52
This commit adds tests for null columns and inserts verification that
non-nullable types will not have nulls.

Signed-off-by: Dmitrii Makarenko <[email protected]>
@Devjiu Devjiu force-pushed the dmitriim/support_empty_column branch from c280c3a to 585ec8d Compare February 13, 2023 11:23
@Devjiu Devjiu requested a review from alexbaden February 13, 2023 11:24
@Devjiu
Copy link
Contributor Author

Devjiu commented Feb 13, 2023

@alexbaden PTAL your notice (verification for nulls depending on schema) not covered fully. String conversion to null looks a bit difficult to support in current PR. Should we crate an issue with this and remove this WA in further PR? Or is current WA fine?

Copy link
Contributor

@alexbaden alexbaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create an issue and handle later. The empty handling here looks good.

@ienkovich ienkovich merged commit 0e84174 into main Feb 14, 2023
@ienkovich ienkovich deleted the dmitriim/support_empty_column branch February 14, 2023 16:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants