-
Notifications
You must be signed in to change notification settings - Fork 173
feat(c/driver/postgresql): Timestamp write support #861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c/validation/adbc_validation_util.cc
Outdated
| case NANOARROW_TYPE_TIMESTAMP: | ||
| // TODO: don't hardcode unit here | ||
| CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type, | ||
| NANOARROW_TIME_UNIT_MICRO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SchemaFields are an invention of the adbc_validation_util header - is there a reason we use this custom struct instead of an ArrowSchemaView? I'm not sure if its worth adding time_unit to the former or porting over to the latter in the test suite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I originally just wanted something that could be easily stack allocated, versus a nested heap allocated structure. I was also struggling with how to deal with validation of more complex types. Especially in the validation code where you may not always want full exact matches (maybe you don't care about the field name, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is OK for now but I think what I might end up doing is writing up a set of Googletest matchers so we can write out code like
ASSERT_THAT(view->children[0], IsTimestamp(/*any unit*/));
ASSERT_THAT(view->children[0], IsTimestamp(NANOARROW_TIME_UNIT_MICRO));or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, for here...you could also just write out the code to build the schema by hand? This was meant as a convenience to cut down on boilerplate for simple schemas. But it was from a very early nanoarrow version, and nowadays I'm not sure how much boilerplate it actually saves compared to just writing it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense for matching, though I think we still have a gap with that when creating test data in non-default precisions that needs to be solved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK cool - will try that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry, I realized I had gotten the context completely wrong once I clicked through and started reviewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change still needed?
| case ArrowType::NANOARROW_TYPE_BINARY: | ||
| create += " BYTEA"; | ||
| break; | ||
| case ArrowType::NANOARROW_TYPE_TIMESTAMP: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably differentiate between WITH/WITHOUT TIMEZONE?
note that in WITH TIMEZONE, Arrow always stores the underlying value in UTC so there's no need for us to do any time zone math (thankfully!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I mistakenly assumed with TIMEZONE was a different type. For this PR planning to just raise if a timezone is detected, as I'm not yet sure how to transmit that information via the binary protocol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to check for timezone here, too?
| case NANOARROW_TIME_UNIT_MICRO: | ||
| break; | ||
| case NANOARROW_TIME_UNIT_NANO: | ||
| val /= 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should handle truncation/overflow here
c/validation/adbc_validation_util.cc
Outdated
| case NANOARROW_TYPE_TIMESTAMP: | ||
| // TODO: don't hardcode unit here | ||
| CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type, | ||
| NANOARROW_TIME_UNIT_MICRO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, for here...you could also just write out the code to build the schema by hand? This was meant as a convenience to cut down on boilerplate for simple schemas. But it was from a very early nanoarrow version, and nowadays I'm not sure how much boilerplate it actually saves compared to just writing it out.
| param_lengths[i] = 0; | ||
| break; | ||
| case ArrowType::NANOARROW_TYPE_TIMESTAMP: | ||
| if (strcmp("", bind_schema_fields[i].timezone)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit surprised that nanoarrow assigns an empty string to the timezone member when constructing a schema with a timestamp; was expecting it to be nullptr
| NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"})); | ||
| } | ||
|
|
||
| template <enum ArrowTimeUnit TU> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The templating here might be over-engineering. I was thinking we could also change the signature to something like:
void StatementTest::TestSqlIngestTemporalType(
std::vector<std::optional<int64_t>>& values,
enum ArrowTimeUnit unit, const char* timezone) {
...
}While nothing else is using that pattern currently, it might be nice for gtest to do something like:
ASSERT_NO_FATAL_FAILURE(TestSqlIngestTemporalType(
{std::nullopt, 0, 42}, NANOARROW_TIME_UNIT_SECOND, nullptr
));
EXPECT_FATAL_FAILURE(TestSqlIngestTemporalType(
{std::nullopt, INT64_MIN, INT64_MAX}, NANOARROW_TIME_UNIT_SECOND, nullptr),
"overflow")
);
EXPECT_FATAL_FAILURE(TestSqlIngestTemporalType(
{std::nullopt, 0, 42}, NANOARROW_TIME_UNIT_SECOND, "America/Los Angeles"),
"not implemented")
);(N.B. I have no experience with EXPECT_FATAL_FAILURE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be easiest to have the signature be AdbcStatusCode TestSqlIngestTemporalType(..., struct AdbcError* error) and then you can use the usual ASSERT_THAT(..., IsOkStatus(&error))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to fix things here, but it should be quite straightforward to also write a matcher like IsError(&error, ::testing::HasSubstr("..."))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable as is though
c/driver/postgresql/statement.cc
Outdated
| switch (unit) { | ||
| case NANOARROW_TIME_UNIT_SECOND: | ||
| if (abs(val) > kSecOverflowLimit) { | ||
| SetError(error, "%s%" PRId64 "%s%s%s%" PRId64 "%s", "[libpq] Field #", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, why not have the format string be "[libpq] Field #" PRId64 ...?
| NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"})); | ||
| } | ||
|
|
||
| template <enum ArrowTimeUnit TU> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be easiest to have the signature be AdbcStatusCode TestSqlIngestTemporalType(..., struct AdbcError* error) and then you can use the usual ASSERT_THAT(..., IsOkStatus(&error))
| NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"})); | ||
| } | ||
|
|
||
| template <enum ArrowTimeUnit TU> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to fix things here, but it should be quite straightforward to also write a matcher like IsError(&error, ::testing::HasSubstr("..."))
| NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"})); | ||
| } | ||
|
|
||
| template <enum ArrowTimeUnit TU> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable as is though
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just a couple questions
| case ArrowType::NANOARROW_TYPE_BINARY: | ||
| create += " BYTEA"; | ||
| break; | ||
| case ArrowType::NANOARROW_TYPE_TIMESTAMP: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to check for timezone here, too?
c/validation/adbc_validation_util.cc
Outdated
| case NANOARROW_TYPE_TIMESTAMP: | ||
| // TODO: don't hardcode unit here | ||
| CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type, | ||
| NANOARROW_TIME_UNIT_MICRO, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change still needed?
No description provided.