feat(c/driver/postgresql): Timestamp write support #861

WillAyd · 2023-06-28T17:13:25Z

No description provided.

WillAyd · 2023-06-28T17:14:39Z

c/validation/adbc_validation_util.cc

+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,


SchemaFields are an invention of the adbc_validation_util header. Is there a reason we use this custom struct instead of an ArrowSchemaView? I'm not sure whether it's worth adding time_unit to the former or porting the test suite over to the latter.

I think I originally just wanted something that could be easily stack allocated, versus a nested heap allocated structure. I was also struggling with how to deal with validation of more complex types. Especially in the validation code where you may not always want full exact matches (maybe you don't care about the field name, etc.)

I think this is OK for now but I think what I might end up doing is writing up a set of Googletest matchers so we can write out code like

    ASSERT_THAT(view->children[0], IsTimestamp(/*any unit*/));
    ASSERT_THAT(view->children[0], IsTimestamp(NANOARROW_TIME_UNIT_MICRO));

or something like that

Ah, for here...you could also just write out the code to build the schema by hand? This was meant as a convenience to cut down on boilerplate for simple schemas. But it was from a very early nanoarrow version, and nowadays I'm not sure how much boilerplate it actually saves compared to just writing it out.

That makes sense for matching, though I think we still have a gap to solve when creating test data in non-default precisions.

Ah OK cool - will try that

Yes, sorry, I realized I had gotten the context completely wrong once I clicked through and started reviewing

Is this change still needed?

lidavidm · 2023-06-28T17:46:14Z

c/driver/postgresql/statement.cc

      case ArrowType::NANOARROW_TYPE_BINARY:
        create += " BYTEA";
        break;
+      case ArrowType::NANOARROW_TYPE_TIMESTAMP:


We should probably differentiate between WITH/WITHOUT TIME ZONE?

Note that for WITH TIME ZONE, Arrow always stores the underlying value in UTC, so there's no need for us to do any time zone math (thankfully!)

Makes sense. I mistakenly assumed WITH TIME ZONE was a different type. For this PR I'm planning to just raise an error if a timezone is detected, as I'm not yet sure how to transmit that information via the binary protocol.

Don't we need to check for timezone here, too?
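
To make the WITH/WITHOUT TIME ZONE distinction concrete, here is a minimal sketch, assuming a hypothetical helper TimestampColumnType (not part of the driver) that picks the PostgreSQL column type from the Arrow timezone string. The PR itself raises an error when a timezone is set, so this only illustrates what the CREATE TABLE branch could look like later.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper, not the driver's actual code: choose the PostgreSQL
// column type for an Arrow timestamp field from its timezone string.
// nanoarrow reports "no timezone" as an empty string; nullptr is treated the
// same way here to be defensive.
std::string TimestampColumnType(const char* timezone) {
  const bool has_timezone = timezone != nullptr && std::strlen(timezone) > 0;
  // TIMESTAMPTZ is PostgreSQL's alias for TIMESTAMP WITH TIME ZONE.
  return has_timezone ? "TIMESTAMPTZ" : "TIMESTAMP";
}
```

Since Arrow keeps zoned values in UTC, only the column type would differ between the two cases; the stored integer would not need adjusting.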

lidavidm · 2023-06-28T17:46:28Z

c/driver/postgresql/statement.cc

+                case NANOARROW_TIME_UNIT_MICRO:
+                  break;
+                case NANOARROW_TIME_UNIT_NANO:
+                  val /= 1000;


We should handle truncation/overflow here
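
A rough sketch of what checked unit conversion could look like, assuming a hypothetical ToMicroseconds helper and a stand-in TimeUnit enum in place of nanoarrow's ArrowTimeUnit. It covers only the unit scaling, not how the driver reports the error.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Stand-in for nanoarrow's ArrowTimeUnit so the sketch compiles on its own.
enum class TimeUnit { kSecond, kMilli, kMicro, kNano };

// Hypothetical helper: convert a timestamp in `unit` to microseconds.
// Returns std::nullopt if scaling up would overflow int64_t. Nanoseconds are
// divided by 1000, which truncates toward zero and drops sub-microsecond
// precision.
std::optional<int64_t> ToMicroseconds(int64_t value, TimeUnit unit) {
  int64_t scale = 1;
  switch (unit) {
    case TimeUnit::kSecond: scale = 1000000; break;
    case TimeUnit::kMilli: scale = 1000; break;
    case TimeUnit::kMicro: return value;
    case TimeUnit::kNano: return value / 1000;
  }
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
  // Compare against the limits divided by the scale so the check itself
  // cannot overflow.
  if (value > kMax / scale || value < kMin / scale) return std::nullopt;
  return value * scale;
}
```

Dividing the limits by the scale avoids calling abs() on the value, which is undefined for INT64_MIN.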


lidavidm · 2023-06-28T17:48:39Z

c/validation/adbc_validation_util.cc

+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,


Ah, for here...you could also just write out the code to build the schema by hand? This was meant as a convenience to cut down on boilerplate for simple schemas. But it was from a very early nanoarrow version, and nowadays I'm not sure how much boilerplate it actually saves compared to just writing it out.

WillAyd · 2023-06-28T21:53:42Z

c/driver/postgresql/statement.cc

          param_lengths[i] = 0;
          break;
+        case ArrowType::NANOARROW_TYPE_TIMESTAMP:
+          if (strcmp("", bind_schema_fields[i].timezone)) {


I was a bit surprised that nanoarrow assigns an empty string to the timezone member when constructing a schema with a timestamp; I was expecting nullptr.

WillAyd · 2023-06-28T22:00:32Z

c/validation/adbc_validation.cc

      NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"}));
 }

+template <enum ArrowTimeUnit TU>


The templating here might be over-engineering. I was thinking we could also change the signature to something like:

    void StatementTest::TestSqlIngestTemporalType(
        const std::vector<std::optional<int64_t>>& values,
        enum ArrowTimeUnit unit,
        const char* timezone) { ... }

While nothing else is using that pattern currently, it might be nice for gtest to do something like:

    ASSERT_NO_FATAL_FAILURE(TestSqlIngestTemporalType(
        {std::nullopt, 0, 42}, NANOARROW_TIME_UNIT_SECOND, nullptr));
    EXPECT_FATAL_FAILURE(TestSqlIngestTemporalType(
        {std::nullopt, INT64_MIN, INT64_MAX}, NANOARROW_TIME_UNIT_SECOND, nullptr),
        "overflow");
    EXPECT_FATAL_FAILURE(TestSqlIngestTemporalType(
        {std::nullopt, 0, 42}, NANOARROW_TIME_UNIT_SECOND, "America/Los_Angeles"),
        "not implemented");

(N.B. I have no experience with EXPECT_FATAL_FAILURE)

It might be easiest to have the signature be AdbcStatusCode TestSqlIngestTemporalType(..., struct AdbcError* error) and then you can use the usual ASSERT_THAT(..., IsOkStatus(&error))

I need to fix things here, but it should be quite straightforward to also write a matcher like IsError(&error, ::testing::HasSubstr("..."))

This seems reasonable as is though
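
For reference, the non-templated, status-returning shape discussed in this thread might look roughly like the following. This is only a sketch: Status, Error, TimeUnit and IngestTemporal are stand-ins invented here so the example compiles without adbc.h, nanoarrow or Googletest, and the ingest step itself is left out.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Stand-ins for AdbcStatusCode, AdbcError and ArrowTimeUnit so the sketch is
// self-contained; the real types come from adbc.h and nanoarrow.
enum class Status { kOk, kNotImplemented };
enum class TimeUnit { kSecond, kMilli, kMicro, kNano };
struct Error {
  std::string message;
};

// Hypothetical non-templated shape: unit and timezone are ordinary
// arguments, and failures come back as a status plus an error message
// instead of a fatal test failure.
Status IngestTemporal(const std::vector<std::optional<int64_t>>& values,
                      TimeUnit unit, const char* timezone, Error* error) {
  if (timezone != nullptr && timezone[0] != '\0') {
    error->message = "timezone not implemented";
    return Status::kNotImplemented;
  }
  // A real test would build the Arrow array from `values` and `unit`,
  // ingest it, and read it back; that part is omitted here.
  (void)values;
  (void)unit;
  return Status::kOk;
}
```

With the real types, callers could then check the result with ASSERT_THAT(..., IsOkStatus(&error)) as suggested above.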

lidavidm · 2023-06-29T14:56:29Z

c/driver/postgresql/statement.cc

+              switch (unit) {
+                case NANOARROW_TIME_UNIT_SECOND:
+                  if (abs(val) > kSecOverflowLimit) {
+                    SetError(error, "%s%" PRId64 "%s%s%s%" PRId64 "%s", "[libpq] Field #",


FWIW, why not have the format string be "[libpq] Field #%" PRId64 ...?
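
A small sketch of that suggestion, assuming a hypothetical FormatOverflowError function and illustrative message wording (the driver uses its own SetError and its own text). The point is that adjacent string literals are concatenated at compile time, so the fixed text can sit in the format string next to the PRId64 macro instead of being passed as separate %s arguments.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical formatter, not the driver's SetError. The message wording is
// made up for illustration.
std::string FormatOverflowError(int64_t field, int64_t row) {
  char buffer[128];
  // "%" PRId64 expands to the correct conversion for int64_t on the
  // platform; the neighbouring literals are joined into one format string.
  std::snprintf(buffer, sizeof(buffer),
                "[libpq] Field #%" PRId64 " (row %" PRId64 ") would overflow",
                field, row);
  return std::string(buffer);
}
```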

lidavidm · 2023-06-29T14:58:42Z

c/validation/adbc_validation.cc

      NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"}));
 }

+template <enum ArrowTimeUnit TU>


It might be easiest to have the signature be AdbcStatusCode TestSqlIngestTemporalType(..., struct AdbcError* error) and then you can use the usual ASSERT_THAT(..., IsOkStatus(&error))

lidavidm · 2023-06-29T14:59:08Z

c/validation/adbc_validation.cc

      NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"}));
 }

+template <enum ArrowTimeUnit TU>


I need to fix things here, but it should be quite straightforward to also write a matcher like IsError(&error, ::testing::HasSubstr("..."))

lidavidm · 2023-06-29T15:01:21Z

c/validation/adbc_validation.cc

      NANOARROW_TYPE_BINARY, {std::nullopt, "", "\x00\x01\x02\x04", "\xFE\xFF"}));
 }

+template <enum ArrowTimeUnit TU>


This seems reasonable as is though

lidavidm

Thanks, just a couple questions

lidavidm · 2023-06-30T12:22:20Z

c/driver/postgresql/statement.cc

      case ArrowType::NANOARROW_TYPE_BINARY:
        create += " BYTEA";
        break;
+      case ArrowType::NANOARROW_TYPE_TIMESTAMP:


Don't we need to check for timezone here, too?

lidavidm · 2023-06-30T12:22:57Z

c/validation/adbc_validation_util.cc

+      case NANOARROW_TYPE_TIMESTAMP:
+        // TODO: don't hardcode unit here
+        CHECK_ERRNO(ArrowSchemaSetTypeDateTime(schema->children[i], field.type,
+                                               NANOARROW_TIME_UNIT_MICRO,


Is this change still needed?

WillAyd added 2 commits June 28, 2023 10:06

initial timestamp hacks

8aa5d49

sqlite skip

e630986

WillAyd commented Jun 28, 2023


lidavidm reviewed Jun 28, 2023


WillAyd added 4 commits June 28, 2023 13:22

feedback pt 1

289606c

implemented custom test for timestamp

78b817d

variable unit support

776119d

templated time unit test

80b21e5

WillAyd commented Jun 28, 2023


Fixed error printing

bcd344b

WillAyd mentioned this pull request Jun 28, 2023

Add SQL Support for ADBC Drivers pandas-dev/pandas#53869

Merged

driver manager test fix

bfbf242

lidavidm reviewed Jun 29, 2023


simplify SetError

c8f7a6b

WillAyd marked this pull request as ready for review June 30, 2023 01:37

lidavidm approved these changes Jun 30, 2023


feedback

60ef31d

lidavidm approved these changes Jun 30, 2023


lidavidm merged commit fd24082 into apache:main Jun 30, 2023

lidavidm added this to the ADBC Libraries 0.6.0 milestone Jun 30, 2023

WillAyd deleted the pg-ts-write branch June 30, 2023 16:38

2 participants