Skip to content

support general typed string literals #187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 11, 2020

Conversation

maxcountryman
Copy link
Contributor

This is a direct port of MaterializeInc/materialize#3146. It provides
support for e.g. Postgres syntax where any string literal may be
preceded by a type name.

Fixes #168.

@maxcountryman maxcountryman force-pushed the feature/port-materialize-#3146 branch from 8487d2e to 731f400 Compare June 6, 2020 16:55
Copy link
Contributor

@nickolay nickolay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! The comments below are mainly nits, as usual, but I do think it's important to switch back to using a TypedString (a la MaterializeInc/materialize@d6b24fb) instead of converting DATE 'yyyy-mm-dd to a Cast, and to fix the NOT 'a' like 'b' bug (possibly by disallowing the custom data types as part of a "typed string" - MaterializeInc's version of the parser seems to have disabled this feature at all).

@nickolay
Copy link
Contributor

nickolay commented Jun 7, 2020

I wanted to note that even though that Materialize as a whole is licensed under a non-opensource license, they kept the sql-parser bits we borrow these changes from (value.rs and parser.rs) available under the Apache license, so there shouldn't be a problem with this.

@maxcountryman
Copy link
Contributor Author

Thank you for the feedback! I'll try to have another look at this later on this week.

@@ -34,6 +34,15 @@ macro_rules! parser_err {
};
}

// Returns a successful result if the optional expression is some
macro_rules! return_ok_if_some {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know what you think about this name: it seemed a little closer to what's happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought maybe_parse could return a Result (return_if_ok!(self.maybe_parse(|parser| {), but this is probably fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sure, happy to do it that way if you prefer!

@@ -34,6 +34,15 @@ macro_rules! parser_err {
};
}

// Returns a successful result if the optional expression is some
macro_rules! return_ok_if_some {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought maybe_parse could return a Result (return_if_ok!(self.maybe_parse(|parser| {), but this is probably fine too.

src/parser.rs Outdated
Comment on lines 206 to 207
// Single-quoted strings are parsed as custom data types, however this not desirable
// when we are handling input like `"NOT 'a' NOT LIKE 'b'"` because this will produce a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. Any name (Word in our parlance), not a 'string', will be accepted as a custom data type, even when it's a keyword that should be handled by the code below.

src/parser.rs Outdated
DataType::Interval => parser.parse_literal_interval(),
// Single-quoted strings are parsed as custom data types, however this not desirable
// when we are handling input like `"NOT 'a' NOT LIKE 'b'"` because this will produce a
// TypedString instead of a SingleQuotedString. Further, this leads to issues where the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a TypedString[NOT 'a'] instead of Unary NOT?

src/parser.rs Outdated
// Single-quoted strings are parsed as custom data types, however this not desirable
// when we are handling input like `"NOT 'a' NOT LIKE 'b'"` because this will produce a
// TypedString instead of a SingleQuotedString. Further, this leads to issues where the
// same input will yield a BinaryOperator instead of the correct UnaryOperator. Here we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. If it's important, an example to demonstrate it would be welcome.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove this, it's just explaining why our test would fail if we didn't explicitly handle DataType::Custom by returning an error (e.g. it will parse as a BinaryOperator instead of an UnaryOperator).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to change this entire comment to say something to the effect of we don't accept the type 'string' syntax for the custom data types? (Borrowing from your comment below.)

Copy link
Contributor

@nickolay nickolay Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will parse as a BinaryOperator instead of an UnaryOperator

I don't understand what will parse as BinaryOperator.

Would it be better to change this entire comment to say something to the effect of we don't accept the type 'string' syntax for the custom data types? (Borrowing from your comment below.)

I think both the description of what returning an Err() here does and the non-obvious example of NOT 'x' LIKE 'y' should stay in the comment. I'm just trying to make sure I understand the examples and that they are clearly described.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what will parse as BinaryOperator.

Here I mean the direct example given in the comment of "NOT 'a' NOT LIKE 'b'".

I think both the description of what returning an Err() here does and the non-obvious example of NOT 'x' LIKE 'y' should stay in the comment. I'm just trying to make sure I understand the examples and that they are clearly described.

Sure, makes sense. I may be approaching the limits of my (admittedly shallow) understanding of the library here so I may need to defer to you for how you'd like to document that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I mean the direct example given in the comment of "NOT 'a' NOT LIKE 'b'".

Ah, sorry for being dense.

src/parser.rs Outdated
Comment on lines 206 to 210
// Single-quoted strings are parsed as custom data types, however this not desirable
// when we are handling input like `"NOT 'a' NOT LIKE 'b'"` because this will produce a
// TypedString instead of a SingleQuotedString. Further, this leads to issues where the
// same input will yield a BinaryOperator instead of the correct UnaryOperator. Here we
// handle that specific case by returning an error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"...we don't accept the type 'string' syntax for the custom data types at all."

src/parser.rs Outdated
Comment on lines 1978 to 1955
// Check if the recently consumed '(' started a derived table, in which case we've
// parsed the subquery, followed by the closing ')', and the alias of the derived
// table. In the example above this is case (3).
//
// A parsing error from `parse_derived_table_factor` indicates that the '(' we've
// recently consumed does not start a derived table (cases 1, 2, or 4). Ignore the
// error and back up to where we after the opening '('.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments' text does not match their new location in the code. I can update the wording if you like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that would be helpful if you don't mind. 🙂

@maxcountryman maxcountryman force-pushed the feature/port-materialize-#3146 branch from 4b4e73d to 2fa9479 Compare June 11, 2020 13:26
This is a direct port of MaterializeInc/materialize#3146. It provides
support for e.g. Postgres syntax where any string literal may be
preceded by a type name.

Fixes apache#168.
@maxcountryman maxcountryman force-pushed the feature/port-materialize-#3146 branch from 2fa9479 to 51cb5c7 Compare June 11, 2020 13:29
@coveralls
Copy link

coveralls commented Jun 11, 2020

Pull Request Test Coverage Report for Build 132531077

  • 41 of 41 (100.0%) changed or added relevant lines in 3 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.3%) to 91.708%

Files with Coverage Reduction New Missed Lines %
src/ast/value.rs 1 87.04%
src/parser.rs 1 89.27%
Totals Coverage Status
Change from base Build 132518608: 0.3%
Covered Lines: 4070
Relevant Lines: 4438

💛 - Coveralls

@maxcountryman
Copy link
Contributor Author

I rebased this off of master (and squashed to avoid having to resolve the merge conflicts in each commit) so this should now incorporate the mainline changes.

@nickolay
Copy link
Contributor

I rebased this off of master (and squashed to avoid having to resolve the merge conflicts in each commit) so this should now incorporate the mainline changes.

Excellent.

I pushed an update to the comments, let me know if they make sense, and I think this is ready to merge?

Comment on lines +203 to +209
// PosgreSQL allows almost any identifier to be used as custom data type name,
// and we support that in `parse_data_type()`. But unlike Postgres we don't
// have a list of globally reserved keywords (since they vary across dialects),
// so given `NOT 'a' LIKE 'b'`, we'd accept `NOT` as a possible custom data type
// name, resulting in `NOT 'a'` being recognized as a `TypedString` instead of
// an unary negation `NOT ('a' LIKE 'b')`. To solve this, we don't accept the
// `type 'string'` syntax for the custom data types at all.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads so much better. Thank you!

@maxcountryman
Copy link
Contributor Author

I pushed an update to the comments, let me know if they make sense, and I think this is ready to merge?

Amazing. Thank you! It looks good to me. Really appreciate all the detailed feedback and of course patience. 🙂

@nickolay nickolay merged commit 6cdd4a1 into apache:master Jun 11, 2020
@nickolay
Copy link
Contributor

Thank you as well!

@maxcountryman maxcountryman deleted the feature/port-materialize-#3146 branch June 11, 2020 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

column names that conflict with DATE/TIME keywords must be quoted
3 participants