Parse column constraints in any order #93

benesch · 2019-06-03T02:57:06Z

CREATE TABLE t (a INT NOT NULL DEFAULT 1 PRIMARY KEY) is as valid as CREATE TABLE t (a INT DEFAULT 1 PRIMARY KEY NOT NULL).

@nickolay I got annoyed by the tests in sqlparser_postgres and adjusted them so that the first test asserts every column specification, and the remainder simply verify that the SQL roundtrips. Let me know if you'd prefer to be more precise with the later tests.

coveralls · 2019-06-03T03:10:52Z

Pull Request Test Coverage Report for Build 313

246 of 272 (90.44%) changed or added relevant lines in 4 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at ?%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/sqlast/ddl.rs	45	46	97.83%
src/sqlparser.rs	41	42	97.62%
tests/sqlparser_common.rs	73	84	86.9%
tests/sqlparser_postgres.rs	87	100	87.0%

Totals
Change from base Build 307:	0.0%
Covered Lines:
Relevant Lines:	0

💛 - Coveralls

nickolay · 2019-06-05T01:17:39Z

This got rather long, sorry!

The CHECK constraint is only partially implemented: it's in the AST, but the parser doesn't handle it. Please make a note if you want to leave it unimplemented for now.
A thought on parsing of the DEFAULT expression:

We want to allow as many DEFAULT expressions as possible, but calling self.parse_expr() will choke on expressions like "DEFAULT 1 NOT NULL". Typically this would be a syntax error, as "1 NOT NULL" is not a valid SQL expression, but in this case we know that NOT NULL is part of the next column constraint.

A syntax error happens whenever a "token" (or - with the change from #82 - a token pair, like "NOT IN") following the DEFAULT expression has a get_next_precedence > 0. The Pratt parser then attempts to parse it as an "infix", and in the case of NOT NULL fails with "Expected IN or BETWEEN after NOT, found: null".

As it turns out, the only expressions that cause trouble all have precedence less than or equal to BETWEEN, so we pass BETWEEN_PREC to stop parsing on tokens with precedence less than or equal to BETWEEN.
The same trick is employed by PostgreSQL

I've looked through the docs for a few dialects (below) and I can't think of anything other than NOT (and COLLATE - see below) that's under a risk of being incorrectly treated as a DEFAULT expression's infix, while being a valid keyword starting a new constraint or a column attribute definition.

After #82 get_next_precedence is only used to get the left-binding precedence of the peeked token (unlike before when get_precedence(make_keyword("NOT")) was used to get the right-binding precedence to pass to parse_subexpr - we now use constants for that).

I don't see why get_next_precedence needs to treat NOT not followed by IN/BETWEEN any differently than other keywords that are not "infix" operators. It currently returns lbp=UNARY_NOT_PREC for NOT in infix position instead of 0, so you have to call parse_subexpr with rbp=BETWEEN_PREC to make it stop (per rbp>=lbp) when encountering NOT. If we defined lbp(NOT) = 0, the parser invoked with the default rbp=0 would stop on NOT followed by NULL just like it stops on any unknown keyword. (And the unreachable! removed in #71 could be restored then.)

What do you think?

On COLLATE:

ANSI puts COLLATE as the last clause in a column definition, but it doesn't allow arbitrary expressions in DEFAULT, so it doesn't conflict.
PG/MS both seem to expect COLLATE before DEFAULT in column definition, presumably to avoid the ambiguity of DEFAULT 'foo' COLLATE bar.

As we parse COLLATE as an optional suffix at the end of parse_prefix, the precedence trick doesn't help with this anyway, and to support ANSI syntax we'd need to use a different radically simpler parser for the DEFAULT clause. This is not in the scope of this PR, of course.

On the AST design:

ANSI allows CONSTRAINT <name> to be used with NOT NULL, UNIQUE/PRIMARY KEY, and REFERENCES. MSSQL doesn't seem to support it for NOT NULL, but supports it for DEFAULT instead. Postgres allows both NOT NULL and DEFAULT, plus NULL and GENERATED as constraints.

All three dialects agree that COLLATE isn't a constraint, but expect it in different positions. MS has a number of custom options interleaved with the standard ones.

Maybe this could be modeled as a Vec<ColumnOption> where a ColumnOption is either a Constraint { name: Option<String>, constraint: Box<ColumnOption>, deferred, deferrable, etc }, one of standard options (including constraints, COLLATE, DEFAULT, and GENERATED), or a dialect-specific String?

I didn't get to the tests, but given that you're being very thorough with testing, I'll probably agree with whatever approach you decided on.
Relevant parts of the docs:

ANSI

<column definition> ::= <column name> [ <data type or domain name> ]
    [ <default clause> | <identity column specification> | <generation clause>
      | <system time period start column specification>
      | <system time period end column specification>
    ]
    [ <column constraint definition>... ]
    [ <collate clause> ]

<column constraint definition> ::= [CONSTRAINT ..]
  { NOT NULL
    | <unique specification>
    | <references specification>
    | <check constraint definition>
  }
  { in any order:
    [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
    [ [ NOT ] DEFERRABLE ]
  }
  [ [ NOT ] ENFORCED ]

Postgres

column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
and column_constraint is:
[ CONSTRAINT constraint_name ]
{ NOT NULL | NULL | CHECK | DEFAULT .. | GENERATED .. |
UNIQUE .. | PRIMARY KEY .. | REFERENCES .. }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

MSSQL

<column_definition> ::= column_name <data_type>
  [ FILESTREAM ]
  [ COLLATE collation_name ]
  [ custom stuff ]
  [ CONSTRAINT constraint_name [ DEFAULT constant_expression ] ]
  [ GENERATED .. ]
  [ NULL | NOT NULL ]
  [ more custom stuff ]
  [ <column_constraint> [, ...n ] ]
  [ <column_index> ]

<column_constraint> ::= [ CONSTRAINT constraint_name ]
  { CHECK .. 
  | FOREIGN KEY .. 
  | { PRIMARY KEY | UNIQUE } [ custom ] }

benesch · 2019-06-05T18:41:44Z

After #82 get_next_precedence is only used to get the left-binding precedence of the peeked token (unlike before when get_precedence(make_keyword("NOT")) was used to get the right-binding precedence to pass to parse_subexpr - we now use constants for that).

I don't see why get_next_precedence needs to treat NOT not followed by IN/BETWEEN any differently than other keywords that are not "infix" operators. It currently returns lbp=UNARY_NOT_PREC for NOT in infix position instead of 0, so you have to call parse_subexpr with rbp=BETWEEN_PREC to make it stop (per rbp>=lbp) when encountering NOT. If we defined lbp(NOT) = 0, the parser invoked with the default rbp=0 would stop on NOT followed by NULL just like it stops on any unknown keyword. (And the unreachable! removed in #71 could be restored then.)

This is brilliant! You're entirely right that there's a bug here. We should absolutely be returning 0 if we see a unary NOT in get_next_precedence, because unary NOT is not an infix operator and therefore has no left binding power.

I'm going to split that change out into its own PR.

benesch · 2019-06-06T17:44:33Z

Ok, I've created a separate PR for the unary NOT issue (#107), and rebased this PR on top of it.

The CHECK constraint is only partially implemented: it's in the AST, but the parser doesn't handle it. Please make a note if you want to leave it unimplemented for now.

Oops, thanks! This is fixed now, and I've added a test.

A thought on parsing of the DEFAULT expression...

This was spot on, and is addressed in #107.

On COLLATE...

On the AST design...

I took a shortcut here and added COLLATE as an Option<SQLObjectName> on the SQLColumnDef struct. I'll try to find time soon to rework the structs as you've suggested.

nickolay · 2019-06-06T19:49:44Z

I'll try to find time soon to rework the structs as you've suggested.

Did you mean you wanted to merge this PR before reworking? I’ll try look at this over the weekend, but please let me know if it blocks you.

benesch · 2019-06-06T19:54:30Z

Did you mean you wanted to merge this PR before reworking? I’ll try look at this over the weekend, but please let me know if it blocks you.

Not blocked! Just wanted to write down which chunks of your feedback I've addressed and which chunks I haven't, so that I remember what's left when I get back to it.

nickolay

LGTM, thanks! I think future improvements can be done when/if someone needs them.

nickolay · 2019-06-07T22:52:57Z

src/sqlast/ddl.rs

+pub enum ColumnConstraint {
+    /// `NULL`
+    ///
+    /// The ANSI specification technically allows NULL constraints to have a


Sorry for nit-picking: I don't believe ANSI SQL allows CONSTRAINT foo NULL - it's Postgres that seems to allow it judging from its docs...

Oh man, good point. What a nightmare. Fixed.

nickolay · 2019-06-08T00:01:59Z

src/sqlast/ddl.rs

+}
+
+#[derive(Debug, Clone, PartialEq, Hash)]
+pub enum ColumnConstraint {


The current AST seems to be partially modeled after Postgres in which any column option (except COLLATE) is called a "column constraint", even though it doesn't constrain anything. Shall we add "GENERATED" to this struct when it's implemented? Other options, like MS-specific ones? Would we keep the ColumnConstraint name then? If we plan to rename, it might be better to rename it before landing.

Good points. I've pretty substantially reworked the design of the AST. I've renamed the main enum to ColumnOption, to better reflect the fact that it might not contain only constraints. I've also moved the optional name to a wrapping ColumnOptionDef struct, which better allows any of the options to take on a name, to allow for Postgres syntax, where practically everything can have a name.

I think this should provide the right amount of future flexibility. GENERATED naturally fits as a new enum variant in ColumnOption, and hopefully it's less grating now that the struct doesn't refer to "constraints." MSSQL-specific options can also go into ColumnOption, if that makes sense, or they can live as fields on SQLColumnDef, if they're the kind of options (like SPARSE) that can't be named and must appear in a fixed order.

CREATE TABLE t (a INT NOT NULL DEFAULT 1 PRIMARY KEY) is as valid as CREATE TABLE t (a INT DEFAULT 1 PRIMARY KEY NOT NULL).

nickolay · 2019-06-08T17:30:08Z

Looks great, thanks!

Since PR apache#93 `parse_column_def` parses a set of column options in a loop, e.g. given: ``` _______ column_def _______ CREATE TABLE foo (bar INT NOT NULL DEFAULT 1, ) -------- --------- option 1 option 2 ```` it parses column options until it encounters one of the delimiter tokens First when we only supported `CREATE TABLE`, the set of delimiters that stopped the parsing used to be `Token::Comma | Token::RParen`. Then we added support for `ALTER TABLE ADD COLUMN <column_def>`. Turns out the parser started to bail if the statement ended with a semicolon, while attempting to parse the semicolon as a column option, as we forgot to add it to the set of delimiter tokens. This was recently fixed in apache#246 by including Token::SemiColon to the list, but it felt wrong to have to update this list, and to have a common list of delimiters for two different contexts (CREATE TABLE with parens vs ALTER TABLE ADD COLUMN without parens). Also our current approach cannot handle multiple statements NOT separated by a semicolon, as is common in MS SQL DDL. We don't explicitly support it in `parse_statements`, but that's a use-case like to keep in mind nevertheless.

benesch requested a review from nickolay June 3, 2019 02:57

benesch force-pushed the col-constr-order branch 2 times, most recently from c42250d to bf7d50b Compare June 4, 2019 14:53

benesch mentioned this pull request Jun 6, 2019

Fix precedence of unary NOT #107

Merged

benesch force-pushed the col-constr-order branch from bf7d50b to 58dabda Compare June 6, 2019 17:41

benesch force-pushed the col-constr-order branch from 58dabda to 1af39db Compare June 6, 2019 17:45

benesch force-pushed the col-constr-order branch from 1af39db to b47438d Compare June 6, 2019 21:33

nickolay approved these changes Jun 7, 2019

View reviewed changes

nickolay reviewed Jun 8, 2019

View reviewed changes

benesch force-pushed the col-constr-order branch 2 times, most recently from 55988c3 to 2943e05 Compare June 8, 2019 16:11

Parse column constraints in any order

ffa1c8f

CREATE TABLE t (a INT NOT NULL DEFAULT 1 PRIMARY KEY) is as valid as CREATE TABLE t (a INT DEFAULT 1 PRIMARY KEY NOT NULL).

benesch force-pushed the col-constr-order branch from 2943e05 to ffa1c8f Compare June 8, 2019 16:14

benesch merged commit 7e96d81 into apache:master Jun 9, 2019

benesch deleted the col-constr-order branch June 21, 2019 22:34

nickolay mentioned this pull request Jul 31, 2020

Refactor <column definition> parsing #251

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse column constraints in any order #93

Parse column constraints in any order #93

benesch commented Jun 3, 2019

coveralls commented Jun 3, 2019 •

edited

Loading

nickolay commented Jun 5, 2019

benesch commented Jun 5, 2019

benesch commented Jun 6, 2019

nickolay commented Jun 6, 2019

benesch commented Jun 6, 2019 •

edited

Loading

nickolay left a comment

nickolay Jun 7, 2019

benesch Jun 8, 2019

nickolay Jun 8, 2019

benesch Jun 8, 2019

nickolay commented Jun 8, 2019

Parse column constraints in any order #93

Parse column constraints in any order #93

Conversation

benesch commented Jun 3, 2019

coveralls commented Jun 3, 2019 • edited Loading

Pull Request Test Coverage Report for Build 313

💛 - Coveralls

nickolay commented Jun 5, 2019

benesch commented Jun 5, 2019

benesch commented Jun 6, 2019

nickolay commented Jun 6, 2019

benesch commented Jun 6, 2019 • edited Loading

nickolay left a comment

Choose a reason for hiding this comment

nickolay Jun 7, 2019

Choose a reason for hiding this comment

benesch Jun 8, 2019

Choose a reason for hiding this comment

nickolay Jun 8, 2019

Choose a reason for hiding this comment

benesch Jun 8, 2019

Choose a reason for hiding this comment

nickolay commented Jun 8, 2019

coveralls commented Jun 3, 2019 •

edited

Loading

benesch commented Jun 6, 2019 •

edited

Loading