-
Notifications
You must be signed in to change notification settings - Fork 605
Parse column constraints in any order #93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 313
💛 - Coveralls |
c42250d
to
bf7d50b
Compare
This got rather long, sorry!
A syntax error happens whenever a "token" (or - with the change from #82 - a token pair, like "NOT IN") following the
I've looked through the docs for a few dialects (below) and I can't think of anything other than After #82 I don't see why What do you think?
As we parse COLLATE as an optional suffix at the end of
ANSI allows All three dialects agree that Maybe this could be modeled as a
|
This is brilliant! You're entirely right that there's a bug here. We should absolutely be returning 0 if we see a unary NOT in get_next_precedence, because unary NOT is not an infix operator and therefore has no left binding power. I'm going to split that change out into its own PR. |
Ok, I've created a separate PR for the unary NOT issue (#107), and rebased this PR on top of it.
Oops, thanks! This is fixed now, and I've added a test.
This was spot on, and is addressed in #107.
I took a shortcut here and added |
Did you mean you wanted to merge this PR before reworking? I’ll try look at this over the weekend, but please let me know if it blocks you. |
Not blocked! Just wanted to write down which chunks of your feedback I've addressed and which chunks I haven't, so that I remember what's left when I get back to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! I think future improvements can be done when/if someone needs them.
src/sqlast/ddl.rs
Outdated
pub enum ColumnConstraint { | ||
/// `NULL` | ||
/// | ||
/// The ANSI specification technically allows NULL constraints to have a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for nit-picking: I don't believe ANSI SQL allows CONSTRAINT foo NULL - it's Postgres that seems to allow it judging from its docs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh man, good point. What a nightmare. Fixed.
src/sqlast/ddl.rs
Outdated
} | ||
|
||
#[derive(Debug, Clone, PartialEq, Hash)] | ||
pub enum ColumnConstraint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current AST seems to be partially modeled after Postgres in which any column option (except COLLATE) is called a "column constraint", even though it doesn't constrain anything. Shall we add "GENERATED" to this struct when it's implemented? Other options, like MS-specific ones? Would we keep the ColumnConstraint
name then? If we plan to rename, it might be better to rename it before landing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. I've pretty substantially reworked the design of the AST. I've renamed the main enum to ColumnOption
, to better reflect the fact that it might not contain only constraints. I've also moved the optional name to a wrapping ColumnOptionDef
struct, which better allows any of the options to take on a name, to allow for Postgres syntax, where practically everything can have a name.
I think this should provide the right amount of future flexibility. GENERATED naturally fits as a new enum variant in ColumnOption
, and hopefully it's less grating now that the struct doesn't refer to "constraints." MSSQL-specific options can also go into ColumnOption
, if that makes sense, or they can live as fields on SQLColumnDef
, if they're the kind of options (like SPARSE) that can't be named and must appear in a fixed order.
55988c3
to
2943e05
Compare
CREATE TABLE t (a INT NOT NULL DEFAULT 1 PRIMARY KEY) is as valid as CREATE TABLE t (a INT DEFAULT 1 PRIMARY KEY NOT NULL).
Looks great, thanks! |
Since PR apache#93 `parse_column_def` parses a set of column options in a loop, e.g. given: ``` _______ column_def _______ CREATE TABLE foo (bar INT NOT NULL DEFAULT 1, ) -------- --------- option 1 option 2 ```` it parses column options until it encounters one of the delimiter tokens First when we only supported `CREATE TABLE`, the set of delimiters that stopped the parsing used to be `Token::Comma | Token::RParen`. Then we added support for `ALTER TABLE ADD COLUMN <column_def>`. Turns out the parser started to bail if the statement ended with a semicolon, while attempting to parse the semicolon as a column option, as we forgot to add it to the set of delimiter tokens. This was recently fixed in apache#246 by including Token::SemiColon to the list, but it felt wrong to have to update this list, and to have a common list of delimiters for two different contexts (CREATE TABLE with parens vs ALTER TABLE ADD COLUMN without parens). Also our current approach cannot handle multiple statements NOT separated by a semicolon, as is common in MS SQL DDL. We don't explicitly support it in `parse_statements`, but that's a use-case like to keep in mind nevertheless.
Since PR apache#93 `parse_column_def` parses a set of column options in a loop, e.g. given: ``` _______ column_def _______ CREATE TABLE foo (bar INT NOT NULL DEFAULT 1, ) -------- --------- option 1 option 2 ```` it parses column options until it encounters one of the delimiter tokens First when we only supported `CREATE TABLE`, the set of delimiters that stopped the parsing used to be `Token::Comma | Token::RParen`. Then we added support for `ALTER TABLE ADD COLUMN <column_def>`. Turns out the parser started to bail if the statement ended with a semicolon, while attempting to parse the semicolon as a column option, as we forgot to add it to the set of delimiter tokens. This was recently fixed in apache#246 by including Token::SemiColon to the list, but it felt wrong to have to update this list, and to have a common list of delimiters for two different contexts (CREATE TABLE with parens vs ALTER TABLE ADD COLUMN without parens). Also our current approach cannot handle multiple statements NOT separated by a semicolon, as is common in MS SQL DDL. We don't explicitly support it in `parse_statements`, but that's a use-case like to keep in mind nevertheless.
Since PR apache#93 `parse_column_def` parses a set of column options in a loop, e.g. given: ``` _______ column_def _______ CREATE TABLE foo (bar INT NOT NULL DEFAULT 1, ) -------- --------- option 1 option 2 ```` it parses column options until it encounters one of the delimiter tokens First when we only supported `CREATE TABLE`, the set of delimiters that stopped the parsing used to be `Token::Comma | Token::RParen`. Then we added support for `ALTER TABLE ADD COLUMN <column_def>`. Turns out the parser started to bail if the statement ended with a semicolon, while attempting to parse the semicolon as a column option, as we forgot to add it to the set of delimiter tokens. This was recently fixed in apache#246 by including Token::SemiColon to the list, but it felt wrong to have to update this list, and to have a common list of delimiters for two different contexts (CREATE TABLE with parens vs ALTER TABLE ADD COLUMN without parens). Also our current approach cannot handle multiple statements NOT separated by a semicolon, as is common in MS SQL DDL. We don't explicitly support it in `parse_statements`, but that's a use-case like to keep in mind nevertheless.
CREATE TABLE t (a INT NOT NULL DEFAULT 1 PRIMARY KEY)
is as valid asCREATE TABLE t (a INT DEFAULT 1 PRIMARY KEY NOT NULL)
.@nickolay I got annoyed by the tests in sqlparser_postgres and adjusted them so that the first test asserts every column specification, and the remainder simply verify that the SQL roundtrips. Let me know if you'd prefer to be more precise with the later tests.