Dialect-specific parsing and Snowflake JSON support #241

eyalleshem · 2020-07-27T07:46:23Z

Hi ,

Currently the lib support different databases (postgresql-sql, mysql , etc .. ) dialect only in the tokenizer.

My question is - if there was any discussions/plans about supporting per-db dialects also in the parser (or do you you think it's good idea to start such discussions )?

Where most of the DB's support ANSI sql , when it's come to more advance features like json-parsing - There is great variability between the different databases.

nickolay · 2020-07-27T07:54:31Z

Hey, I wrote about this recently in #7 (comment)

nickolay · 2020-07-27T07:56:43Z

If the question was specifically about discussing a change like this c67d667 then I was too quick to close this as a duplicate, as this is something we'll need eventually.

eyalleshem · 2020-07-27T08:57:35Z

thanks @nikolay for the fast response :) ,

Follow your answer i follow issue #7 , and the issues that mentation there .

What i currently hope to do is to parse snowflake-json queries with this this lib, and I wonder what is the correct way for doing that.

As json-Sql query in snowflake could look not native in ANSI-sql perspective -so the first idea that came into my mind is dialect.
(something like you mention in: c67d667 ).

json sql in snowflake could look like :
SELECT b.this, a.V['is_dog'] FROM a1 a, table(FLATTEN(input => a.v)) as b where a.v:is_dog::String like 'true';

I could think also about some other approaches to deal with that - but maybe it's should be in another issue , with more suitable title :)

nickolay · 2020-07-27T12:53:22Z

Right. This is a good example where different dialects conflict, as a.V['is_dog'] means a.V AS "'is_dog'" in MSSQL. It is because ['is_dog'] is a valid identifier (already configured in the dialect) and because MSSQL does not support [] for subscripts.

Something like c67d667 (i.e. the first of two options I listed in #207 (comment)) does seem to be a good solution for this, although there is a question when to use individual "toggles" like allow_omit_referenced_columns_in_column_option and if/when the parser should just branch on the dialect's identity.

I would be more comfortable with dialect-based switches, as that means N dialects to reason about instead of 2^N with the fine-grained toggles.

@Dandandan, @maxcountryman any thoughts on this?

The different ways to write subscripts do not seem to conflict syntactically at first glance:

the proprietary JSON-related syntax in Snowflake requires single-quoted strings inside brackets, i.e. foo['bar']
array subscripts in PostgreSQL take the form of arr[N] or arr[N:K] where N/K are numbers(?)

...though there are probably issues around precedence, as I can't guess the grammar PostgreSQL uses just from its docs.

maxcountryman · 2020-07-27T13:15:10Z

I agree that dialect-based switches seem preferable here. Unfortunately I can't think of a better way of handling these kinds of things.

eyalleshem · 2020-07-27T20:11:03Z

Cool ,
Thanks for the comments.

I think there is still one issue about the dialect specific values in the AST .
(I think that what @nickolay tried to solved in #189)

For example currently in derived-table definition , there is no definition of "FLATTEN" or "table ,
(I refer to the query : SELECT b.this, a.V['is_dog'] FROM a1 a, table(FLATTEN(input => a.v)) as b where a.v:is_dog::String like 'true'; )

Currently I see 3 option:

Don't represent this data on the AST (good enough for my needs and it problem should be handled later).
Add flatten and table fields (and probably other specific-db type ) in the struct of the AST.
Add an enum value that will include data type for specifics dbs.

The enum will like that :

pub enum DialectDerivedTable{
    Snowflake {
        flatten: bool, 
        table: bool 
    } 
}

and the derived table struct will look like that :

 Derived {
        lateral: bool,
        subquery: Box<Query>,
        alias: Option<TableAlias>,
        dialect_data: Option<DialectDerivedTable>, 
    },

For me option 3 look like the best solution .
WDYT ?
(or maybe you have some other ideas to handle that .. )

nickolay · 2020-07-27T20:57:50Z

TABLE( ... ) actually seems to be a standard construct called <table function derived table>/<PTF derived table> in the ANSI spec. I think it should be implemented as a new TableFactor::TableFunction(Expr) variant.

The =>thing is an (apparently also standard) way to denote named arguments in function invocations (<named argument assignment token>).

FLATTEN may fit into a generic Expr::Function or require a specific variant, like EXTRACT. In the latter case I think it's fine to have it in Expr unconditionally.

eyalleshem · 2020-07-28T05:30:56Z

thanks @nickolay for the explanation .

So I think that I am ready to start working on PR ,
will try to take in in baby-steps - for small review cycles .

My plan is to start with #223 -as it's suit for the dialect , and could also help others.
If it's will work well , I will continue other parts in the json syntax.

nickolay · 2020-07-28T09:07:55Z

Cool! Actually, #223 is the least clear part of the puzzle to me (added a comment there), but I'll be glad to play along.

eyalleshem · 2020-07-29T07:44:25Z

Ok , here my try ,
I will be happy for feedback

eyalleshem · 2020-07-29T16:15:18Z

Opened also another PR for supporting named argument functions .
(FUN(a=>exp1, b=>exp2))

tv42 · 2025-03-20T23:30:38Z

Almost 5 years without update. Can this be closed in light of https://github.com/apache/datafusion-sqlparser-rs/blob/main/tests/sqlparser_custom_dialect.rs ?

eyalleshem mentioned this issue Jul 27, 2020

Supporting dialects in the parser #240

Closed

nickolay closed this as completed Jul 27, 2020

nickolay reopened this Jul 27, 2020

nickolay mentioned this issue Jul 27, 2020

Fail to parse - "select * from (table_name) alias" (Snowflake) #223

Closed

nickolay changed the title ~~Is there plans to support database's dialect also in the parser~~ Dialect-specific parsing and Snowflake JSON support Jul 27, 2020

nickolay mentioned this issue Jul 31, 2020

Use dialects in the parser for support snowflake aliasing syntax #244

Closed

nickolay mentioned this issue Dec 12, 2021

[Discussion] Remove use of dialect_of macro #339

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dialect-specific parsing and Snowflake JSON support #241

Dialect-specific parsing and Snowflake JSON support #241

eyalleshem commented Jul 27, 2020

nickolay commented Jul 27, 2020

nickolay commented Jul 27, 2020

eyalleshem commented Jul 27, 2020 •

edited by nickolay

Loading

nickolay commented Jul 27, 2020 •

edited

Loading

maxcountryman commented Jul 27, 2020

eyalleshem commented Jul 27, 2020 •

edited

Loading

nickolay commented Jul 27, 2020

eyalleshem commented Jul 28, 2020

nickolay commented Jul 28, 2020

eyalleshem commented Jul 29, 2020 •

edited

Loading

eyalleshem commented Jul 29, 2020 •

edited

Loading

tv42 commented Mar 20, 2025

Dialect-specific parsing and Snowflake JSON support #241

Dialect-specific parsing and Snowflake JSON support #241

Comments

eyalleshem commented Jul 27, 2020

nickolay commented Jul 27, 2020

nickolay commented Jul 27, 2020

eyalleshem commented Jul 27, 2020 • edited by nickolay Loading

nickolay commented Jul 27, 2020 • edited Loading

maxcountryman commented Jul 27, 2020

eyalleshem commented Jul 27, 2020 • edited Loading

nickolay commented Jul 27, 2020

eyalleshem commented Jul 28, 2020

nickolay commented Jul 28, 2020

eyalleshem commented Jul 29, 2020 • edited Loading

eyalleshem commented Jul 29, 2020 • edited Loading

tv42 commented Mar 20, 2025

eyalleshem commented Jul 27, 2020 •

edited by nickolay

Loading

nickolay commented Jul 27, 2020 •

edited

Loading

eyalleshem commented Jul 27, 2020 •

edited

Loading

eyalleshem commented Jul 29, 2020 •

edited

Loading

eyalleshem commented Jul 29, 2020 •

edited

Loading