Skip to content

Dialect-specific parsing and Snowflake JSON support #241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
eyalleshem opened this issue Jul 27, 2020 · 12 comments
Open

Dialect-specific parsing and Snowflake JSON support #241

eyalleshem opened this issue Jul 27, 2020 · 12 comments

Comments

@eyalleshem
Copy link
Contributor

Hi ,

Currently the lib support different databases (postgresql-sql, mysql , etc .. ) dialect only in the tokenizer.

My question is - if there was any discussions/plans about supporting per-db dialects also in the parser (or do you you think it's good idea to start such discussions )?

Where most of the DB's support ANSI sql , when it's come to more advance features like json-parsing - There is great variability between the different databases.

@nickolay
Copy link
Contributor

Hey, I wrote about this recently in #7 (comment)

@nickolay
Copy link
Contributor

If the question was specifically about discussing a change like this c67d667 then I was too quick to close this as a duplicate, as this is something we'll need eventually.

@eyalleshem
Copy link
Contributor Author

eyalleshem commented Jul 27, 2020

thanks @nikolay for the fast response :) ,

Follow your answer i follow issue #7 , and the issues that mentation there .

What i currently hope to do is to parse snowflake-json queries with this this lib, and I wonder what is the correct way for doing that.

As json-Sql query in snowflake could look not native in ANSI-sql perspective -so the first idea that came into my mind is dialect.
(something like you mention in: c67d667 ).

json sql in snowflake could look like :
SELECT b.this, a.V['is_dog'] FROM a1 a, table(FLATTEN(input => a.v)) as b where a.v:is_dog::String like 'true';

I could think also about some other approaches to deal with that - but maybe it's should be in another issue , with more suitable title :)

@nickolay nickolay changed the title Is there plans to support database's dialect also in the parser Dialect-specific parsing and Snowflake JSON support Jul 27, 2020
@nickolay
Copy link
Contributor

nickolay commented Jul 27, 2020

Right. This is a good example where different dialects conflict, as a.V['is_dog'] means a.V AS "'is_dog'" in MSSQL. It is because ['is_dog'] is a valid identifier (already configured in the dialect) and because MSSQL does not support [] for subscripts.

Something like c67d667 (i.e. the first of two options I listed in #207 (comment)) does seem to be a good solution for this, although there is a question when to use individual "toggles" like allow_omit_referenced_columns_in_column_option and if/when the parser should just branch on the dialect's identity.

I would be more comfortable with dialect-based switches, as that means N dialects to reason about instead of 2^N with the fine-grained toggles.

@Dandandan, @maxcountryman any thoughts on this?


The different ways to write subscripts do not seem to conflict syntactically at first glance:

...though there are probably issues around precedence, as I can't guess the grammar PostgreSQL uses just from its docs.

@maxcountryman
Copy link
Contributor

I agree that dialect-based switches seem preferable here. Unfortunately I can't think of a better way of handling these kinds of things.

@eyalleshem
Copy link
Contributor Author

eyalleshem commented Jul 27, 2020

Cool ,
Thanks for the comments.

I think there is still one issue about the dialect specific values in the AST .
(I think that what @nickolay tried to solved in #189)

For example currently in derived-table definition , there is no definition of "FLATTEN" or "table ,
(I refer to the query : SELECT b.this, a.V['is_dog'] FROM a1 a, table(FLATTEN(input => a.v)) as b where a.v:is_dog::String like 'true'; )

Currently I see 3 option:

  1. Don't represent this data on the AST (good enough for my needs and it problem should be handled later).
  2. Add flatten and table fields (and probably other specific-db type ) in the struct of the AST.
  3. Add an enum value that will include data type for specifics dbs.

The enum will like that :

pub enum DialectDerivedTable{
    Snowflake {
        flatten: bool, 
        table: bool 
    } 
}

and the derived table struct will look like that :

 Derived {
        lateral: bool,
        subquery: Box<Query>,
        alias: Option<TableAlias>,
        dialect_data: Option<DialectDerivedTable>, 
    },

For me option 3 look like the best solution .
WDYT ?
(or maybe you have some other ideas to handle that .. )

@nickolay
Copy link
Contributor

TABLE( ... ) actually seems to be a standard construct called <table function derived table>/<PTF derived table> in the ANSI spec. I think it should be implemented as a new TableFactor::TableFunction(Expr) variant.

The =>thing is an (apparently also standard) way to denote named arguments in function invocations (<named argument assignment token>).

FLATTEN may fit into a generic Expr::Function or require a specific variant, like EXTRACT. In the latter case I think it's fine to have it in Expr unconditionally.

@eyalleshem
Copy link
Contributor Author

thanks @nickolay for the explanation .

So I think that I am ready to start working on PR ,
will try to take in in baby-steps - for small review cycles .

My plan is to start with #223 -as it's suit for the dialect , and could also help others.
If it's will work well , I will continue other parts in the json syntax.

@nickolay
Copy link
Contributor

Cool! Actually, #223 is the least clear part of the puzzle to me (added a comment there), but I'll be glad to play along.

@eyalleshem
Copy link
Contributor Author

eyalleshem commented Jul 29, 2020

Ok , here my try ,
I will be happy for feedback

@eyalleshem
Copy link
Contributor Author

eyalleshem commented Jul 29, 2020

Opened also another PR for supporting named argument functions .
(FUN(a=>exp1, b=>exp2))

@tv42
Copy link

tv42 commented Mar 20, 2025

Almost 5 years without update. Can this be closed in light of https://github.com/apache/datafusion-sqlparser-rs/blob/main/tests/sqlparser_custom_dialect.rs ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants