-
Notifications
You must be signed in to change notification settings - Fork 605
Implement Hive QL Parsing #235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 537958138Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
src/ast/mod.rs
Outdated
@@ -627,6 +753,25 @@ impl fmt::Display for Statement { | |||
} | |||
Ok(()) | |||
} | |||
Statement::CreateDatabase { | |||
db_name, | |||
ine, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if_not_exists
, better to write it out without abbreviations
src/ast/data_type.rs
Outdated
@@ -61,6 +61,8 @@ pub enum DataType { | |||
Regclass, | |||
/// Text | |||
Text, | |||
/// String (Hive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also some other dialects besides hive use string
} => { | ||
write!(f, "TRUNCATE TABLE {}", table_name)?; | ||
if let Some(ref parts) = partitions { | ||
if !parts.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is PARTITIONS ()
also possible? If so, then this check is not needed. Otherwise, we could check the partitions type to a vec instead of optional vec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe PARTITIONS ()
is valid, you have to specify at the very least a column.
src/ast/mod.rs
Outdated
Msck { | ||
table_name: ObjectName, | ||
repair: bool, | ||
add_partitions: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add/drop/sync could be an enum I think? Also simplifies the parser code a bit and allows for easier patter matching.
Looks great, would love to have this! I added a few comments |
#[derive(Debug, Clone, PartialEq, Eq, Hash)] | ||
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] | ||
pub enum PartitionAction { | ||
ADD, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change naming to Add
, Drop
, Sync
?
…r INSERT INTO TABLE
…o allow alises starting with numbers
So @Dandandan I ran into a discussion point here. Hive allows you to have column aliases beginning with numbers such as |
# Conflicts: # examples/cli.rs # src/ast/mod.rs # src/ast/query.rs # src/parser.rs # src/test_utils.rs # src/tokenizer.rs # tests/sqlparser_common.rs # tests/sqlparser_hive.rs
@hntd187 @Dandandan Seems like we should go ahead and merge this now? I will skim through and merge if there are no objections. |
@@ -22,15 +22,17 @@ use std::fmt; | |||
pub enum Value { | |||
/// Numeric literal | |||
#[cfg(not(feature = "bigdecimal"))] | |||
Number(String), | |||
Number(String, bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I wonder if a new type should have been added rather than change this type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -35,7 +35,7 @@ pub enum Token { | |||
/// A keyword (like SELECT) or an optionally quoted SQL identifier | |||
Word(Word), | |||
/// An unsigned numeric literal | |||
Number(String), | |||
Number(String, bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be better extracted to a named field, e.g. long: bool
?.
This helps readability
@@ -22,15 +22,17 @@ use std::fmt; | |||
pub enum Value { | |||
/// Numeric literal | |||
#[cfg(not(feature = "bigdecimal"))] | |||
Number(String), | |||
Number(String, bool), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For those values as well
@hntd187 @andygrove |
Hey y'all, I'm writing a utility that indexes SQL queries using this library as a parser. I just bumped into this: 9d9d681#diff-4a04259da480a6b794a2e947e4cc03eff4d1aa9330836f5b91cac68c5398193fR2182-L2173 The support for HiveQL's use of a Here's an example query:
Could you please link documentation that explains what The feature doesn't seem to be explained in the HiveQL docs: It demonstrates its use in:
But doesn't explain what the clause is actually doing. I don't have a Hive instance to test right now but I'd expect this to work as well:
Similarly from https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression
Could someone please tell me what's going on here? |
This is a highly experimental start to implementing Hive QL in the parser. Luckily, the parser's design lends itself well to adding in some of these things, but there are still a few oddities and things to implement to make something for Hive complete. I'd like to get people reviewing this early because in my opinion I do some janky things here I should clean up and I'm hoping early review will help me nip that in the bud.
Also, my tests are a bit on the light side right now, but I have verified a lot of this functionality with some internal HQL that I cannot share publicly. I am confident that this covers the majority of edge cases on how not to write HQL.