-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix Infer prepare statement type tests #15743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Infer prepare statement type tests #15743
Conversation
…pare clause on the infer types tests
Projection: person.id, person.age | ||
Filter: person.age BETWEEN $1 AND $2 | ||
TableScan: person | ||
Prepare: "my_plan" [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently when the logical plan is created it does not include the type inferred, this result in some issues when we try to call the function plan.with_param_values(param_values)
as it tries to extract the type from the plan but it is empty. Are we expecting the plan to contain the type at this time? if so probably there is another bug we need to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb @qstommyshu any advice on where to look to debug why the plan is not properly generating the inferred type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @brayanjuls ,
the reason why this happen is there is no parameters being passed to the sql, but the sql itself contains positional arguments like $1
and $2
, please refer to https://datafusion.apache.org/user-guide/sql/prepared_statements.html#inferred-types.
The simple way you can fix it is by adding the position arguments type to the sql
like how it was done in test_prepare_statement_to_plan_one_param
. If you want to dig into the details about it, you can check how planner parse PREPARE
statement and transfer it into a logical plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but wouldn't that mean the type is not being inferred, but instead we are explicitly declaring it?
Thanks, I would check the planner parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check the details for parsing "PREPARE" statement yet, but I think the type inference behaviour depends on how the planner parser is implemented, it could be either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is a good improvment -- it updates the tests to show the current behavior
We can update the behavior in another PR, which is what I think you are working on in
Hi @brayanjuls, Thanks for the clever fix! For this issue, I suggest we keep our changes limited to the test file. Modifying datafusion/sql/src/statement.rs would constitute a user-facing API change—let’s open a new issue for that and address it in its own PR. @alamb What do you think? |
datafusion/sql/src/statement.rs
Outdated
} | ||
false => data_types, | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will introduce explicitly declaring data types
to datafusion, maybe we should create a new issue/PR for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I agree making the behavior change in a separate PR might be better
Perhaps we could add a method on the Prepare
struct that would return the parameter types (or walk over the expressions in the plan calling infer types if the parameter types are not specified)
https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Prepare.html#structfield.data_types
So then the test would look like
#[test]
fn test_infer_types_from_join() {
let sql =
"SELECT id, order_id FROM person JOIN orders ON id = customer_id and age = $1";
let plan = logical_plan(sql).unwrap();
assert_snapshot!(
plan,
@r#"
Projection: person.id, orders.order_id
Inner Join: Filter: person.id = orders.customer_id AND person.age = $1
TableScan: person
TableScan: orders
"#
);
// NEW: verify the parameter types
let LogicalPlan::Prepare(prepare) = plan else {
panic!("Expected prepare, got {plan:?}");
};
// call new function to create or infer the datatypes
assert_eq!(prepare.get_or_infer_datatypes(), vec![DataType::Int32]);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogicalPlan
variants other than Prepare
can have Expr::Placeholder
present, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the feedback I really appreciate it, I will create a new issue and move the statement changes to a different PR. I will only put back the original function names and include the PREPARE
clause in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @brayanjuls and @qstommyshu
datafusion/sql/src/statement.rs
Outdated
} | ||
false => data_types, | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I agree making the behavior change in a separate PR might be better
Perhaps we could add a method on the Prepare
struct that would return the parameter types (or walk over the expressions in the plan calling infer types if the parameter types are not specified)
https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Prepare.html#structfield.data_types
So then the test would look like
#[test]
fn test_infer_types_from_join() {
let sql =
"SELECT id, order_id FROM person JOIN orders ON id = customer_id and age = $1";
let plan = logical_plan(sql).unwrap();
assert_snapshot!(
plan,
@r#"
Projection: person.id, orders.order_id
Inner Join: Filter: person.id = orders.customer_id AND person.age = $1
TableScan: person
TableScan: orders
"#
);
// NEW: verify the parameter types
let LogicalPlan::Prepare(prepare) = plan else {
panic!("Expected prepare, got {plan:?}");
};
// call new function to create or infer the datatypes
assert_eq!(prepare.get_or_infer_datatypes(), vec![DataType::Int32]);
Since @kczimm is working on this feature I think, maybe he has some thoughts about how this PR should be structured |
TableScan: person | ||
" | ||
), | ||
Err(err) => print!("error: {:?}", err), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If LogicalPlan::with_param_values
fails, we want this test to fail, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! I changed this just because I was debugging, this will fail if we do not declare or infer the types in the prepare
.
datafusion/sql/src/statement.rs
Outdated
} | ||
false => data_types, | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogicalPlan
variants other than Prepare
can have Expr::Placeholder
present, though.
datafusion/sql/src/statement.rs
Outdated
let data_types = match data_types.is_empty() { | ||
true => { | ||
let mut data_types: Vec<DataType> = plan | ||
.get_parameter_types()? | ||
.iter() | ||
.filter_map(|d| match d { | ||
(_, Some(v)) => Some(v.clone()), | ||
_ => None, | ||
}) | ||
.collect::<Vec<DataType>>(); | ||
data_types.sort(); | ||
planner_context.with_prepare_param_data_types(data_types.clone()); | ||
data_types | ||
} | ||
false => data_types, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make data_types
mutable above, you can do something like this which is a bit more terse:
let data_types = match data_types.is_empty() { | |
true => { | |
let mut data_types: Vec<DataType> = plan | |
.get_parameter_types()? | |
.iter() | |
.filter_map(|d| match d { | |
(_, Some(v)) => Some(v.clone()), | |
_ => None, | |
}) | |
.collect::<Vec<DataType>>(); | |
data_types.sort(); | |
planner_context.with_prepare_param_data_types(data_types.clone()); | |
data_types | |
} | |
false => data_types, | |
}; | |
if data_types.is_empty() { | |
for dt in plan.get_parameter_types()?.into_values().into_iter() { | |
if let Some(dt) = dt { | |
data_types.push(dt); | |
} | |
} | |
data_types.sort(); | |
planner_context.with_prepare_param_data_types(data_types.clone()); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, it looks simpler, thanks!
Co-authored-by: Andrew Lamb <[email protected]>
the infer data type changes in statement will be introduced in a new PR
datafusion/sql/src/statement.rs
Outdated
@@ -710,6 +710,7 @@ impl<S: ContextProvider> SqlToRel<'_, S> { | |||
*statement, | |||
&mut planner_context, | |||
)?; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this empty line
TableScan: person | ||
TableScan: orders | ||
" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I see your inferred type changes is in another issue. This part of the test shouldn't be removed in this issue, right? You can add new inferred type tests in #16019 along with your changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed them because without the inferred type changes the tests will fail because of the validation linked below, which happen after we call plan.with_param_values(param_values).unwrap();
in the tests, thats one of the reason I needed to introduce the inferred type changes.
datafusion/datafusion/common/src/param_value.rs
Lines 33 to 44 in 967384e
/// Verify parameter list length and type | |
pub fn verify(&self, expect: &[DataType]) -> Result<()> { | |
match self { | |
ParamValues::List(list) => { | |
// Verify if the number of params matches the number of values | |
if expect.len() != list.len() { | |
return _plan_err!( | |
"Expected {} parameters, got {}", | |
expect.len(), | |
list.len() | |
); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I see what you mean. This test can only be corrected once you set up the infer type feature. But the development flow would be weird if both this two PR addresses the same issue.
I think it would be better if we move everything to your new PR, so the new PR would include infer types
along with fixing/adding corresponding tests, and we only need to deal with one PR (closes #15577 and #16019). We can also merge this PR first then move on to #16019 if @alamb @kczimm prefers this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've no preference here.
Filter: person.age = Int32(10) | ||
TableScan: person | ||
" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for here
Filter: person.age BETWEEN Int32(10) AND Int32(30) | ||
TableScan: person | ||
" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
TableScan: person | ||
TableScan: person | ||
" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
Projection: column1 AS id, column2 AS first_name, column3 AS last_name, CAST(NULL AS Int32) AS age, CAST(NULL AS Utf8) AS state, CAST(NULL AS Float64) AS salary, CAST(NULL AS Timestamp(Nanosecond, None)) AS birth_date, CAST(NULL AS Int32) AS 😀 | ||
Values: (UInt32(1) AS $1, Utf8("Alan") AS $2, Utf8("Turing") AS $3) | ||
"# | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
Filter: person.id = UInt32(1) | ||
TableScan: person | ||
" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @brayanjuls @qstommyshu and @kczimm
I reviewed this PR more carefully and now in retrospect I think the tests added in this PR (test_prepare_...
) are useful and valid (they cover PREPARE
)
However, I think the original tests are also useful as they cover type inference / parameter filling for queries that are not in a PREPARE statement (for example how some systems fill in bind parameters)
In order to avoid this PR from dragging on, I pushed a commit that
- Leave the original tests the way they are
- Add the new tests
Would someone be willing to make a PR to move the prepare
and infer
tests into a new module (sql_integration.rs at 5000 lines is pretty massive) so it is easier to reason about them? Perhaps into cases/parameters.rs
? I filed a ticket:
I also think it would be super helpful to refactor these tests now that we have the insta
framework to avoid some of the repetition. I filed
Projection: person.id, person.age | ||
Filter: person.age BETWEEN $1 AND $2 | ||
TableScan: person | ||
Prepare: "my_plan" [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is a good improvment -- it updates the tests to show the current behavior
We can update the behavior in another PR, which is what I think you are working on in
Thanks again @brayanjuls |
* draft commit to rolledback changes on function naming and include prepare clause on the infer types tests * include data types in plan when it is not included in the prepare statement * fix: prepare statement error * Update datafusion/sql/src/statement.rs Co-authored-by: Andrew Lamb <[email protected]> * remove infer types from prepare statement the infer data type changes in statement will be introduced in a new PR * fix to show correct output message * remove white space * Restore the original tests too --------- Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Rationale for this change
This intent to correct issues on the prepare statements infer type tests. Currently they are not correctly asserting the expected behavior specially when we try to validate types.
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No