Skip to content

Fix Infer prepare statement type tests #15743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 15, 2025

Conversation

brayanjuls
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

This intent to correct issues on the prepare statements infer type tests. Currently they are not correctly asserting the expected behavior specially when we try to validate types.

What changes are included in this PR?

  • Restore the original naming.
  • Include the "PREPARE" statement clause without type.
  • Update insta snapshots.
  • Pending: Get the inferred types part of the plan.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@brayanjuls brayanjuls marked this pull request as draft April 17, 2025 03:06
@github-actions github-actions bot added the sql SQL Planner label Apr 17, 2025
Projection: person.id, person.age
Filter: person.age BETWEEN $1 AND $2
TableScan: person
Prepare: "my_plan" []
Copy link
Contributor Author

@brayanjuls brayanjuls Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently when the logical plan is created it does not include the type inferred, this result in some issues when we try to call the function plan.with_param_values(param_values) as it tries to extract the type from the plan but it is empty. Are we expecting the plan to contain the type at this time? if so probably there is another bug we need to investigate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb @qstommyshu any advice on where to look to debug why the plan is not properly generating the inferred type?

Copy link
Contributor

@qstommyshu qstommyshu Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @brayanjuls ,

the reason why this happen is there is no parameters being passed to the sql, but the sql itself contains positional arguments like $1 and $2, please refer to https://datafusion.apache.org/user-guide/sql/prepared_statements.html#inferred-types.

The simple way you can fix it is by adding the position arguments type to the sql like how it was done in test_prepare_statement_to_plan_one_param. If you want to dig into the details about it, you can check how planner parse PREPARE statement and transfer it into a logical plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but wouldn't that mean the type is not being inferred, but instead we are explicitly declaring it?

Thanks, I would check the planner parser.

Copy link
Contributor

@qstommyshu qstommyshu Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check the details for parsing "PREPARE" statement yet, but I think the type inference behaviour depends on how the planner parser is implemented, it could be either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is a good improvment -- it updates the tests to show the current behavior

We can update the behavior in another PR, which is what I think you are working on in

@brayanjuls brayanjuls marked this pull request as ready for review May 2, 2025 13:32
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label May 6, 2025
@qstommyshu
Copy link
Contributor

Hi @brayanjuls,

Thanks for the clever fix! For this issue, I suggest we keep our changes limited to the test file. Modifying datafusion/sql/src/statement.rs would constitute a user-facing API change—let’s open a new issue for that and address it in its own PR.

@alamb What do you think?

}
false => data_types,
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will introduce explicitly declaring data types to datafusion, maybe we should create a new issue/PR for this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree making the behavior change in a separate PR might be better

Perhaps we could add a method on the Prepare struct that would return the parameter types (or walk over the expressions in the plan calling infer types if the parameter types are not specified)

https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Prepare.html#structfield.data_types

So then the test would look like

#[test]
fn test_infer_types_from_join() {
    let sql =
        "SELECT id, order_id FROM person JOIN orders ON id = customer_id and age = $1";

    let plan = logical_plan(sql).unwrap();
    assert_snapshot!(
        plan,
        @r#"
    Projection: person.id, orders.order_id
      Inner Join:  Filter: person.id = orders.customer_id AND person.age = $1
        TableScan: person
        TableScan: orders
    "#
    );

    // NEW: verify the parameter types
    let LogicalPlan::Prepare(prepare) = plan else {
      panic!("Expected prepare, got {plan:?}");
    };

    // call new function to create or infer the datatypes
    assert_eq!(prepare.get_or_infer_datatypes(), vec![DataType::Int32]);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogicalPlan variants other than Prepare can have Expr::Placeholder present, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the feedback I really appreciate it, I will create a new issue and move the statement changes to a different PR. I will only put back the original function names and include the PREPARE clause in this PR.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @brayanjuls and @qstommyshu

}
false => data_types,
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree making the behavior change in a separate PR might be better

Perhaps we could add a method on the Prepare struct that would return the parameter types (or walk over the expressions in the plan calling infer types if the parameter types are not specified)

https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Prepare.html#structfield.data_types

So then the test would look like

#[test]
fn test_infer_types_from_join() {
    let sql =
        "SELECT id, order_id FROM person JOIN orders ON id = customer_id and age = $1";

    let plan = logical_plan(sql).unwrap();
    assert_snapshot!(
        plan,
        @r#"
    Projection: person.id, orders.order_id
      Inner Join:  Filter: person.id = orders.customer_id AND person.age = $1
        TableScan: person
        TableScan: orders
    "#
    );

    // NEW: verify the parameter types
    let LogicalPlan::Prepare(prepare) = plan else {
      panic!("Expected prepare, got {plan:?}");
    };

    // call new function to create or infer the datatypes
    assert_eq!(prepare.get_or_infer_datatypes(), vec![DataType::Int32]);

@alamb
Copy link
Contributor

alamb commented May 7, 2025

Since @kczimm is working on this feature I think, maybe he has some thoughts about how this PR should be structured

TableScan: person
"
),
Err(err) => print!("error: {:?}", err),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If LogicalPlan::with_param_values fails, we want this test to fail, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! I changed this just because I was debugging, this will fail if we do not declare or infer the types in the prepare.

}
false => data_types,
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LogicalPlan variants other than Prepare can have Expr::Placeholder present, though.

Comment on lines 715 to 730
let data_types = match data_types.is_empty() {
true => {
let mut data_types: Vec<DataType> = plan
.get_parameter_types()?
.iter()
.filter_map(|d| match d {
(_, Some(v)) => Some(v.clone()),
_ => None,
})
.collect::<Vec<DataType>>();
data_types.sort();
planner_context.with_prepare_param_data_types(data_types.clone());
data_types
}
false => data_types,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make data_types mutable above, you can do something like this which is a bit more terse:

Suggested change
let data_types = match data_types.is_empty() {
true => {
let mut data_types: Vec<DataType> = plan
.get_parameter_types()?
.iter()
.filter_map(|d| match d {
(_, Some(v)) => Some(v.clone()),
_ => None,
})
.collect::<Vec<DataType>>();
data_types.sort();
planner_context.with_prepare_param_data_types(data_types.clone());
data_types
}
false => data_types,
};
if data_types.is_empty() {
for dt in plan.get_parameter_types()?.into_values().into_iter() {
if let Some(dt) = dt {
data_types.push(dt);
}
}
data_types.sort();
planner_context.with_prepare_param_data_types(data_types.clone());
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, it looks simpler, thanks!

@@ -710,6 +710,7 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
*statement,
&mut planner_context,
)?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this empty line

TableScan: person
TableScan: orders
"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I see your inferred type changes is in another issue. This part of the test shouldn't be removed in this issue, right? You can add new inferred type tests in #16019 along with your changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed them because without the inferred type changes the tests will fail because of the validation linked below, which happen after we call plan.with_param_values(param_values).unwrap(); in the tests, thats one of the reason I needed to introduce the inferred type changes.

/// Verify parameter list length and type
pub fn verify(&self, expect: &[DataType]) -> Result<()> {
match self {
ParamValues::List(list) => {
// Verify if the number of params matches the number of values
if expect.len() != list.len() {
return _plan_err!(
"Expected {} parameters, got {}",
expect.len(),
list.len()
);
}

Copy link
Contributor

@qstommyshu qstommyshu May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I see what you mean. This test can only be corrected once you set up the infer type feature. But the development flow would be weird if both this two PR addresses the same issue.

I think it would be better if we move everything to your new PR, so the new PR would include infer types along with fixing/adding corresponding tests, and we only need to deal with one PR (closes #15577 and #16019). We can also merge this PR first then move on to #16019 if @alamb @kczimm prefers this way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've no preference here.

Filter: person.age = Int32(10)
TableScan: person
"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for here

Filter: person.age BETWEEN Int32(10) AND Int32(30)
TableScan: person
"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

TableScan: person
TableScan: person
"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Projection: column1 AS id, column2 AS first_name, column3 AS last_name, CAST(NULL AS Int32) AS age, CAST(NULL AS Utf8) AS state, CAST(NULL AS Float64) AS salary, CAST(NULL AS Timestamp(Nanosecond, None)) AS birth_date, CAST(NULL AS Int32) AS 😀
Values: (UInt32(1) AS $1, Utf8("Alan") AS $2, Utf8("Turing") AS $3)
"#
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Filter: person.id = UInt32(1)
TableScan: person
"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @brayanjuls @qstommyshu and @kczimm

I reviewed this PR more carefully and now in retrospect I think the tests added in this PR (test_prepare_...) are useful and valid (they cover PREPARE)

However, I think the original tests are also useful as they cover type inference / parameter filling for queries that are not in a PREPARE statement (for example how some systems fill in bind parameters)

In order to avoid this PR from dragging on, I pushed a commit that

  1. Leave the original tests the way they are
  2. Add the new tests

Would someone be willing to make a PR to move the prepare and infer tests into a new module (sql_integration.rs at 5000 lines is pretty massive) so it is easier to reason about them? Perhaps into cases/parameters.rs? I filed a ticket:

I also think it would be super helpful to refactor these tests now that we have the insta framework to avoid some of the repetition. I filed

Projection: person.id, person.age
Filter: person.age BETWEEN $1 AND $2
TableScan: person
Prepare: "my_plan" []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is a good improvment -- it updates the tests to show the current behavior

We can update the behavior in another PR, which is what I think you are working on in

@alamb alamb merged commit 51452d6 into apache:main May 15, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented May 15, 2025

Thanks again @brayanjuls

jfahne pushed a commit to jfahne/datafusion that referenced this pull request May 18, 2025
* draft commit to rolledback changes on function naming and include prepare clause on the infer types tests

* include data types in plan when it is not included in the prepare statement

* fix: prepare statement error

* Update datafusion/sql/src/statement.rs

Co-authored-by: Andrew Lamb <[email protected]>

* remove infer types from prepare statement

the infer data type changes in statement will be introduced in a new PR

* fix to show correct output message

* remove white space

* Restore the original tests too

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix PREPARE statement tests
4 participants