-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add dedicated mv_overlap, mv_contains functions. #18084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Introduces new multi-value string functions at the native layer. These are now used for the SQL `MV_OVERLAP` and `MV_CONTAINS` functions. The main purpose of the new functions is to have slightly different null and empty-array handling behavior vs `array_overlap` and `array_contains`. Behavior is designed to align with the native `inType` filter, which is especially important for `MV_OVERLAP`, because the SQL layer can convert `MV_OVERLAP` into an `inType` filter. The SQL layer no longer generates calls to the `mv_harmonize_nulls` function, because that logic has been moved inside `mv_overlap` and `mv_contains`. The function still remains at the native layer for backwards compatibility.
d1cef64
to
e7f9456
Compare
), | ||
ImmutableList.of( | ||
new Object[]{"[\"a\",\"b\"]"}, | ||
new Object[]{"[\"b\",\"c\"]"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i suppose this test should have also selected dim2 to make it easier to verify heh, though I'm still kind of confused, what does the array constructor end up doing to an empty mvd row? I started to try to find out but then ran into a wierd problem with the grammar, you can't actually write array(array(...))
expression because it gets confused with some other junk in there about array literal parsing.. will look into both of these things later and just assume this is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually was a bad change. The root cause was:
- the
array(dim2)
function, when passed adim2
that is a single null, would return anarray<long>
typed array with a single null in it. - the
mv_overlap
function would then cast the first arg toarray<long>
, which would yield an array of nulls. - the
mv_overlap
function would then return true because an array of nulls (first arg) does contain a null (second arg).
To fix it we'd need to cast for comparison. But I think that is sort of pointless since MV_CONTAINS
and MV_OVERLAP
really only make sense with strings. So instead, I updated them to cast the second arg to string array, and updated the tests accordingly.
|
||
@Override | ||
protected String getFilterExpression(List<DruidExpression> druidExpressions) | ||
public String getDruidFunctionName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this still extend ArrayContainsOperatorConversion? I guess i see the benefit of overlapping filter conversion, but given that it uses some other native filters, such as ArrayContainsElementFilter
, is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason for extending ArrayContainsOperatorConversion
is to get the filter conversion behavior. This patch doesn't change whether ArrayContainsElementFilter
could fire for MV_CONTAINS
, but just to be safe, I added a check for this. (If the operator is not ARRAY_CONTAINS
then it skips the array-element-filter block.)
newArgs | ||
); | ||
} | ||
druidExpressions -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like the default implementation, so can just delete i think, unless i'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, will delete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, sorry for confusion, i meant of the entire toDruidExpression
method override can be dropped i think, since this is effectively the same as DirectOperatorConversion
implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I had deleted the other one, not this one you commented on. Just deleted this one too. Actually, I did a bigger change:
- updated MV Contains and Overlap to call a superclass constructor with their operator and function names, rather than overriding
calciteOperator()
andgetDruidFunctionName()
- removed
toDruidExpression
andtoDruidExpressionWithPostAggOperands
from MV Contains and Overlap - marked
calciteOperator()
andgetDruidFunctionName()
asfinal
inDirectOperatorConversion
, so all the other methods onDirectOperatorConversion
can be trusted to use the correct operator and function name
} | ||
|
||
@Test | ||
public void testMvContains() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some tests for what happens if other types of columns make it in here, like scalar numbers and number arrays and stuff? i think should be fine, just might be nice to have.. see there are tests for scalar strings which is 👍 just would like to see a bit more (and also maybe as bindings since different paths for singleThreaded for constant and dynamic args)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some tests. I didn't use bindings, but I don't think they're needed, because assertExpr
tests both with and without single-threaded mode. Using constants should let both modes be exercised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't use bindings, but I don't think they're needed, because assertExpr tests both with and without single-threaded mode.
oops yea, i forgot it did that 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, though I didn't like validate the quidem test changes.. just going to trust that the changes there are correct
I think the quidem tests are OK- looking through them, they appear to mainly be differences in how null/empty cases are handled. |
Introduces new multi-value string functions at the native layer. These are now used for the SQL
MV_OVERLAP
andMV_CONTAINS
functions. The main purpose of the new functions is to have slightly different null and empty-array handling behavior vsarray_overlap
andarray_contains
. Behavior is designed to align with the nativeinType
filter, which is especially important forMV_OVERLAP
, because the SQL layer can convertMV_OVERLAP
into aninType
filter.The SQL layer no longer generates calls to the
mv_harmonize_nulls
function, because that logic has been moved insidemv_overlap
andmv_contains
. The function still remains at the native layer for backwards compatibility.