-
Notifications
You must be signed in to change notification settings - Fork 956
Sort-based inner join for high-multiplicity tables #18318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
/ok to test |
/ok to test |
/ok to test eaa76fa |
/ok to test a083c71 |
merge_inner_join(cudf::table_view const& left_keys, | ||
cudf::table_view const& right_keys, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we match the parameter names as in sort_merge_join
? That means, using left
and right
without _keys
. I don't see any values
tables thus we don't have to add such suffix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I retained the _keys
suffix for consistency - it matches the join declarations in join/join.hpp
cudf/cpp/include/cudf/join/join.hpp
Line 71 in 92c746c
inner_join(cudf::table_view const& left_keys, |
struct unprocessed_table_mapper { | ||
bitmask_type const* const _validity_mask; | ||
__device__ auto operator()(size_type idx) const noexcept | ||
{ | ||
return cudf::bit_is_set(_validity_mask, idx); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this is used only once. So can we just use a lambda in its caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since validity mask is a member of a private nested struct of the sort merge join class, we run into the following compile error when we use a device lambda instead of a free functor: enclosing parent function for an extended device lambda cannot have private or protected access
.
/ok to test 57b4d3f |
/ok to test 4ed9c9e |
/ok to test e9d0612 |
/merge |
Description
Contributes to #18533
Addresses performance hotspots outlined in #16025
This PR introduces a sort-based approach for inner joins on low-cardinality high-multiplicity tables i.e. tables that have few unique keys each of which is repeated several times.
Sort-merge join implemetation:
Progress
TODO:
Checklist