Allow BigSizedJoinIterator#buildPartitioner to produce more subpartitions #12372
Conversation
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
LGTM, but it would be better to have more reviews from others.
In the future, we may choose to repartition with a heuristic that calculates the proper partition number, to overcome the skew case mentioned in the linked issue. It would be similar to the repartitioning done in the GPU hash aggregate or the GPU sub hash join.
I really hate this. I get that you might need a fix quickly, but this is not a long-term solution. I like that the config is private so we can remove it in the future, but the only way I am willing to merge this is if there is a follow-on issue early in 25.06 to find a way to deal with this case properly.
The heuristic currently being used looks at the build side of the join to estimate how many output rows there would be for each input row on average, aka the amplification. This is not a perfect solution because it assumes that the distribution of keys on the stream side matches that of the build side, which is not guaranteed.
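To make the heuristic concrete, here is a minimal sketch of that amplification estimate. All names are illustrative, not the actual spark-rapids implementation; the key assumption (as noted above) is that stream-side keys follow the build-side distribution.

```python
from collections import Counter

def estimate_amplification(build_keys):
    """Estimate how many output rows an average stream-side row produces.

    Assuming a stream row's key is drawn from the build-side key
    distribution, a key k (with build count c_k out of n build rows)
    appears with probability c_k / n and matches c_k build rows, so the
    expected matches per stream row is sum(c_k^2) / n.
    """
    counts = Counter(build_keys)
    total = sum(counts.values())
    return sum(c * c for c in counts.values()) / total

# Unique build keys give amplification 1.0; a skewed build side
# amplifies more: (3^2 + 1^2) / 4 = 2.5 for the second example.
print(estimate_amplification([1, 2, 3, 4]))
print(estimate_amplification([1, 1, 1, 2]))
```

The failure mode the comment above describes is visible here: if the stream side actually contains only the rare key `2`, the true amplification is 1, not 2.5.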
Perhaps we can look at using a sketch to estimate the number of output rows for the equality portion of a join, and then assume that any non-equality parts are just going to reduce the number of output rows.
A quick bit of research suggests there are a number of sketches we could look at using:
- Count-Min - https://postgrespro.com/list/thread-id/2556195 and https://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf
- Count - https://people.cs.umass.edu/~mcgregor/711S12/sketches1.pdf
- AGMS and Fast-AGMS - https://faculty.ucmerced.edu/frusu/Papers/Journal/2008-05-tods.pdf
- JoinSketch - https://yangtonghome.github.io/uploads/JoinSketch_2023.pdf
- Skimmed - https://minosng.github.io/Papers/edbt04skim-cam.pdf
Most of them look like something we could build on the GPU fairly quickly. Probably at least as fast as the distinct count we are doing today.
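As a hedged illustration of the sketch-based idea floated above, here is a toy pure-Python Count-Min sketch used for join-size estimation via the inner product of two sketches (one per side). The class and function names are mine, the hashing is simplistic, and this is a CPU sketch of the technique, not a GPU design; for non-negative counters each row's dot product over-estimates the true join size, so the minimum over rows is still an over-estimate.

```python
import random

class CountMinSketch:
    """Minimal Count-Min sketch; illustrative only."""

    def __init__(self, width=1024, depth=4, seed=42):
        rng = random.Random(seed)
        self.width = width
        # One salted hash per row. Both sketches being compared must use
        # the same seed so their rows use the same hash functions.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cell(self, row, key):
        return hash((self.salts[row], key)) % self.width

    def add(self, key):
        for r in range(len(self.salts)):
            self.table[r][self._cell(r, key)] += 1

def estimate_join_size(a, b):
    """Estimate |A JOIN B| on the sketched key via counter inner products.

    Each row's dot product equals sum over keys of count_A * count_B
    plus non-negative collision noise, so min over rows over-estimates
    the true join cardinality.
    """
    return min(
        sum(x * y for x, y in zip(ra, rb))
        for ra, rb in zip(a.table, b.table)
    )

build, stream = CountMinSketch(), CountMinSketch()
for k in [1, 1, 1, 2, 3]:
    build.add(k)
for k in [1, 2, 2, 4]:
    stream.add(k)
# True join size here: key 1 -> 3*1, key 2 -> 1*2, total 5.
print(estimate_join_size(build, stream))
```

The appeal for the join case is that both sides can be sketched in one streaming pass with simple per-row updates, which is the kind of work that maps well to the GPU.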
Talked offline with Bobby; conclusions:
This PR is intended to address some corner cases reported by our customer. I fully understand that this is not a clean solution, but I'm also aware that we don't have a perfect dynamic solution to this issue in the short term (see #12354 (review) for more; we don't have a clear roadmap on this yet). My past experience has shown that when users in production hit unavoidable bugs, they prefer having some special configuration as an escape route rather than being stuck without options or forced to wait for a new release. That's why I'm introducing a new internal config in case they really need it (after consulting our experts). Once we have finished the proper dynamic solution, we can of course remove this kind of internal config; normal users will not be aware of it. Still, if there's strong resistance to the internal config, we can choose not to check in this PR (@revans2 @sameerz @GaryShen2008 @winningsix @abellina), or we can use the customer private repo for now.
Agree with @revans2. For now, if this helps a specific use case we can merge it, but it should be cleaned up in 25.06.
This PR closes #12367 by introducing a new config called `spark.rapids.sql.join.sizedJoin.buildPartitionNumberAmplification`.
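For anyone who does need the escape hatch, supplying the config would look something like the following. This is a hypothetical usage sketch: the thread does not spell out the value semantics, and the reading of the value as a multiplier on the number of build-side sub-partitions is inferred from the config name, not confirmed.

```shell
# Hypothetical: pass the internal config at submit time.
# The value 4 is illustrative only.
spark-submit \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.join.sizedJoin.buildPartitionNumberAmplification=4 \
  my_job.py
```

Because the config is internal and undocumented, it should only be set on the advice of the plugin maintainers, and removed once the dynamic solution lands.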