[SPARK-53066][SQL] Improve EXPLAIN output for DSv2 Join pushdown #51781

Open: wants to merge 2 commits into master

PetarVasiljevic-DB (Contributor) commented Aug 1, 2025

What changes were proposed in this pull request?

Prior to this change, the EXPLAIN FORMATTED output for a query such as:

SELECT *
FROM catalog.tbl1 t1
    JOIN catalog.tbl2 t2 ON t1.id1 = t2.id2
    JOIN catalog.tbl3 t3 ON t2.id2 = t3.id3
    JOIN catalog.tbl4 t4 ON t3.id3 = t4.id4;

looked like:

PushedJoins: [join_pushdown_catalog.tbl1, join_pushdown_catalog.tbl2, join_pushdown_catalog.tbl3, join_pushdown_catalog.tbl4]

With this PR, the EXPLAIN FORMATTED output becomes:

(1) Scan JDBC v1 Relation from v2 scan join_pushdown_catalog.JOIN_SCHEMA.JOIN_TABLE_1 [codegen id : 1]
Output [14]: [ID_c0e665d3_3cea_4fb8_b57d_6a9f26af360f#14, ID_03c3fe2f_6cd9_4794_ace9_85e617420548#17,  ID_ea300272_c11a_4c0c_aed8_a83a60f3da33#21, ID#24]
PushedFilters: [ID_ea300272_c11a_4c0c_aed8_a83a60f3da33 = (ID + 1)]
PushedJoins: 
[0]: [PushedFilters: [ID_03c3fe2f_6cd9_4794_ace9_85e617420548 = (ID + 1)],
    PushedJoins: [
    [0]: [PushedFilters: [ID_c0e665d3_3cea_4fb8_b57d_6a9f26af360f = (ID + 1)],
        PushedJoins: [
        [0]: [Relation: join_pushdown_catalog.tbl1, PushedFilters: [ID IS NOT NULL]],
        [1]: [Relation: join_pushdown_catalog.tbl2, PushedFilters: [ID IS NOT NULL]]
      ]],
    [1]: [Relation: join_pushdown_catalog.tbl3, PushedFilters: [ID IS NOT NULL]]
  ]],
[1]: [Relation: join_pushdown_catalog.tbl4, PushedFilters: [ID IS NOT NULL]]

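PushedFilters that appear next to PushedJoins (rather than inside a Relation entry) are the join conditions for that join level; the PushedFilters inside each Relation entry are the filters pushed down to that individual table.
Note that the Scan JDBC v1 Relation from v2 scan node is named after the first joined relation only (catalog.tbl1 in the query above), even though it reads all four tables. This should also be fixed, but it is out of scope for this PR.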
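The difference between the two outputs can be illustrated with a short Python sketch that renders the same left-deep join tree in both styles: the old flat list of table names, and a nested, indented listing in which each join carries its own condition. This is a simplified model, not Spark's implementation (the actual change lives in Spark's Scala code). The `Relation` and `Join` classes and the `flat_pushed_joins` and `nested_pushed_joins` functions are hypothetical, and the brackets and indentation deliberately differ from the real EXPLAIN format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical model of a pushed-down join tree (not a Spark API).
@dataclass
class Relation:
    """Leaf: one table plus the filters pushed down to it."""
    name: str
    pushed_filters: List[str] = field(default_factory=list)


@dataclass
class Join:
    """Inner node: two children plus the join condition pushed to the source."""
    left: Node
    right: Node
    condition: List[str] = field(default_factory=list)


Node = Union[Relation, Join]


def leaves(node: Node) -> List[Relation]:
    """Collect the relations left to right, discarding the join structure."""
    if isinstance(node, Relation):
        return [node]
    return leaves(node.left) + leaves(node.right)


def flat_pushed_joins(node: Node) -> str:
    """Old style: a single list of table names."""
    return "PushedJoins: [" + ", ".join(r.name for r in leaves(node)) + "]"


def nested_pushed_joins(node: Node, depth: int = 0) -> List[str]:
    """New style (simplified): a join prints its condition, then both children one level deeper."""
    pad = "    " * depth
    if isinstance(node, Relation):
        return [f"{pad}Relation: {node.name}, PushedFilters: [{', '.join(node.pushed_filters)}]"]
    lines = [f"{pad}Join PushedFilters: [{', '.join(node.condition)}]"]
    lines += nested_pushed_joins(node.left, depth + 1)
    lines += nested_pushed_joins(node.right, depth + 1)
    return lines


# The four-way join from the query above, with illustrative filters.
not_null = ["ID IS NOT NULL"]
query_tree = Join(
    Join(
        Join(Relation("catalog.tbl1", not_null), Relation("catalog.tbl2", not_null),
             ["t1.id1 = t2.id2"]),
        Relation("catalog.tbl3", not_null),
        ["t2.id2 = t3.id3"],
    ),
    Relation("catalog.tbl4", not_null),
    ["t3.id3 = t4.id4"],
)

print(flat_pushed_joins(query_tree))  # PushedJoins: [catalog.tbl1, catalog.tbl2, catalog.tbl3, catalog.tbl4]
print("\n".join(nested_pushed_joins(query_tree)))
```

Because the flat form is built only from the leaves, it cannot show which tables were joined first or under which condition. The recursive form keeps both, which is the information the new EXPLAIN output exposes.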

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

github-actions bot added the SQL label Aug 1, 2025
PetarVasiljevic-DB force-pushed the improve_explain_command_for_dsv2_join_pushdown branch 3 times, most recently from 5580c05 to 3563693, August 1, 2025 17:36
PetarVasiljevic-DB force-pushed the improve_explain_command_for_dsv2_join_pushdown branch from 3563693 to 897ce34, August 1, 2025 18:03
PetarVasiljevic-DB (Contributor Author) commented:

cc: @cloud-fan
