WIP: MDEV-38728 Improve join size estimation for ref access #4608

Open
Olernov wants to merge 1 commit into 11.4 from 11.4-MDEV-38129-all-eq


Olernov commented Feb 2, 2026

When estimating the number of rows produced by a join after ref access, the optimizer assumes that every value from the driving table finds a match in the inner table. This causes overestimation when the driving table's join column has more distinct values than the inner table's key.

Fix: use the number of distinct values (NDV) of the columns in the join predicate to calculate a match probability:
match_prob = min(1.0, NDV(inner) / NDV(driving))
The expected number of records after ref access is then multiplied by the match probability to provide a more accurate estimate.
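
To make the formula concrete, here is a minimal Python sketch of the calculation. It is not the patch itself (the server code is C++): the function names, the `rows_per_lookup` parameter, and the fallback to 1.0 when the driving NDV is not positive are assumptions made for this example.

```python
def match_probability(ndv_inner: float, ndv_driving: float) -> float:
    """Fraction of driving-table values expected to find a match in the inner table."""
    if ndv_driving <= 0:
        # Assumption for this sketch: with no usable NDV, keep the old
        # behaviour and assume that every value matches.
        return 1.0
    return min(1.0, ndv_inner / ndv_driving)


def rows_after_ref_access(prefix_rows: float, rows_per_lookup: float,
                          ndv_inner: float, ndv_driving: float) -> float:
    """Scale the previous estimate (prefix_rows * rows_per_lookup) by match_prob."""
    return prefix_rows * rows_per_lookup * match_probability(ndv_inner, ndv_driving)


# Driving side: 1000 rows with 1000 distinct values.
# Inner key: 100 distinct values, about 5 rows per key value.
old_estimate = 1000 * 5
new_estimate = rows_after_ref_access(1000, 5, ndv_inner=100, ndv_driving=1000)
print(old_estimate, new_estimate)
```

With these made-up numbers only about one driving value in ten can find a match, so the estimate drops from 5000 rows to roughly 500.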

Limitations:

  • EITS (engine-independent table statistics) must be available for both columns in the join predicate
  • both columns must be real table fields
  • only single-column ref access is supported
  • only the first key part of the inner table's index is used
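
The limitations above amount to a guard that decides whether the NDV-based correction is applied at all. The Python sketch below shows one way to express that guard. The `JoinColumn` type, its fields, and the function name are hypothetical and do not correspond to identifiers in the server source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JoinColumn:
    """Hypothetical description of one side of the join predicate."""
    is_table_field: bool   # False for expressions, constants, and so on
    ndv: Optional[float]   # None when no EITS data exists for the column


def ndv_correction_applicable(driving: JoinColumn, inner: JoinColumn,
                              ref_key_parts: int) -> bool:
    """Return True only when every listed limitation is satisfied."""
    # Single-column ref access only. Assumption of this sketch: a ref
    # access over one key part always uses the first key part of the index.
    if ref_key_parts != 1:
        return False
    # Both sides must be real table fields.
    if not (driving.is_table_field and inner.is_table_field):
        return False
    # EITS must provide an NDV for both columns.
    return driving.ndv is not None and inner.ndv is not None
```

When the guard returns False, the sketch assumes the optimizer keeps its previous estimate, which is equivalent to a match probability of 1.0.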

TODO:

  • A WHERE filter on the driving table may reduce its NDV and thereby affect the estimate. Currently this is handled only in a basic way: driving_ndv is limited to at most the number of records in the current partial join.
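
The basic handling described in the TODO can be sketched as a cap on the driving NDV before the ratio is taken. This Python snippet illustrates the idea only: `partial_join_rows` and the function name are made up for the example, and `driving_ndv` follows the name used in the TODO item.

```python
def match_probability_capped(ndv_inner: float, driving_ndv: float,
                             partial_join_rows: float) -> float:
    """Match probability with driving_ndv capped by the partial join size."""
    # A partial join that produces N rows cannot carry more than N distinct
    # values, so a WHERE filter that removes rows also bounds the NDV.
    capped_ndv = min(driving_ndv, partial_join_rows)
    if capped_ndv <= 0:
        # Assumption for this sketch: no rows or no statistics means no correction.
        return 1.0
    return min(1.0, ndv_inner / capped_ndv)


# Unfiltered: 1000 distinct driving values against 100 inner values.
print(match_probability_capped(100, 1000, partial_join_rows=1000))
# A selective WHERE leaves 50 rows, so at most 50 distinct values remain.
print(match_probability_capped(100, 1000, partial_join_rows=50))
```

The cap is only an upper bound. It does not model which distinct values survive the filter, which is why the item is still listed as a TODO.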

This commit overwrites only those test results that have been verified, i.e. those that now show a better join size estimate. The other failing tests have not been verified yet.
