MDEV-32067 InnoDB linear read ahead had better be logical #4600
Thirunarayanan wants to merge 1 commit into main
1fc14cb to 7fa9c70
The traditional linear read-ahead, enabled by innodb_read_ahead_threshold=56, only works if pages are allocated on adjacent page numbers, which is not always the case for B-tree leaf pages. After this change, the exact nonzero values of innodb_read_ahead_threshold matter only for the read-ahead of undo log pages.

Introduced Multi-Range Read (MRR) aware read-ahead that collects actual leaf page numbers during B-tree traversal.

buf_read_ahead_undo(): Renamed from buf_read_ahead_linear(). This function will no longer be invoked on any BLOB pages (for which FIL_PAGE_PREV and FIL_PAGE_NEXT were not initialized consistently) nor on any index pages. For index leaf pages, we will introduce buf_read_ahead_one() and buf_read_ahead_pages().

buf_read_ahead_one(): Read ahead one (sibling leaf) page. This logic cannot be disabled.

buf_read_ahead_pages(): Read ahead B-tree index leaf pages.

buf_read_ahead_random(): Split the function into two parts: one that determines which range of pages should be read, and another that actually initiates a read of the pages.

btr_pcur_move_to_next_page(): Invoke buf_read_ahead_one() instead of buf_read_ahead_linear().

btr_pcur_move_backward_from_page(): Implement a fast path of trying to acquire a latch on the previous page without waiting, and invoke buf_read_ahead_one() on the preceding page, with the assumption that we may be accessing that page in the near future.

btr_copy_blob_prefix(): Simplify the logic. On BLOB pages other than ROW_FORMAT=COMPRESSED ones, the FIL_PAGE_NEXT field is not meaningfully initialized. The FIL_PAGE_PREV field is not pointing to anything meaningful either. buf_read_ahead_linear() expects these to be set meaningfully. Only the non-default setting innodb_random_read_ahead=ON might be meaningful here.

btr_cur_t::search_leaf(): Add MRR read-ahead context to collect leaf page numbers at PAGE_LEVEL=1 during B-tree traversal. The collected page numbers represent actual leaf pages that will be accessed, enabling more targeted read-ahead than linear page number assumptions.

mrr_readahead_ctx_t: New structure for passing MRR context through the call chain from ha_innobase -> row_search_mvcc() -> btr_pcur_open() -> search_leaf(). It has a limit of READ_AHEAD_PAGES=64.
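To make the idea of collecting leaf page numbers more concrete, here is a minimal sketch of a bounded collector. It is not the patch's mrr_readahead_ctx_t: the type name mrr_readahead_ctx_sketch, its members, and the helper collect_leaf_pages() are hypothetical, and the sketch assumes that READ_AHEAD_PAGES=64 bounds the number of page numbers that may be collected.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the context described in the commit message.
// Assumption: READ_AHEAD_PAGES bounds how many page numbers are collected.
struct mrr_readahead_ctx_sketch
{
  static constexpr std::size_t READ_AHEAD_PAGES= 64;

  // Remember a leaf page number; returns false once the limit is reached.
  bool add(uint32_t page_no)
  {
    if (n_pages == READ_AHEAD_PAGES)
      return false;
    // Skip an immediate duplicate so that each slot is a distinct request.
    if (n_pages && pages[n_pages - 1] == page_no)
      return true;
    pages[n_pages++]= page_no;
    return true;
  }

  std::size_t size() const { return n_pages; }

  uint32_t operator[](std::size_t i) const
  {
    assert(i < n_pages);
    return pages[i];
  }

private:
  std::array<uint32_t, READ_AHEAD_PAGES> pages{};
  std::size_t n_pages= 0;
};

// Hypothetical helper: child_page_nos models the child pointers stored in
// one PAGE_LEVEL=1 node, in key order. Returns how many were collected.
inline std::size_t
collect_leaf_pages(const std::vector<uint32_t> &child_page_nos,
                   mrr_readahead_ctx_sketch &ctx)
{
  std::size_t collected= 0;
  for (uint32_t page_no : child_page_nos)
  {
    if (!ctx.add(page_no))
      break; // limit reached; the remaining leaves are read on demand
    collected++;
  }
  return collected;
}
```

The point of the sketch is that the collected numbers need not be adjacent: a level-1 node pointing at leaf pages 40, 7 and 1033 yields exactly those three read-ahead candidates, which a linear page-number heuristic would not predict.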
7fa9c70 to 93d5f7d
dr-m
left a comment
I only reviewed a small part of this so far. Please debug this with undo tablespace truncation enabled.
    const unsigned zip_size= space->zip_size();
    ulint count;

    if (high_1.page_no() > space->last_page_number())
We seem to have a race condition with undo tablespace truncation here. I understood that mtr_t::commit_shrink() would change space->committed_size as part of mtr_memo_slot_t::release(). There, that field is protected by exclusive space->latch. Here, we are not holding that latch, hence we could theoretically post a read-ahead request for a portion of an undo tablespace that is being truncated.
Acquiring space->latch here would lead to a significant performance regression. We will need to prevent this glitch in a different way. A possible way might be to set the STOPPING_READS flag in the mtr_t::commit_shrink() code path and clear it after release(). This would allow space.acquire() to return false and therefore prevent us from entering this code path.
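The suggestion can be sketched with a flag that shares an atomic word with a reference count. This is only an illustration under stated assumptions, not InnoDB code: space_sketch, begin_shrink(), end_shrink(), referenced() and try_read_ahead() are hypothetical names, and the bit layout is invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of a tablespace handle: the top bit is a
// "stopping reads" flag, the remaining bits count active references.
struct space_sketch
{
  static constexpr uint32_t STOPPING_READS= 1U << 31;

  // Take a reference unless reads are being stopped.
  bool acquire()
  {
    uint32_t n= pending.load(std::memory_order_acquire);
    do
    {
      if (n & STOPPING_READS)
        return false;
    }
    while (!pending.compare_exchange_weak(n, n + 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed));
    return true;
  }

  void release()
  {
    uint32_t n= pending.fetch_sub(1, std::memory_order_release);
    assert(n & ~STOPPING_READS); // there must have been a reference
    (void) n;
  }

  // Assumed to be called around the shrinking commit, as the review suggests.
  void begin_shrink()
  { pending.fetch_or(STOPPING_READS, std::memory_order_acq_rel); }
  void end_shrink()
  { pending.fetch_and(~STOPPING_READS, std::memory_order_release); }

  uint32_t referenced() const
  { return pending.load(std::memory_order_acquire) & ~STOPPING_READS; }

private:
  std::atomic<uint32_t> pending{0};
};

// Hypothetical read-ahead entry point: give up without taking any latch
// while the tablespace is being shrunk.
inline bool try_read_ahead(space_sketch &space)
{
  if (!space.acquire())
    return false;
  // ... read requests would be posted here ...
  space.release();
  return true;
}
```

The sketch only shows how a failed acquire() keeps new read-ahead requests out while the flag is set. It does not wait for references taken before begin_shrink() to be released, which a real fix would also have to consider.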