
Commit 32c8771

mjkravetz authored and akpm00 committed
hugetlb: do not clear hugetlb dtor until allocating vmemmap
Patch series "Fix hugetlb free path race with memory errors".

In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on
HWPOISON hugepages", the race window was discovered.
https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/

Freeing a hugetlb page back to low level memory allocators is performed
in two steps:
1) Under hugetlb lock, remove the page from hugetlb lists and clear the
   destructor.
2) Outside the lock, allocate vmemmap if necessary and call the low level
   free routine.

Between these two steps, the hugetlb page will appear as a normal
compound page. However, vmemmap for tail pages could be missing. If a
memory error occurs at this time, we could try to update page flags in
non-existent page structs. A much more detailed description is in the
first patch.

The first patch addresses the race window. However, it adds a
hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page
free operation. This could lead to slowdowns if one is freeing a large
number of hugetlb pages.

The second patch optimizes the update_and_free_pages_bulk routine to
only take the lock once in bulk operations. It is technically not a bug
fix, but includes a Fixes tag and Cc stable to avoid a performance
regression. It could be combined with the first, but was done separately
to make reviewing easier.

This patch (of 2):

Freeing a hugetlb page and releasing base pages back to the underlying
allocator such as buddy or cma is performed in two steps:
- remove_hugetlb_folio() is called to remove the folio from hugetlb
  lists, get a ref on the page and remove the hugetlb destructor. This
  all must be done under the hugetlb lock. After this call, the page can
  be treated as a normal compound page or a collection of base size
  pages.
- update_and_free_hugetlb_folio() is called to allocate vmemmap if
  needed, and the free routine of the underlying allocator is called on
  the resulting page. We cannot hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between
these two steps. In this case, the memory error handling code treats the
old hugetlb page as a normal compound page or collection of base pages.
It will then try to SetPageHWPoison(page) on the page with an error. If
the page with the error is a tail page without vmemmap, a write error
will occur when trying to set the flag.

Address this issue by modifying remove_hugetlb_folio() and
update_and_free_hugetlb_folio() such that the hugetlb destructor is not
cleared until after allocating vmemmap. Since clearing the destructor
requires holding the hugetlb lock, the clearing is done in
remove_hugetlb_folio() if the vmemmap is present. This saves a
lock/unlock cycle. Otherwise, the destructor is cleared in
update_and_free_hugetlb_folio() after allocating vmemmap.

Note that this will leave hugetlb pages in a state where they are marked
free (by a hugetlb specific page flag) and have a ref count. This is not
a normal state. The only code that would notice is the memory error
code, and it is set up to retry in such a case.

A subsequent patch will create a routine to do bulk processing of
vmemmap allocation. This will eliminate a lock/unlock cycle for each
hugetlb page in the case where we are freeing a large number of pages.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ad2fa37 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Mike Kravetz <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Tested-by: Naoya Horiguchi <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Jiaqi Yan <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent faeb2ff commit 32c8771

1 file changed: +51 -24 lines changed
mm/hugetlb.c

@@ -1579,9 +1579,37 @@ static inline void destroy_compound_gigantic_folio(struct folio *folio,
 						unsigned int order) { }
 #endif
 
+static inline void __clear_hugetlb_destructor(struct hstate *h,
+						struct folio *folio)
+{
+	lockdep_assert_held(&hugetlb_lock);
+
+	/*
+	 * Very subtle
+	 *
+	 * For non-gigantic pages set the destructor to the normal compound
+	 * page dtor.  This is needed in case someone takes an additional
+	 * temporary ref to the page, and freeing is delayed until they drop
+	 * their reference.
+	 *
+	 * For gigantic pages set the destructor to the null dtor.  This
+	 * destructor will never be called.  Before freeing the gigantic
+	 * page destroy_compound_gigantic_folio will turn the folio into a
+	 * simple group of pages.  After this the destructor does not
+	 * apply.
+	 *
+	 */
+	if (hstate_is_gigantic(h))
+		folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
+	else
+		folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
+}
+
 /*
- * Remove hugetlb folio from lists, and update dtor so that the folio appears
- * as just a compound page.
+ * Remove hugetlb folio from lists.
+ * If vmemmap exists for the folio, update dtor so that the folio appears
+ * as just a compound page.  Otherwise, wait until after allocating vmemmap
+ * to update dtor.
  *
  * A reference is held on the folio, except in the case of demote.
  *
@@ -1612,31 +1640,19 @@ static void __remove_hugetlb_folio(struct hstate *h, struct folio *folio,
 	}
 
 	/*
-	 * Very subtle
-	 *
-	 * For non-gigantic pages set the destructor to the normal compound
-	 * page dtor.  This is needed in case someone takes an additional
-	 * temporary ref to the page, and freeing is delayed until they drop
-	 * their reference.
-	 *
-	 * For gigantic pages set the destructor to the null dtor.  This
-	 * destructor will never be called.  Before freeing the gigantic
-	 * page destroy_compound_gigantic_folio will turn the folio into a
-	 * simple group of pages.  After this the destructor does not
-	 * apply.
-	 *
-	 * This handles the case where more than one ref is held when and
-	 * after update_and_free_hugetlb_folio is called.
-	 *
-	 * In the case of demote we do not ref count the page as it will soon
-	 * be turned into a page of smaller size.
+	 * We can only clear the hugetlb destructor after allocating vmemmap
+	 * pages.  Otherwise, someone (memory error handling) may try to write
+	 * to tail struct pages.
+	 */
+	if (!folio_test_hugetlb_vmemmap_optimized(folio))
+		__clear_hugetlb_destructor(h, folio);
+
+	/*
+	 * In the case of demote we do not ref count the page as it will soon
+	 * be turned into a page of smaller size.
 	 */
 	if (!demote)
 		folio_ref_unfreeze(folio, 1);
-	if (hstate_is_gigantic(h))
-		folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
-	else
-		folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
 
 	h->nr_huge_pages--;
 	h->nr_huge_pages_node[nid]--;
@@ -1705,6 +1721,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 {
 	int i;
 	struct page *subpage;
+	bool clear_dtor = folio_test_hugetlb_vmemmap_optimized(folio);
 
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return;
@@ -1735,6 +1752,16 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 	if (unlikely(folio_test_hwpoison(folio)))
 		folio_clear_hugetlb_hwpoison(folio);
 
+	/*
+	 * If vmemmap pages were allocated above, then we need to clear the
+	 * hugetlb destructor under the hugetlb lock.
+	 */
+	if (clear_dtor) {
+		spin_lock_irq(&hugetlb_lock);
+		__clear_hugetlb_destructor(h, folio);
+		spin_unlock_irq(&hugetlb_lock);
+	}
+
 	for (i = 0; i < pages_per_huge_page(h); i++) {
 		subpage = folio_page(folio, i);
 		subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
