Skip to content

Commit 4750df6

Browse files
fdmananakdave
authored andcommitted
btrfs: release path before initializing extent tree in btrfs_read_locked_inode()
In btrfs_read_locked_inode() we are calling btrfs_init_file_extent_tree() while holding a path with a read locked leaf from a subvolume tree, and btrfs_init_file_extent_tree() may do a GFP_KERNEL allocation, which can trigger reclaim. This can create a circular lock dependency which lockdep warns about with the following splat: [27386.164433] ====================================================== [27386.164574] WARNING: possible circular locking dependency detected [27386.164583] 6.18.0+ #4 Tainted: G U [27386.164591] ------------------------------------------------------ [27386.164599] kswapd0/117 is trying to acquire lock: [27386.164606] ffff8d9b6333c5b8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x39/0x2f0 [27386.164625] but task is already holding lock: [27386.164633] ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60 [27386.164646] which lock already depends on the new lock. [27386.164657] the existing dependency chain (in reverse order) is: [27386.164667] -> #2 (fs_reclaim){+.+.}-{0:0}: [27386.164677] fs_reclaim_acquire+0x9d/0xd0 [27386.164685] __kmalloc_cache_noprof+0x59/0x750 [27386.164694] btrfs_init_file_extent_tree+0x90/0x100 [27386.164702] btrfs_read_locked_inode+0xc3/0x6b0 [27386.164710] btrfs_iget+0xbb/0xf0 [27386.164716] btrfs_lookup_dentry+0x3c5/0x8e0 [27386.164724] btrfs_lookup+0x12/0x30 [27386.164731] lookup_open.isra.0+0x1aa/0x6a0 [27386.164739] path_openat+0x5f7/0xc60 [27386.164746] do_filp_open+0xd6/0x180 [27386.164753] do_sys_openat2+0x8b/0xe0 [27386.164760] __x64_sys_openat+0x54/0xa0 [27386.164768] do_syscall_64+0x97/0x3e0 [27386.164776] entry_SYSCALL_64_after_hwframe+0x76/0x7e [27386.164784] -> #1 (btrfs-tree-00){++++}-{3:3}: [27386.164794] lock_release+0x127/0x2a0 [27386.164801] up_read+0x1b/0x30 [27386.164808] btrfs_search_slot+0x8e0/0xff0 [27386.164817] btrfs_lookup_inode+0x52/0xd0 [27386.164825] __btrfs_update_delayed_inode+0x73/0x520 [27386.164833] btrfs_commit_inode_delayed_inode+0x11a/0x120 [27386.164842] btrfs_log_inode+0x608/0x1aa0 [27386.164849] btrfs_log_inode_parent+0x249/0xf80 [27386.164857] btrfs_log_dentry_safe+0x3e/0x60 [27386.164865] btrfs_sync_file+0x431/0x690 [27386.164872] do_fsync+0x39/0x80 [27386.164879] __x64_sys_fsync+0x13/0x20 [27386.164887] do_syscall_64+0x97/0x3e0 [27386.164894] entry_SYSCALL_64_after_hwframe+0x76/0x7e [27386.164903] -> #0 (&delayed_node->mutex){+.+.}-{3:3}: [27386.164913] __lock_acquire+0x15e9/0x2820 [27386.164920] lock_acquire+0xc9/0x2d0 [27386.164927] __mutex_lock+0xcc/0x10a0 [27386.164934] __btrfs_release_delayed_node.part.0+0x39/0x2f0 [27386.164944] btrfs_evict_inode+0x20b/0x4b0 [27386.164952] evict+0x15a/0x2f0 [27386.164958] prune_icache_sb+0x91/0xd0 [27386.164966] super_cache_scan+0x150/0x1d0 [27386.164974] do_shrink_slab+0x155/0x6f0 [27386.164981] shrink_slab+0x48e/0x890 [27386.164988] shrink_one+0x11a/0x1f0 [27386.164995] shrink_node+0xbfd/0x1320 [27386.165002] balance_pgdat+0x67f/0xc60 [27386.165321] kswapd+0x1dc/0x3e0 [27386.165643] kthread+0xff/0x240 [27386.165965] ret_from_fork+0x223/0x280 [27386.166287] ret_from_fork_asm+0x1a/0x30 [27386.166616] other info that might help us debug this: [27386.167561] Chain exists of: &delayed_node->mutex --> btrfs-tree-00 --> fs_reclaim [27386.168503] Possible unsafe locking scenario: [27386.169110] CPU0 CPU1 [27386.169411] ---- ---- [27386.169707] lock(fs_reclaim); [27386.169998] lock(btrfs-tree-00); [27386.170291] lock(fs_reclaim); [27386.170581] lock(&delayed_node->mutex); [27386.170874] *** DEADLOCK *** [27386.171716] 2 locks held by kswapd0/117: [27386.171999] #0: ffffffffa4ab8ce0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x195/0xc60 [27386.172294] #1: ffff8d998344b0e0 (&type->s_umount_key#40){++++}- {3:3}, at: super_cache_scan+0x37/0x1d0 [27386.172596] stack backtrace: [27386.173183] CPU: 11 UID: 0 PID: 117 Comm: kswapd0 Tainted: G U 6.18.0+ #4 PREEMPT(lazy) [27386.173185] Tainted: [U]=USER [27386.173186] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023 [27386.173187] Call Trace: [27386.173187] <TASK> [27386.173189] dump_stack_lvl+0x6e/0xa0 [27386.173192] print_circular_bug.cold+0x17a/0x1c0 [27386.173194] check_noncircular+0x175/0x190 [27386.173197] __lock_acquire+0x15e9/0x2820 [27386.173200] lock_acquire+0xc9/0x2d0 [27386.173201] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [27386.173204] __mutex_lock+0xcc/0x10a0 [27386.173206] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [27386.173208] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [27386.173211] ? __btrfs_release_delayed_node.part.0+0x39/0x2f0 [27386.173213] __btrfs_release_delayed_node.part.0+0x39/0x2f0 [27386.173215] btrfs_evict_inode+0x20b/0x4b0 [27386.173217] ? lock_acquire+0xc9/0x2d0 [27386.173220] evict+0x15a/0x2f0 [27386.173222] prune_icache_sb+0x91/0xd0 [27386.173224] super_cache_scan+0x150/0x1d0 [27386.173226] do_shrink_slab+0x155/0x6f0 [27386.173228] shrink_slab+0x48e/0x890 [27386.173229] ? shrink_slab+0x2d2/0x890 [27386.173231] shrink_one+0x11a/0x1f0 [27386.173234] shrink_node+0xbfd/0x1320 [27386.173236] ? shrink_node+0xa2d/0x1320 [27386.173236] ? shrink_node+0xbd3/0x1320 [27386.173239] ? balance_pgdat+0x67f/0xc60 [27386.173239] balance_pgdat+0x67f/0xc60 [27386.173241] ? finish_task_switch.isra.0+0xc4/0x2a0 [27386.173246] kswapd+0x1dc/0x3e0 [27386.173247] ? __pfx_autoremove_wake_function+0x10/0x10 [27386.173249] ? __pfx_kswapd+0x10/0x10 [27386.173250] kthread+0xff/0x240 [27386.173251] ? __pfx_kthread+0x10/0x10 [27386.173253] ret_from_fork+0x223/0x280 [27386.173255] ? __pfx_kthread+0x10/0x10 [27386.173257] ret_from_fork_asm+0x1a/0x30 [27386.173260] </TASK> This is because: 1) The fsync task is holding an inode's delayed node mutex (for a directory) while calling __btrfs_update_delayed_inode() and that needs to do a search on the subvolume's btree (therefore read lock some extent buffers); 2) The lookup task, at btrfs_lookup(), triggered reclaim with the GFP_KERNEL allocation done by btrfs_init_file_extent_tree() while holding a read lock on a subvolume leaf; 3) The reclaim triggered kswapd which is doing inode eviction for the directory inode the fsync task is using as an argument to btrfs_commit_inode_delayed_inode() - but in that call chain we are trying to read lock the same leaf that the lookup task is holding while calling btrfs_init_file_extent_tree() and doing the GFP_KERNEL allocation. Fix this by calling btrfs_init_file_extent_tree() after we don't need the path anymore and release it in btrfs_read_locked_inode(). Reported-by: Thomas Hellström <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Fixes: 8679d26 ("btrfs: initialize inode::file_extent_tree after i_mode has been set") Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent 8d712f8 commit 4750df6

File tree

1 file changed

+14
-5
lines changed

1 file changed

+14
-5
lines changed

fs/btrfs/inode.c

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -4043,11 +4043,6 @@ static int btrfs_read_locked_inode(struct btrfs_inode *inode, struct btrfs_path
40434043
btrfs_set_inode_mapping_order(inode);
40444044

40454045
cache_index:
4046-
ret = btrfs_init_file_extent_tree(inode);
4047-
if (ret)
4048-
goto out;
4049-
btrfs_inode_set_file_extent_range(inode, 0,
4050-
round_up(i_size_read(vfs_inode), fs_info->sectorsize));
40514046
/*
40524047
* If we were modified in the current generation and evicted from memory
40534048
* and then re-read we need to do a full sync since we don't have any
@@ -4134,6 +4129,20 @@ static int btrfs_read_locked_inode(struct btrfs_inode *inode, struct btrfs_path
41344129
btrfs_ino(inode), btrfs_root_id(root), ret);
41354130
}
41364131

4132+
/*
4133+
* We don't need the path anymore, so release it to avoid holding a read
4134+
* lock on a leaf while calling btrfs_init_file_extent_tree(), which can
4135+
* allocate memory that triggers reclaim (GFP_KERNEL) and cause a locking
4136+
* dependency.
4137+
*/
4138+
btrfs_release_path(path);
4139+
4140+
ret = btrfs_init_file_extent_tree(inode);
4141+
if (ret)
4142+
goto out;
4143+
btrfs_inode_set_file_extent_range(inode, 0,
4144+
round_up(i_size_read(vfs_inode), fs_info->sectorsize));
4145+
41374146
if (!maybe_acls)
41384147
cache_no_acl(vfs_inode);
41394148

0 commit comments

Comments
 (0)