Open
Description
System information
Type | Version/Name |
---|---|
Distribution Name | Debian |
Distribution Version | Stretch |
Linux Kernel | 4.17.0-0.bpo.3-amd64 |
Architecture | amd64 |
ZFS Version | 0.7.9-3~bpo9+1 |
SPL Version | 0.7.9-3~bpo9+1 |
Describe the problem you're observing
Three-way deadlock between z_iput, sync(), and fsync()
Describe how to reproduce the problem
Not sure - it just happens after a few weeks on an AFS fileserver with 4-spindle raidz
I previously had to disable swap due to #7734
There are no hardware errors logged and nothing apparently wrong with the disks. I can continue to use the pool but certain operations will hang in uninterruptible sleep until the system is rebooted.
Include any warning/errors/backtraces from the system logs
[1378261.919390] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378261.924563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378261.929657] z_iput D 0 348 2 0x80000000
[1378261.934686] Call Trace:
[1378261.939606] ? __schedule+0x3dc/0x860
[1378261.944430] ? __switch_to_asm+0x34/0x70
[1378261.949234] ? __switch_to_asm+0x40/0x70
[1378261.953852] ? bit_wait_io_timeout+0x90/0x90
[1378261.958381] schedule+0x32/0x80
[1378261.962864] bit_wait+0xd/0x50
[1378261.967328] __wait_on_bit+0x55/0x80
[1378261.971814] ? __switch_to+0x175/0x450
[1378261.976315] ? bit_wait_io_timeout+0x90/0x90
[1378261.980768] __inode_wait_for_writeback+0xa7/0xe0
[1378261.985272] ? init_wait_var_entry+0x40/0x40
[1378261.989790] inode_wait_for_writeback+0x21/0x30
[1378261.994288] evict+0xab/0x1c0
[1378261.998756] taskq_thread+0x2d8/0x500 [spl]
[1378262.003184] ? wake_up_q+0x70/0x70
[1378262.007632] kthread+0xf8/0x130
[1378262.012013] ? taskq_thread_spawn+0x50/0x50 [spl]
[1378262.016393] ? kthread_create_worker_on_cpu+0x70/0x70
[1378262.020772] ret_from_fork+0x22/0x40
[1378262.025144] INFO: task HWRaidManager:29978 blocked for more than 120 seconds.
[1378262.029447] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378262.033705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378262.037912] HWRaidManager D 0 29978 1 0x20020080
[1378262.042067] Call Trace:
[1378262.046099] ? __schedule+0x3dc/0x860
[1378262.050046] ? default_file_splice_write+0x20/0x20
[1378262.053890] schedule+0x32/0x80
[1378262.057654] wb_wait_for_completion+0x5e/0x90
[1378262.061335] ? remove_wait_queue+0x60/0x60
[1378262.064932] sync_inodes_sb+0xa7/0x280
[1378262.068438] ? __queue_work+0x15e/0x410
[1378262.071847] ? default_file_splice_write+0x20/0x20
[1378262.075180] iterate_supers+0xb5/0x100
[1378262.078417] ksys_sync+0x40/0xb0
[1378262.081593] __ia32_sys_sync+0xa/0x10
[1378262.084747] do_fast_syscall_32+0x98/0x1e0
[1378262.087866] entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[1378262.090938] INFO: task vlserver:18469 blocked for more than 120 seconds.
[1378262.093961] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378262.097022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378262.100142] vlserver D 0 18469 18467 0x00000080
[1378262.103303] Call Trace:
[1378262.106405] ? __schedule+0x3dc/0x860
[1378262.109524] schedule+0x32/0x80
[1378262.112687] cv_wait_common+0x115/0x130 [spl]
[1378262.115875] ? remove_wait_queue+0x60/0x60
[1378262.119227] zil_commit.part.13+0x86/0x8c0 [zfs]
[1378262.122464] ? tsd_hash_search.isra.1+0x41/0x90 [spl]
[1378262.125708] ? tsd_set+0x2db/0x480 [spl]
[1378262.128975] ? _cond_resched+0x16/0x40
[1378262.132376] zfs_fsync+0x70/0xe0 [zfs]
[1378262.135798] zpl_fsync+0x63/0x90 [zfs]
[1378262.138987] do_fsync+0x38/0x60
[1378262.142092] __x64_sys_fsync+0x10/0x20
[1378262.145091] do_syscall_64+0x55/0x110
[1378262.148037] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1378262.150909] RIP: 0033:0x7fdf15610560
[1378262.153769] RSP: 002b:00007ffddb903ad8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[1378262.156735] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fdf15610560
[1378262.159688] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[1378262.162604] RBP: 0000559976687a50 R08: 000000000000000c R09: 00007ffddb903a20
[1378262.165473] R10: 000055997532e220 R11: 0000000000000246 R12: 0000559976687a40
[1378262.168331] R13: 00007ffddb903cf8 R14: 00000000ffffff9c R15: 0000559976687a40
[1378382.755489] INFO: task z_iput:348 blocked for more than 120 seconds.
[1378382.758346] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378382.761291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378382.764291] z_iput D 0 348 2 0x80000000
[1378382.767307] Call Trace:
[1378382.770310] ? __schedule+0x3dc/0x860
[1378382.773326] ? __switch_to_asm+0x34/0x70
[1378382.776340] ? __switch_to_asm+0x40/0x70
[1378382.779334] ? bit_wait_io_timeout+0x90/0x90
[1378382.782321] schedule+0x32/0x80
[1378382.785295] bit_wait+0xd/0x50
[1378382.788256] __wait_on_bit+0x55/0x80
[1378382.791216] ? __switch_to+0x175/0x450
[1378382.794160] ? bit_wait_io_timeout+0x90/0x90
[1378382.797118] __inode_wait_for_writeback+0xa7/0xe0
[1378382.800087] ? init_wait_var_entry+0x40/0x40
[1378382.803068] inode_wait_for_writeback+0x21/0x30
[1378382.806036] evict+0xab/0x1c0
[1378382.809011] taskq_thread+0x2d8/0x500 [spl]
[1378382.811978] ? wake_up_q+0x70/0x70
[1378382.814904] kthread+0xf8/0x130
[1378382.817779] ? taskq_thread_spawn+0x50/0x50 [spl]
[1378382.820652] ? kthread_create_worker_on_cpu+0x70/0x70
[1378382.823497] ret_from_fork+0x22/0x40
[1378382.826314] INFO: task auditd:3285 blocked for more than 120 seconds.
[1378382.829189] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378382.832123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378382.835086] auditd D 0 3285 1 0x00000000
[1378382.838049] Call Trace:
[1378382.841005] ? __schedule+0x3dc/0x860
[1378382.843965] schedule+0x32/0x80
[1378382.846931] cv_wait_common+0x115/0x130 [spl]
[1378382.849881] ? remove_wait_queue+0x60/0x60
[1378382.852994] zil_commit.part.13+0x86/0x8c0 [zfs]
[1378382.855987] ? tsd_hash_search.isra.1+0x41/0x90 [spl]
[1378382.858989] ? tsd_set+0x2db/0x480 [spl]
[1378382.861953] ? _cond_resched+0x16/0x40
[1378382.865043] zfs_fsync+0x70/0xe0 [zfs]
[1378382.868107] zpl_fsync+0x63/0x90 [zfs]
[1378382.870989] do_fsync+0x38/0x60
[1378382.873795] __x64_sys_fsync+0x10/0x20
[1378382.876586] do_syscall_64+0x55/0x110
[1378382.879372] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1378382.882154] RIP: 0033:0x7fb4709786ed
[1378382.884956] RSP: 002b:00007fb46e4f0e30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[1378382.887829] RAX: ffffffffffffffda RBX: 0000559611942280 RCX: 00007fb4709786ed
[1378382.890722] RDX: 0000559611942280 RSI: 0000000000000000 RDI: 0000000000000004
[1378382.893611] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000f2e
[1378382.896514] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[1378382.899411] R13: 00007ffd5156da3f R14: 0000000000000000 R15: 00007fb4713f3040
[1378382.902349] INFO: task HWRaidManager:29978 blocked for more than 120 seconds.
[1378382.905289] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378382.908295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378382.911381] HWRaidManager D 0 29978 1 0x20020080
[1378382.914432] Call Trace:
[1378382.917486] ? __schedule+0x3dc/0x860
[1378382.920538] ? default_file_splice_write+0x20/0x20
[1378382.923591] schedule+0x32/0x80
[1378382.926608] wb_wait_for_completion+0x5e/0x90
[1378382.929588] ? remove_wait_queue+0x60/0x60
[1378382.932575] sync_inodes_sb+0xa7/0x280
[1378382.935545] ? __queue_work+0x15e/0x410
[1378382.938482] ? default_file_splice_write+0x20/0x20
[1378382.941447] iterate_supers+0xb5/0x100
[1378382.944384] ksys_sync+0x40/0xb0
[1378382.947255] __ia32_sys_sync+0xa/0x10
[1378382.950060] do_fast_syscall_32+0x98/0x1e0
[1378382.952861] entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[1378382.955677] INFO: task vlserver:18469 blocked for more than 120 seconds.
[1378382.958524] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378382.961416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378382.964385] vlserver D 0 18469 18467 0x00000080
[1378382.967376] Call Trace:
[1378382.970326] ? __schedule+0x3dc/0x860
[1378382.973312] schedule+0x32/0x80
[1378382.976310] cv_wait_common+0x115/0x130 [spl]
[1378382.979316] ? remove_wait_queue+0x60/0x60
[1378382.982421] zil_commit.part.13+0x86/0x8c0 [zfs]
[1378382.985437] ? tsd_hash_search.isra.1+0x41/0x90 [spl]
[1378382.988453] ? tsd_set+0x2db/0x480 [spl]
[1378382.991448] ? _cond_resched+0x16/0x40
[1378382.994535] zfs_fsync+0x70/0xe0 [zfs]
[1378382.997638] zpl_fsync+0x63/0x90 [zfs]
[1378383.000610] do_fsync+0x38/0x60
[1378383.003581] __x64_sys_fsync+0x10/0x20
[1378383.006548] do_syscall_64+0x55/0x110
[1378383.009489] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1378383.012460] RIP: 0033:0x7fdf15610560
[1378383.015399] RSP: 002b:00007ffddb903ad8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[1378383.018339] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fdf15610560
[1378383.021280] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[1378383.024203] RBP: 0000559976687a50 R08: 000000000000000c R09: 00007ffddb903a20
[1378383.027129] R10: 000055997532e220 R11: 0000000000000246 R12: 0000559976687a40
[1378383.030049] R13: 00007ffddb903cf8 R14: 00000000ffffff9c R15: 0000559976687a40
[1378503.587095] INFO: task z_iput:348 blocked for more than 120 seconds.
[1378503.590045] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378503.593155] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378503.596215] z_iput D 0 348 2 0x80000000
[1378503.599285] Call Trace:
[1378503.602332] ? __schedule+0x3dc/0x860
[1378503.605388] ? __switch_to_asm+0x34/0x70
[1378503.608448] ? __switch_to_asm+0x40/0x70
[1378503.611499] ? bit_wait_io_timeout+0x90/0x90
[1378503.614549] schedule+0x32/0x80
[1378503.617605] bit_wait+0xd/0x50
[1378503.620655] __wait_on_bit+0x55/0x80
[1378503.623694] ? __switch_to+0x175/0x450
[1378503.626709] ? bit_wait_io_timeout+0x90/0x90
[1378503.629708] __inode_wait_for_writeback+0xa7/0xe0
[1378503.632690] ? init_wait_var_entry+0x40/0x40
[1378503.635654] inode_wait_for_writeback+0x21/0x30
[1378503.638607] evict+0xab/0x1c0
[1378503.641578] taskq_thread+0x2d8/0x500 [spl]
[1378503.644536] ? wake_up_q+0x70/0x70
[1378503.647460] kthread+0xf8/0x130
[1378503.650336] ? taskq_thread_spawn+0x50/0x50 [spl]
[1378503.653185] ? kthread_create_worker_on_cpu+0x70/0x70
[1378503.656003] ret_from_fork+0x22/0x40
[1378503.658817] INFO: task auditd:3285 blocked for more than 120 seconds.
[1378503.661676] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378503.664595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378503.667544] auditd D 0 3285 1 0x00000000
[1378503.670506] Call Trace:
[1378503.673447] ? __schedule+0x3dc/0x860
[1378503.676396] schedule+0x32/0x80
[1378503.679353] cv_wait_common+0x115/0x130 [spl]
[1378503.682306] ? remove_wait_queue+0x60/0x60
[1378503.685414] zil_commit.part.13+0x86/0x8c0 [zfs]
[1378503.688397] ? tsd_hash_search.isra.1+0x41/0x90 [spl]
[1378503.691391] ? tsd_set+0x2db/0x480 [spl]
[1378503.694360] ? _cond_resched+0x16/0x40
[1378503.697439] zfs_fsync+0x70/0xe0 [zfs]
[1378503.700490] zpl_fsync+0x63/0x90 [zfs]
[1378503.703372] do_fsync+0x38/0x60
[1378503.706182] __x64_sys_fsync+0x10/0x20
[1378503.708964] do_syscall_64+0x55/0x110
[1378503.711734] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1378503.714521] RIP: 0033:0x7fb4709786ed
[1378503.717325] RSP: 002b:00007fb46e4f0e30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[1378503.720204] RAX: ffffffffffffffda RBX: 0000559611942280 RCX: 00007fb4709786ed
[1378503.723099] RDX: 0000559611942280 RSI: 0000000000000000 RDI: 0000000000000004
[1378503.725991] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000f2e
[1378503.728895] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[1378503.731792] R13: 00007ffd5156da3f R14: 0000000000000000 R15: 00007fb4713f3040
[1378503.734737] INFO: task HWRaidManager:29978 blocked for more than 120 seconds.
[1378503.737681] Tainted: P O 4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378503.740688] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378503.743728] HWRaidManager D 0 29978 1 0x20020080
[1378503.746782] Call Trace:
[1378503.749832] ? __schedule+0x3dc/0x860
[1378503.752880] ? default_file_splice_write+0x20/0x20
[1378503.755929] schedule+0x32/0x80
[1378503.758928] wb_wait_for_completion+0x5e/0x90
[1378503.761924] ? remove_wait_queue+0x60/0x60
[1378503.764908] sync_inodes_sb+0xa7/0x280
[1378503.767873] ? __queue_work+0x15e/0x410
[1378503.770812] ? default_file_splice_write+0x20/0x20
[1378503.773774] iterate_supers+0xb5/0x100
[1378503.776702] ksys_sync+0x40/0xb0
[1378503.779576] __ia32_sys_sync+0xa/0x10
[1378503.782427] do_fast_syscall_32+0x98/0x1e0
[1378503.785225] entry_SYSCALL_compat_after_hwframe+0x45/0x4d
rpool:
version: 5000
name: 'rpool'
state: 0
txg: 475211
pool_guid: 17441379133630177917
errata: 0
hostid: 715009
hostname: 'seraph'
com.delphix:has_per_vdev_zaps
vdev_children: 1
vdev_tree:
type: 'root'
id: 0
guid: 17441379133630177917
children[0]:
type: 'raidz'
id: 0
guid: 11960073590447584870
nparity: 1
metaslab_array: 35
metaslab_shift: 37
ashift: 12
asize: 15998825529344
is_log: 0
create_txg: 4
com.delphix:vdev_zap_top: 40
children[0]:
type: 'disk'
id: 0
guid: 13773523373170097779
path: '/dev/disk/by-id/ata-HGST_HUS724040ALA640_PN2334PBJ3SV0T-part3'
whole_disk: 0
DTL: 65
create_txg: 4
com.delphix:vdev_zap_leaf: 41
children[1]:
type: 'disk'
id: 1
guid: 14927593852271164084
path: '/dev/disk/by-id/ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC1F0D1ZE3K-part3'
whole_disk: 0
DTL: 77
create_txg: 4
com.delphix:vdev_zap_leaf: 268
children[2]:
type: 'disk'
id: 2
guid: 4795217381968720306
path: '/dev/disk/by-id/ata-HGST_HUS724040ALA640_PN2334PBJ525GT-part3'
whole_disk: 0
DTL: 76
create_txg: 4
com.delphix:vdev_zap_leaf: 75
children[3]:
type: 'disk'
id: 3
guid: 14529900694222241193
path: '/dev/disk/by-id/ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC5D0D2U951-part3'
whole_disk: 0
DTL: 62
create_txg: 4
com.delphix:vdev_zap_leaf: 44
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 4h38m with 0 errors on Sun Sep 9 05:02:45 2018
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ata-HGST_HUS724040ALA640_PN2334PBJ3SV0T-part3 ONLINE 0 0 0
ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC1F0D1ZE3K-part3 ONLINE 0 0 0
sda3 ONLINE 0 0 0
ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC5D0D2U951-part3 ONLINE 0 0 0
spares
sdf3 AVAIL
errors: No known data errors