Commit 2f6d5b5

hfreude authored and gregkh committed
s390/ap: Fix deadlock caused by recursive lock of the AP bus scan mutex
[ Upstream commit 56199bb ]

There is a possibility to deadlock with a recursive lock of the AP bus
scan mutex ap_scan_bus_mutex:

  ... kernel: ============================================
  ... kernel: WARNING: possible recursive locking detected
  ... kernel: 5.14.0-496.el9.s390x #3 Not tainted
  ... kernel: --------------------------------------------
  ... kernel: kworker/12:1/130 is trying to acquire lock:
  ... kernel: 0000000358bc1510 (ap_scan_bus_mutex){+.+.}-{3:3}, at: ap_bus_force_rescan+0x92/0x108
  ... kernel: but task is already holding lock:
  ... kernel: 0000000358bc1510 (ap_scan_bus_mutex){+.+.}-{3:3}, at: ap_scan_bus_wq_callback+0x28/0x60
  ... kernel: other info that might help us debug this:
  ... kernel: Possible unsafe locking scenario:
  ... kernel: CPU0
  ... kernel: ----
  ... kernel: lock(ap_scan_bus_mutex);
  ... kernel: lock(ap_scan_bus_mutex);
  ... kernel: *** DEADLOCK ***

Here is what the call stack looks like:

  ... [<00000003576fe9ce>] process_one_work+0x2a6/0x748
  ... [<0000000358150c00>] ap_scan_bus_wq_callback+0x40/0x60 <- mutex locked
  ... [<00000003581506e2>] ap_scan_bus+0x5a/0x3b0
  ... [<000000035815037c>] ap_scan_adapter+0x5b4/0x8c0
  ... [<000000035814fa34>] ap_scan_domains+0x2d4/0x668
  ... [<0000000357d989b4>] device_add+0x4a4/0x6b8
  ... [<0000000357d9bb54>] bus_probe_device+0xb4/0xc8
  ... [<0000000357d9daa8>] __device_attach+0x120/0x1b0
  ... [<0000000357d9a632>] bus_for_each_drv+0x8a/0xd0
  ... [<0000000357d9d548>] __device_attach_driver+0xc0/0x140
  ... [<0000000357d9d3d8>] driver_probe_device+0x40/0xf0
  ... [<0000000357d9cec2>] really_probe+0xd2/0x460
  ... [<000000035814d7b0>] ap_device_probe+0x150/0x208
  ... [<000003ff802a5c46>] zcrypt_cex4_queue_probe+0xb6/0x1c0 [zcrypt_cex4]
  ... [<000003ff7fb2d36e>] zcrypt_queue_register+0xe6/0x1b0 [zcrypt]
  ... [<000003ff7fb2c8ac>] zcrypt_rng_device_add+0x94/0xd8 [zcrypt]
  ... [<0000000357d7bc52>] hwrng_register+0x212/0x228
  ... [<0000000357d7b8c2>] add_early_randomness+0x102/0x110
  ... [<000003ff7fb29c94>] zcrypt_rng_data_read+0x94/0xb8 [zcrypt]
  ... [<0000000358150aca>] ap_bus_force_rescan+0x92/0x108
  ... [<0000000358177572>] mutex_lock_interruptible_nested+0x32/0x40 <- lock again

Note this only happens when the very first crypto card providing random
data appears via hot plug in the system AND is in disabled state
("deconfig"). Then the initial pull of random data fails and a rescan of
the AP bus is triggered while already in the middle of an AP bus scan
caused by the newly appearing hardware.

The fix is relatively simple once the scenario is understood: the AP bus
force rescan function returns immediately if an AP bus scan is currently
running with the very same thread id.

Fixes: eacf5b3 ("s390/ap: introduce mutex to lock the AP bus scan")
Signed-off-by: Harald Freudenberger <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
1 parent 3537a47 commit 2f6d5b5

1 file changed: +17 -0 lines changed


drivers/s390/crypto/ap_bus.c

Lines changed: 17 additions & 0 deletions
@@ -107,6 +107,7 @@ debug_info_t *ap_dbf_info;
 static bool ap_scan_bus(void);
 static bool ap_scan_bus_result; /* result of last ap_scan_bus() */
 static DEFINE_MUTEX(ap_scan_bus_mutex); /* mutex ap_scan_bus() invocations */
+static struct task_struct *ap_scan_bus_task; /* thread holding the scan mutex */
 static atomic64_t ap_scan_bus_count; /* counter ap_scan_bus() invocations */
 static int ap_scan_bus_time = AP_CONFIG_TIME;
 static struct timer_list ap_scan_bus_timer;
@@ -1006,11 +1007,25 @@ bool ap_bus_force_rescan(void)
 	if (scan_counter <= 0)
 		goto out;
 
+	/*
+	 * There is one unlikely but nevertheless valid scenario where the
+	 * thread holding the mutex may try to send some crypto load but
+	 * all cards are offline so a rescan is triggered which causes
+	 * a recursive call of ap_bus_force_rescan(). A simple return if
+	 * the mutex is already locked by this thread solves this.
+	 */
+	if (mutex_is_locked(&ap_scan_bus_mutex)) {
+		if (ap_scan_bus_task == current)
+			goto out;
+	}
+
 	/* Try to acquire the AP scan bus mutex */
 	if (mutex_trylock(&ap_scan_bus_mutex)) {
 		/* mutex acquired, run the AP bus scan */
+		ap_scan_bus_task = current;
 		ap_scan_bus_result = ap_scan_bus();
 		rc = ap_scan_bus_result;
+		ap_scan_bus_task = NULL;
 		mutex_unlock(&ap_scan_bus_mutex);
 		goto out;
 	}
@@ -2284,7 +2299,9 @@ static void ap_scan_bus_wq_callback(struct work_struct *unused)
 	 * system_long_wq which invokes this function here again.
 	 */
 	if (mutex_trylock(&ap_scan_bus_mutex)) {
+		ap_scan_bus_task = current;
 		ap_scan_bus_result = ap_scan_bus();
+		ap_scan_bus_task = NULL;
 		mutex_unlock(&ap_scan_bus_mutex);
 	}
 }
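
The owner-tracking guard in this patch can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming POSIX threads and C11 atomics; it is not the kernel code. The names force_rescan(), scan_bus(), scan_owner, self_token and nested_skips are hypothetical stand-ins for ap_bus_force_rescan(), ap_scan_bus() and ap_scan_bus_task, and the contended path is simplified to "wait for the running scan, then return".

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel objects in the patch. */
static pthread_mutex_t scan_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Atomic(void *) scan_owner;    /* id of the scanning thread, or NULL */
static _Thread_local char self_token; /* its address acts as a per-thread id */

static int scans_run;                 /* only touched while scan_mutex is held */
static atomic_int nested_skips;       /* recursive requests that were skipped */

static bool force_rescan(int depth);

/* Stand-in for the bus scan: the outermost pass re-enters force_rescan(),
 * modelling a probe path that requests a rescan in the middle of a scan. */
static bool scan_bus(int depth)
{
	scans_run++;
	if (depth == 0)
		force_rescan(depth + 1);
	return true;
}

static bool force_rescan(int depth)
{
	bool rc = false;

	/* Recursive request from the thread that already runs the scan:
	 * return at once instead of blocking on a mutex we hold ourselves. */
	if (atomic_load(&scan_owner) == (void *)&self_token) {
		atomic_fetch_add(&nested_skips, 1);
		return false;
	}

	if (pthread_mutex_trylock(&scan_mutex) == 0) {
		/* Record the owner only while the mutex is held. */
		atomic_store(&scan_owner, (void *)&self_token);
		rc = scan_bus(depth);
		atomic_store(&scan_owner, NULL);
		pthread_mutex_unlock(&scan_mutex);
		return rc;
	}

	/* Another thread is scanning: wait until it is done (simplified). */
	pthread_mutex_lock(&scan_mutex);
	pthread_mutex_unlock(&scan_mutex);
	return rc;
}
```

Without the owner check at the top of force_rescan(), the nested call would fail the trylock and then block forever in pthread_mutex_lock() on a default (non-recursive) mutex, which mirrors the lockdep report above. The owner field is set right after the lock is taken and cleared right before it is released, so a thread can only ever observe its own id there while it is the holder.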
