Commit 019041c

sharathmsrini authored and vijay-suman committed
net/rds: Always cancel heartbeat worker thread during conn destroy
rds_rdma testing often loads/unloads the module several times which leads to an RDS connection destroy not seen during production.

A small window exists where a module unload (and connection destroy) can occur immediately after connection establishment, but before a heartbeat handshake completes, so the worker thread remains uncancelled after the connection is destroyed.

This code change to cancel any pending worker threads is safe even when heartbeats are disabled via:

    sysctl net.rds.conn_heartbeat_timeout_secs=0

as there is no penalty to call cancel_delayed_work_sync() with no items in the delayed_work queue.

[ 601.460085] general protection fault, probably for non-canonical address 0xffff20e8871f4d08: 0000 [#1] SMP PTI
[ 601.471262] CPU: 15 PID: 0 Comm: swapper/15 Kdump: loaded Tainted: G S W 5.15.0-200.131.26.connreap.el8uek.v1.x86_64 #2
[ 601.484563] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30300200 07/10/2019
[ 601.495634] RIP: 0010:__queue_work+0xde/0x40a
[ 601.500504] Code: 8b 37 40 f6 c6 04 75 cf 48 c1 ee 05 81 fe ff ff ff 7f 0f 84 99 00 00 00 48 c7 c7 50 0c c7 95 48 63 f6 e8 55 29 4f 00 48 89 c7 <48> 8b 03 48 85 ff 0f 84 c0 02 00 00 48 39 f8 74 79 48 89 7c 24 08
[ 601.521460] RSP: 0018:ffffb5474c8d4e78 EFLAGS: 00010046
[ 601.527294] RAX: ffff9093bfbf1500 RBX: ffff20e8871f4d08 RCX: 0000000000000000
[ 601.535264] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9093bfbf1500
[ 601.543231] RBP: 000000000000003f R08: 0000000000000000 R09: 0000000000000000
[ 601.551197] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000f
[ 601.559167] R13: 000000000002e308 R14: ffff9054c7634c00 R15: ffff9054e5a8f208
[ 601.567136] FS: 0000000000000000(0000) GS:ffff9093bfbc0000(0000) knlGS:0000000000000000
[ 601.576168] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 601.582585] CR2: 000055b92cebf000 CR3: 000000178a010003 CR4: 00000000001706e0
[ 601.590549] Call Trace:
[ 601.593279] <IRQ>
[ 601.595526] ? show_trace_log_lvl+0x1d6/0x2f9
[ 601.600394] ? show_trace_log_lvl+0x1d6/0x2f9
[ 601.605255] ? call_timer_fn+0x27/0xff
[ 601.609441] ? __die_body.cold+0x8/0xa
[ 601.613625] ? die_addr+0x39/0x53
[ 601.617327] ? exc_general_protection+0x1c4/0x3e9
[ 601.622583] ? asm_exc_general_protection+0x22/0x27
[ 601.628034] ? __queue_work+0xde/0x40a
[ 601.632221] ? __queue_work+0xdb/0x40a
[ 601.636398] ? queue_work_node+0x110/0x105
[ 601.640973] call_timer_fn+0x27/0xff
[ 601.644973] __run_timers+0x1bd/0x299
[ 601.649064] run_timer_softirq+0x19/0x2d
[ 601.653442] __do_softirq+0xd0/0x2a5
[ 601.657442] ? sched_clock_cpu+0x9/0xb6
[ 601.661730] __irq_exit_rcu+0xc7/0xf1
[ 601.665829] sysvec_apic_timer_interrupt+0x72/0x89
[ 601.671186] </IRQ>
[ 601.673526] <TASK>
[ 601.675867] asm_sysvec_apic_timer_interrupt+0x16/0x1b
[ 601.681609] RIP: 0010:cpuidle_enter_state+0xc7/0x35d

Orabug: 35954530

Fixes: fbf83fabd8fb ("net/rds: Quiesce heartbeat worker in rds_conn_path_destroy()")
Tested-by: Jenny Xu <[email protected]>
Signed-off-by: Sharath Srinivasan <[email protected]>
Reviewed-by: Gerd Rausch <[email protected]>
Reviewed-by: Håkon Bugge <[email protected]>
1 parent 6f757c9 commit 019041c

1 file changed: +2 -3 lines changed


net/rds/connection.c

Lines changed: 2 additions & 3 deletions
@@ -671,9 +671,8 @@ static void rds_conn_path_destroy(struct rds_conn_path *cp, int shutdown)
 		return;
 
 	/* quiesce heartbeats */
-	if (cp->cp_conn->c_is_hb_enabled)
-		rds_queue_cancel_work(cp, &cp->cp_hb_w,
-				      "conn path destroy hb worker");
+	rds_queue_cancel_work(cp, &cp->cp_hb_w,
+			      "conn path destroy hb worker");
 
 	/* make sure lingering queued work won't try to ref the
 	 * conn. If there is work queued, we cancel it (and set the
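
The reasoning in the commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `toy_*` name is hypothetical, and the model assumes that the race window corresponds to heartbeat work already being armed while the enable flag (standing in for `c_is_hb_enabled`) is still unset. Under that assumption, a cancel guarded by the flag leaves the work armed after destroy, while an unconditional cancel does not, and cancelling idle work is a harmless no-op.

```c
#include <stdbool.h>

/* Toy userspace model. None of these names are kernel APIs. */
struct toy_delayed_work {
	bool pending;			/* true while the timer is armed */
};

struct toy_conn_path {
	bool hb_enabled;		/* stands in for c_is_hb_enabled */
	struct toy_delayed_work hb_w;	/* stands in for cp_hb_w */
};

/* Cancelling idle work does nothing, mirroring the commit's point that
 * cancelling with nothing queued carries no penalty.
 * Returns true if the work was pending before the call.
 */
static bool toy_cancel_work(struct toy_delayed_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Behaviour before the change: cancel only when the flag is set. */
static void toy_destroy_guarded(struct toy_conn_path *cp)
{
	if (cp->hb_enabled)
		toy_cancel_work(&cp->hb_w);
}

/* Behaviour after the change: always cancel. */
static void toy_destroy_always(struct toy_conn_path *cp)
{
	toy_cancel_work(&cp->hb_w);
}

/* Returns true if heartbeat work is still armed after destroy, which in
 * the real system would mean a timer firing on freed memory.
 */
static bool toy_leaks_work(bool hb_enabled, bool pending, bool always_cancel)
{
	struct toy_conn_path cp = {
		.hb_enabled = hb_enabled,
		.hb_w = { .pending = pending },
	};

	if (always_cancel)
		toy_destroy_always(&cp);
	else
		toy_destroy_guarded(&cp);

	return cp.hb_w.pending;
}
```

In this model, `toy_leaks_work(false, true, false)` is the problematic case: work is armed, the flag is unset, and the guarded destroy skips the cancel. The same inputs with `always_cancel` set to `true` leave nothing armed.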
