Commit 0dc5012

arighi authored and jfvogel committed
sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
[ Upstream commit 1626e5e ]

While performing the rq locking dance in dispatch_to_local_dsq(), we may
trigger the following lock imbalance condition, in particular when
multiple tasks are rapidly changing CPU affinity (i.e., running a
`stress-ng --race-sched 0`):

[ 13.413579] =====================================
[ 13.413660] WARNING: bad unlock balance detected!
[ 13.413729] 6.13.0-virtme #15 Not tainted
[ 13.413792] -------------------------------------
[ 13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at:
[ 13.413954] [<ffffffff873c6c48>] dispatch_to_local_dsq+0x108/0x1a0
[ 13.414111] but there are no more locks to release!
[ 13.414176]
[ 13.414176] other info that might help us debug this:
[ 13.414258] 1 lock held by kworker/1:1/80:
[ 13.414318] #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90
[ 13.414612]
[ 13.414612] stack backtrace:
[ 13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15
[ 13.415505] Workqueue: 0x0 (events)
[ 13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms
[ 13.415570] Call Trace:
[ 13.415700] <TASK>
[ 13.415744] dump_stack_lvl+0x78/0xe0
[ 13.415806] ? dispatch_to_local_dsq+0x108/0x1a0
[ 13.415884] print_unlock_imbalance_bug+0x11b/0x130
[ 13.415965] ? dispatch_to_local_dsq+0x108/0x1a0
[ 13.416226] lock_release+0x231/0x2c0
[ 13.416326] _raw_spin_unlock+0x1b/0x40
[ 13.416422] dispatch_to_local_dsq+0x108/0x1a0
[ 13.416554] flush_dispatch_buf+0x199/0x1d0
[ 13.416652] balance_one+0x194/0x370
[ 13.416751] balance_scx+0x61/0x1e0
[ 13.416848] prev_balance+0x43/0xb0
[ 13.416947] __pick_next_task+0x6b/0x1b0
[ 13.417052] __schedule+0x20d/0x1740

This happens because dispatch_to_local_dsq() is racing with
dispatch_dequeue() and, when the latter wins, we incorrectly assume that
the task has been moved to dst_rq.

Fix by properly tracking the currently locked rq.

Fixes: 4d3ca89 ("sched_ext: Refactor consume_remote_task()")
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 25ddd8f92a423e325c68acc6b318749ebc32306d)
Signed-off-by: Jack Vogel <[email protected]>

1 parent 9025440 commit 0dc5012

1 file changed: +10 -4 lines changed

kernel/sched/ext.c

Lines changed: 10 additions & 4 deletions
@@ -2463,6 +2463,9 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 {
 	struct rq *src_rq = task_rq(p);
 	struct rq *dst_rq = container_of(dst_dsq, struct rq, scx.local_dsq);
+#ifdef CONFIG_SMP
+	struct rq *locked_rq = rq;
+#endif
 
 	/*
 	 * We're synchronized against dequeue through DISPATCHING. As @p can't
@@ -2499,8 +2502,9 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 	atomic_long_set_release(&p->scx.ops_state, SCX_OPSS_NONE);
 
 	/* switch to @src_rq lock */
-	if (rq != src_rq) {
-		raw_spin_rq_unlock(rq);
+	if (locked_rq != src_rq) {
+		raw_spin_rq_unlock(locked_rq);
+		locked_rq = src_rq;
 		raw_spin_rq_lock(src_rq);
 	}
 
@@ -2518,6 +2522,8 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 		} else {
 			move_remote_task_to_local_dsq(p, enq_flags,
 						      src_rq, dst_rq);
+			/* task has been moved to dst_rq, which is now locked */
+			locked_rq = dst_rq;
 		}
 
 		/* if the destination CPU is idle, wake it up */
@@ -2526,8 +2532,8 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 	}
 
 	/* switch back to @rq lock */
-	if (rq != dst_rq) {
-		raw_spin_rq_unlock(dst_rq);
+	if (locked_rq != rq) {
+		raw_spin_rq_unlock(locked_rq);
 		raw_spin_rq_lock(rq);
 	}
 #else /* CONFIG_SMP */
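
The effect of the change can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct toy_rq`, `toy_lock()`, `toy_unlock()`, `toy_move()`, `dispatch_old()`, `dispatch_new()`, `run_scenario()` and the `still_holder` flag are all hypothetical names introduced here. The model assumes that moving a task releases the source lock and takes the destination lock (as the added comment in the diff suggests), and it reduces the `holding_cpu` check to a boolean that is false when dispatch_dequeue() wins the race. It leaves out the real function's early exits and its remaining checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rq: only records whether its lock is held. */
struct toy_rq {
	bool locked;
};

/* Number of lock-balance violations seen in the current scenario. */
static int balance_errors;

static void toy_lock(struct toy_rq *rq)
{
	if (rq->locked)
		balance_errors++;	/* acquiring a lock we already hold */
	rq->locked = true;
}

static void toy_unlock(struct toy_rq *rq)
{
	if (!rq->locked)
		balance_errors++;	/* releasing a lock we do not hold */
	rq->locked = false;
}

/* Assumed model of the task move: returns with dst locked instead of src. */
static void toy_move(struct toy_rq *src, struct toy_rq *dst)
{
	toy_unlock(src);
	toy_lock(dst);
}

/* Pre-fix logic: the final switch assumes dst_rq is the lock we hold. */
static void dispatch_old(struct toy_rq *rq, struct toy_rq *src_rq,
			 struct toy_rq *dst_rq, bool still_holder)
{
	if (rq != src_rq) {
		toy_unlock(rq);
		toy_lock(src_rq);
	}
	if (still_holder && src_rq != dst_rq)
		toy_move(src_rq, dst_rq);
	if (rq != dst_rq) {
		toy_unlock(dst_rq);
		toy_lock(rq);
	}
}

/* Post-fix logic: locked_rq always names the lock that is actually held. */
static void dispatch_new(struct toy_rq *rq, struct toy_rq *src_rq,
			 struct toy_rq *dst_rq, bool still_holder)
{
	struct toy_rq *locked_rq = rq;

	if (locked_rq != src_rq) {
		toy_unlock(locked_rq);
		locked_rq = src_rq;
		toy_lock(src_rq);
	}
	if (still_holder && src_rq != dst_rq) {
		toy_move(src_rq, dst_rq);
		locked_rq = dst_rq;
	}
	if (locked_rq != rq) {
		toy_unlock(locked_rq);
		toy_lock(rq);
	}
}

typedef void (*dispatch_fn)(struct toy_rq *, struct toy_rq *,
			    struct toy_rq *, bool);

/*
 * Run one scenario on three toy rqs, where rqs[0] plays the role of @rq and
 * is held on entry. Returns the number of balance violations, counting a
 * wrong set of held locks on return as one more violation.
 */
static int run_scenario(dispatch_fn fn, int src_idx, int dst_idx,
			bool still_holder)
{
	struct toy_rq rqs[3] = { { false }, { false }, { false } };

	assert(src_idx >= 0 && src_idx < 3 && dst_idx >= 0 && dst_idx < 3);
	balance_errors = 0;
	toy_lock(&rqs[0]);
	fn(&rqs[0], &rqs[src_idx], &rqs[dst_idx], still_holder);
	if (!rqs[0].locked || rqs[1].locked || rqs[2].locked)
		balance_errors++;
	return balance_errors;
}
```

In this model, `run_scenario(dispatch_old, 0, 1, false)` reports a nonzero error count, because the old final switch releases a `dst_rq` lock that was never taken. The same call with `dispatch_new` reports 0. When the task is actually moved (`still_holder` is true), both variants stay balanced, which fits the commit message's point that the imbalance appears only when dispatch_dequeue() wins.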
