Skip to content

Commit 20880a3

Browse files
committed
drm/amdgpu: fix a job->pasid access race in gpu recovery
Avoid a possible UAF in GPU recovery due to a race between the sched timeout callback and the tdr work queue. The gpu recovery function calls drm_sched_stop() and later drm_sched_start(). drm_sched_start() restarts the tdr queue which will eventually free the job. If the tdr queue frees the job before time out callback completes, the job will be freed and we'll get a UAF when accessing the pasid. Cache it early to avoid the UAF. Example KASAN trace: [ 493.058141] BUG: KASAN: slab-use-after-free in amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.067530] Read of size 4 at addr ffff88b0ce3f794c by task kworker/u128:1/323 [ 493.074892] [ 493.076485] CPU: 9 UID: 0 PID: 323 Comm: kworker/u128:1 Tainted: G E 6.16.0-1289896.2.zuul.bf4f11df81c1410bbe901c4373305a31 #1 PREEMPT(voluntary) [ 493.076493] Tainted: [E]=UNSIGNED_MODULE [ 493.076495] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 493.076500] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] [ 493.076512] Call Trace: [ 493.076515] <TASK> [ 493.076518] dump_stack_lvl+0x64/0x80 [ 493.076529] print_report+0xce/0x630 [ 493.076536] ? _raw_spin_lock_irqsave+0x86/0xd0 [ 493.076541] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 493.076545] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077253] kasan_report+0xb8/0xf0 [ 493.077258] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077965] amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.078672] ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu] [ 493.079378] ? amdgpu_coredump+0x1fd/0x4c0 [amdgpu] [ 493.080111] amdgpu_job_timedout+0x642/0x1400 [amdgpu] [ 493.080903] ? pick_task_fair+0x24e/0x330 [ 493.080910] ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu] [ 493.081702] ? _raw_spin_lock+0x75/0xc0 [ 493.081708] ? __pfx__raw_spin_lock+0x10/0x10 [ 493.081712] drm_sched_job_timedout+0x1b0/0x4b0 [gpu_sched] [ 493.081721] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081725] process_one_work+0x679/0xff0 [ 493.081732] worker_thread+0x6ce/0xfd0 [ 493.081736] ? __pfx_worker_thread+0x10/0x10 [ 493.081739] kthread+0x376/0x730 [ 493.081744] ? __pfx_kthread+0x10/0x10 [ 493.081748] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081751] ? __pfx_kthread+0x10/0x10 [ 493.081755] ret_from_fork+0x247/0x330 [ 493.081761] ? __pfx_kthread+0x10/0x10 [ 493.081764] ret_from_fork_asm+0x1a/0x30 [ 493.081771] </TASK> Fixes: a72002c ("drm/amdgpu: Make use of drm_wedge_task_info") Link: HansKristian-Work/vkd3d-proton#2670 Cc: [email protected] Cc: [email protected] Cc: [email protected] Suggested-by: Matthew Brost <[email protected]> Reviewed-by: Srinivasan Shanmugam <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
1 parent a1a445f commit 20880a3

File tree

1 file changed

+8
-2
lines changed

1 file changed

+8
-2
lines changed

drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6620,6 +6620,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
66206620
struct amdgpu_hive_info *hive = NULL;
66216621
int r = 0;
66226622
bool need_emergency_restart = false;
6623+
/* save the pasid here as the job may be freed before the end of the reset */
6624+
int pasid = job ? job->pasid : -EINVAL;
66236625

66246626
/*
66256627
* If it reaches here because of hang/timeout and a RAS error is
@@ -6720,8 +6722,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
67206722
if (!r) {
67216723
struct amdgpu_task_info *ti = NULL;
67226724

6723-
if (job)
6724-
ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid);
6725+
/*
6726+
* The job may already be freed at this point via the sched tdr workqueue so
6727+
* use the cached pasid.
6728+
*/
6729+
if (pasid >= 0)
6730+
ti = amdgpu_vm_get_task_info_pasid(adev, pasid);
67256731

67266732
drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE,
67276733
ti ? &ti->task : NULL);

0 commit comments

Comments
 (0)