Skip to content

Commit 6475357

Browse files
jkrzyszt-inteljnikula
authored andcommitted
Revert "drm/i915: Remove extra multi-gt pm-references"
This reverts commit 1f33dc0. There was a patch supposed to fix an issue of illegal attempts to free a still active i915 VMA object when parking a GT believed to be idle, reported by CI on 2-GT Meteor Lake. As a solution, an extra wakeref for a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit f56fe3e ("drm/i915: Fix a VMA UAF for multi-gt platform"). However, that fix occurred insufficient -- the issue was still reported by CI. That wakeref was released on exit from i915_gem_do_execbuffer(), then potentially before completion of the request and deactivation of its associated VMAs. Moreover, CI reports indicated that single-GT platforms also suffered sporadically from the same race. Since that issue was fixed by another commit f3c71b2 ("drm/i915/vma: Fix UAF on destroy against retire race"), the changes introduced by that insufficient fix were dropped as no longer useful. However, that series resulted in another VMA UAF scenario now being triggered in CI. <4> [260.290809] ------------[ cut here ]------------ <4> [260.290988] list_del corruption. prev->next should be ffff888118c5d990, but was ffff888118c5a510. (prev=ffff888118c5a510) <4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 __list_del_entry_valid_or_report+0xb7/0xe0 .. <4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1 <4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024 <4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0 ... <4> [260.291087] Call Trace: <4> [260.291089] <TASK> <4> [260.291124] i915_vma_reopen+0x43/0x80 [i915] <4> [260.291298] eb_lookup_vmas+0x9cb/0xcc0 [i915] <4> [260.291579] i915_gem_do_execbuffer+0xc9a/0x26d0 [i915] <4> [260.291883] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915] ... <4> [260.292301] </TASK> ... <4> [260.292506] ---[ end trace 0000000000000000 ]--- <4> [260.292782] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ca3: 0000 [#1] PREEMPT SMP NOPTI <4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: G W 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1 <4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024 <4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915] ... <4> [260.428756] Call Trace: <4> [260.431192] <TASK> <4> [639.283393] i915_gem_do_execbuffer+0xd05/0x26d0 [i915] <4> [639.305245] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915] ... <4> [639.411134] </TASK> ... <4> [639.449979] ---[ end trace 0000000000000000 ]--- We defer actually closing, unbinding and destroying a VMA until next idle point, or until the object is freed in the meantime. By postponing the unbind, we allow for the VMA to be reopened by the client, avoiding the work required to rebind the VMA. Starting from commit b0647a5 ("drm/i915: Avoid live-lock with i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA would be reopened while we destroy them. That assumption is no longer true in multi-GT configurations, where a VMA we reopen may be handled by a GT different from the one that we already keep active via its engine while we set up an execbuf request. Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer() processing path seems to fix this issue. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608 Signed-off-by: Janusz Krzysztofik <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Nirmoy Das <[email protected]> Reviewed-by: Nirmoy Das <[email protected]> Fixes: 1f33dc0 ("drm/i915: Remove extra multi-gt pm-references") Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> (cherry picked from commit 749670a) Signed-off-by: Jani Nikula <[email protected]>
1 parent 1613e60 commit 6475357

File tree

1 file changed

+18
-0
lines changed

1 file changed

+18
-0
lines changed

drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -255,6 +255,7 @@ struct i915_execbuffer {
255255
struct intel_context *context; /* logical state for the request */
256256
struct i915_gem_context *gem_context; /** caller's context */
257257
intel_wakeref_t wakeref;
258+
intel_wakeref_t wakeref_gt0;
258259

259260
/** our requests to build */
260261
struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
@@ -2685,6 +2686,7 @@ static int
26852686
eb_select_engine(struct i915_execbuffer *eb)
26862687
{
26872688
struct intel_context *ce, *child;
2689+
struct intel_gt *gt;
26882690
unsigned int idx;
26892691
int err;
26902692

@@ -2708,10 +2710,17 @@ eb_select_engine(struct i915_execbuffer *eb)
27082710
}
27092711
}
27102712
eb->num_batches = ce->parallel.number_children + 1;
2713+
gt = ce->engine->gt;
27112714

27122715
for_each_child(ce, child)
27132716
intel_context_get(child);
27142717
eb->wakeref = intel_gt_pm_get(ce->engine->gt);
2718+
/*
2719+
* Keep GT0 active on MTL so that i915_vma_parked() doesn't
2720+
* free VMAs while execbuf ioctl is validating VMAs.
2721+
*/
2722+
if (gt->info.id)
2723+
eb->wakeref_gt0 = intel_gt_pm_get(to_gt(gt->i915));
27152724

27162725
if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
27172726
err = intel_context_alloc_state(ce);
@@ -2750,6 +2759,9 @@ eb_select_engine(struct i915_execbuffer *eb)
27502759
return err;
27512760

27522761
err:
2762+
if (gt->info.id)
2763+
intel_gt_pm_put(to_gt(gt->i915), eb->wakeref_gt0);
2764+
27532765
intel_gt_pm_put(ce->engine->gt, eb->wakeref);
27542766
for_each_child(ce, child)
27552767
intel_context_put(child);
@@ -2763,6 +2775,12 @@ eb_put_engine(struct i915_execbuffer *eb)
27632775
struct intel_context *child;
27642776

27652777
i915_vm_put(eb->context->vm);
2778+
/*
2779+
* This works in conjunction with eb_select_engine() to prevent
2780+
* i915_vma_parked() from interfering while execbuf validates vmas.
2781+
*/
2782+
if (eb->gt->info.id)
2783+
intel_gt_pm_put(to_gt(eb->gt->i915), eb->wakeref_gt0);
27662784
intel_gt_pm_put(eb->context->engine->gt, eb->wakeref);
27672785
for_each_child(eb->context, child)
27682786
intel_context_put(child);

0 commit comments

Comments
 (0)