Extended MODVERSIONS Support #19

Adds a new format for MODVERSIONS which stores each field in a separate ELF section. This initially adds support for variable length names, but could later be used to add additional fields to MODVERSIONS in a backwards compatible way if needed. Any new fields will be ignored by old user tooling, unlike the current format where user tooling cannot tolerate adjustments to the format (for example making the name field longer). Since PPC munges its version records to strip leading dots, we reproduce the munging for the new format. Other architectures do not appear to have architecture-specific usage of this information. Signed-off-by: Matthew Maurer <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]>

Generate both the existing modversions format and the new extended one when running modpost. Presence of this metadata in the final .ko is guarded by CONFIG_EXTENDED_MODVERSIONS. We no longer generate an error on long symbols in modpost if CONFIG_EXTENDED_MODVERSIONS is set, as they can now be appropriately encoded in the extended section. These symbols will be skipped in the previous encoding. An error will still be generated if CONFIG_EXTENDED_MODVERSIONS is not set. Signed-off-by: Matthew Maurer <[email protected]> Reviewed-by: Sami Tolvanen <[email protected]>

Previously, two things stopped Rust from using MODVERSIONS: 1. Rust symbols are occasionally too long to be represented in the original versions table 2. Rust types cannot be properly hashed by the existing genksyms approach because: * Looking up type definitions in Rust is more complex than C * Type layout is potentially dependent on the compiler in Rust, not just the source type declaration. CONFIG_EXTENDED_MODVERSIONS addresses the first point, and CONFIG_GENDWARFKSYMS the second. If Rust wants to use MODVERSIONS, allow it to do so by selecting both features. Signed-off-by: Sami Tolvanen <[email protected]> Co-developed-by: Matthew Maurer <[email protected]> Signed-off-by: Matthew Maurer <[email protected]>

modules-kpd-app · 2024-11-01T21:31:56Z

At least one diff in series https://patchwork.kernel.org/project/linux-modules/list/?series=904848 irrelevant now. Closing PR.

This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] *** DEADLOCK *** [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: afefd6f ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Srinivasan Shanmugam <[email protected]> Suggested-by: Alex Deucher <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]> (cherry picked from commit 0b6b2dd) Cc: [email protected]

modules-kpd-app bot added modules-next new V7 labels Oct 23, 2024

modules-kpd-app bot force-pushed the modules-next_base branch from 800bb0b to cc2d5e7 Compare October 28, 2024 16:36

modules-kpd-app bot force-pushed the series/892924=>modules-next branch from 74bad2c to 3589437 Compare October 28, 2024 16:37

modules-kpd-app bot force-pushed the modules-next_base branch from cc2d5e7 to b493f88 Compare October 28, 2024 18:17

modules-kpd-app bot force-pushed the series/892924=>modules-next branch from 3589437 to bc47690 Compare October 28, 2024 18:18

modules-kpd-app bot force-pushed the modules-next_base branch from b493f88 to aca7f90 Compare October 28, 2024 19:44

modules-kpd-app bot force-pushed the series/892924=>modules-next branch from bc47690 to 4117d1c Compare October 28, 2024 19:45

modules-kpd-app bot force-pushed the modules-next_base branch from aca7f90 to db02ce9 Compare October 28, 2024 21:45

modules-kpd-app bot force-pushed the series/892924=>modules-next branch from 4117d1c to ddbbc07 Compare October 28, 2024 21:46

modules-kpd-app bot force-pushed the modules-next_base branch from db02ce9 to 050f9cb Compare October 28, 2024 22:29

modules-kpd-app bot force-pushed the series/892924=>modules-next branch from ddbbc07 to 20ce3cc Compare October 28, 2024 22:30

modules-kpd-app bot force-pushed the modules-next_base branch from 050f9cb to 90cf11e Compare October 28, 2024 23:50

modules-kpd-app bot force-pushed the series/892924=>modules-next branch from 20ce3cc to 8406617 Compare October 28, 2024 23:51

modules-kpd-app bot force-pushed the modules-next_base branch from 90cf11e to 5071b4f Compare October 29, 2024 00:10

modules-kpd-app bot force-pushed the series/892924=>modules-next branch from 8406617 to f4f66fb Compare October 29, 2024 00:11

adding ci files

18e1a8e

modules-kpd-app bot force-pushed the modules-next_base branch from 5071b4f to 18e1a8e Compare October 29, 2024 23:34

maurer added 2 commits October 29, 2024 23:35

modules-kpd-app bot force-pushed the series/892924=>modules-next branch from f4f66fb to 149cb59 Compare October 29, 2024 23:35

modules-kpd-app bot force-pushed the modules-next_base branch from 18e1a8e to f496cd0 Compare November 1, 2024 21:31

modules-kpd-app bot added handled-elsewhere V8 and removed new V7 labels Nov 1, 2024

modules-kpd-app bot closed this Nov 1, 2024

modules-kpd-app bot deleted the series/892924=>modules-next branch November 1, 2024 21:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Extended MODVERSIONS Support #19

Extended MODVERSIONS Support #19

Uh oh!

modules-kpd-app bot commented Oct 23, 2024

Uh oh!

modules-kpd-app bot commented Oct 23, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 29, 2024

Uh oh!

modules-kpd-app bot commented Oct 29, 2024

Uh oh!

modules-kpd-app bot commented Nov 1, 2024

Uh oh!

Uh oh!

Extended MODVERSIONS Support #19

Extended MODVERSIONS Support #19

Uh oh!

Conversation

modules-kpd-app bot commented Oct 23, 2024

Uh oh!

modules-kpd-app bot commented Oct 23, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 28, 2024

Uh oh!

modules-kpd-app bot commented Oct 29, 2024

Uh oh!

modules-kpd-app bot commented Oct 29, 2024

Uh oh!

modules-kpd-app bot commented Nov 1, 2024

Uh oh!

Uh oh!