-
Notifications
You must be signed in to change notification settings - Fork 919
adi_axi_hdmi: reduce adv7511 private API dependancy and make it work with mainline #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adi_axi_hdmi: reduce adv7511 private API dependancy and make it work with mainline #13
Conversation
Whenever the adv7511 driver refreshes EDID, it does update it also in the DRM layer. We can access EDID from connector struct. Signed-off-by: Andrea Merello <[email protected]>
Signed-off-by: Andrea Merello <[email protected]>
Users of this function have been changed not to need it, and it should be useless now Signed-off-by: Andrea Merello <[email protected]>
This seems not implemented by the adv7511 mainline driver Signed-off-by: Andrea Merello <[email protected]>
696c222 to
ee8a888
Compare
|
Thanks again, looks good. Can you send a follow up patch that adds a helper function drm_connector_get_edid(), accessing the internal data structures like this is not so nice. Maybe also update the tegra driver to use that helper function. The main reason why I haven't send the driver upstream is because it uses the DMAEngine API in a non-standard way. What we want is a cyclic interleaved transfer. What the DMAEngine API offers is either cyclic transfers or interleaved transfers, but not cyclic interleaved transfers. Always wanted to fix that but hadn't had the chance yet. The other thing is all new DRM drivers for upstream now should use the atomic modeset API and this driver is still using the old API. There are two nice presentations on the new API and how to do the conversion [1][2]. If you want to give it a shot trying to do the conversion that would be very much welcome :)
[1] http://events.linuxfoundation.org/sites/events/files/slides/20151005-elce.pdf, https://www.youtube.com/watch?v=5uHMpjz68HE |
There is a bunch of drivers that directly access edid_blob_ptr in drm_connector_structure. This requires a cast and possibly a NULL ptr check. Wrap this in a common helper. Signed-off-by: Andrea Merello <[email protected]>
…ss EDID data Signed-off-by: Andrea Merello <[email protected]>
|
I've introduced the helper function for accessing EDID. As you said, it might worth to update Tegra, as well as a few other users, altought I haven't do that yet.. I also agree with you about the dma API being not good, and I would really like it to be fixed; however given that the VDMA has been actually already mainlined, I thought that this whole API issue shouldn't be blocking for mainlining adi_axi_hdmi (indeed if we are not allowed to mainline clients for the VDMA driver, then there was no point in mainlining the VDMA driver itslef). Maybe things can be fixed later (hopefully in one single shot). I will give a look to the atomic modeset thing. Surely it will be queued on my TODO list.. The only issue is that it is quite a long list :( |
Fixes segmentation fault using, for instance: (gdb) run record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Starting program: /home/acme/bin/perf record -I -e intel_pt/tsc=1,noretcomp=1/u /bin/ls Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-7.fc23.x86_64 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0 x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 (gdb) bt #0 0x00000000004b9ea5 in tracepoint_error (e=0x0, err=13, sys=0x19b1370 "sched", name=0x19a5d00 "sched_switch") at util/parse-events.c:410 #1 0x00000000004b9fc5 in add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:433 #2 0x00000000004ba334 in add_tracepoint_event (list=0x19a5d20, idx=0x7fffffffb8c0, sys_name=0x19b1370 "sched", evt_name=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:498 #3 0x00000000004bb699 in parse_events_add_tracepoint (list=0x19a5d20, idx=0x7fffffffb8c0, sys=0x19b1370 "sched", event=0x19a5d00 "sched_switch", err=0x0, head_config=0x0) at util/parse-events.c:936 #4 0x00000000004f6eda in parse_events_parse (_data=0x7fffffffb8b0, scanner=0x19a49d0) at util/parse-events.y:391 #5 0x00000000004bc8e5 in parse_events__scanner (str=0x663ff2 "sched:sched_switch", data=0x7fffffffb8b0, start_token=258) at util/parse-events.c:1361 #6 0x00000000004bca57 in parse_events (evlist=0x19a5220, str=0x663ff2 "sched:sched_switch", err=0x0) at util/parse-events.c:1401 #7 0x0000000000518d5f in perf_evlist__can_select_event (evlist=0x19a3b90, str=0x663ff2 "sched:sched_switch") at util/record.c:253 #8 0x0000000000553c42 in intel_pt_track_switches (evlist=0x19a3b90) at arch/x86/util/intel-pt.c:364 #9 0x00000000005549d1 in intel_pt_recording_options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at arch/x86/util/intel-pt.c:664 #10 0x000000000051e076 in auxtrace_record__options (itr=0x19a2c40, evlist=0x19a3b90, opts=0x8edf68 <record+232>) at util/auxtrace.c:539 #11 0x0000000000433368 in cmd_record (argc=1, argv=0x7fffffffde60, prefix=0x0) at builtin-record.c:1264 #12 0x000000000049bec2 in run_builtin (p=0x8fa2a8 <commands+168>, argc=5, argv=0x7fffffffde60) at perf.c:390 #13 0x000000000049c12a in handle_internal_command (argc=5, argv=0x7fffffffde60) at perf.c:451 #14 0x000000000049c278 in run_argv (argcp=0x7fffffffdcbc, argv=0x7fffffffdcb0) at perf.c:495 #15 0x000000000049c60a in main (argc=5, argv=0x7fffffffde60) at perf.c:618 (gdb) Intel PT attempts to find the sched:sched_switch tracepoint but that seg faults if tracefs is not readable, because the error reporting structure is null, as errors are not reported when automatically adding tracepoints. Fix by checking before using. Committer note: This doesn't take place in a kernel that supports perf_event_attr.context_switch, that is the default way that will be used for tracking context switches, only in older kernels, like 4.2, in a machine with Intel PT (e.g. Broadwell) for non-priviledged users. Further info from a similar patch by Wang: The error is in tracepoint_error: it assumes the 'e' parameter is valid. However, there are many situation a parse_event() can be called without parse_events_error. See result of $ grep 'parse_events(.*NULL)' ./tools/perf/ -r' Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Tong Zhang <[email protected]> Cc: Wang Nan <[email protected]> Cc: [email protected] # v4.4+ Fixes: 1965817 ("perf tools: Enhance parsing events tracepoint error output") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
The syzkaller detected a out-of-bounds issue with the events filter code, specifically here: prog[N].pred = NULL; /* #13 */ prog[N].target = 1; /* TRUE */ prog[N+1].pred = NULL; prog[N+1].target = 0; /* FALSE */ -> prog[N-1].target = N; prog[N-1].when_to_branch = false; As that's the first reference to a "N-1" index, it appears that the code got here with N = 0, which means the filter parser found no filter to parse (which shouldn't ever happen, but apparently it did). Add a new error to the parsing code that will check to make sure that N is not zero before going into this part of the code. If N = 0, then -EINVAL is returned, and a error message is added to the filter. Cc: [email protected] Fixes: 8076559 ("tracing: Rewrite filter logic to be simpler and faster") Reported-by: air icy <[email protected]> bugzilla url: https://bugzilla.kernel.org/show_bug.cgi?id=200019 Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Booting a ColdFire m68k core with MMU enabled causes a "bad page state" oops since commit 1d40a5e ("mm: mark pages in use for page tables"): BUG: Bad page state in process sh pfn:01ce2 page:004fefc8 count:0 mapcount:-1024 mapping:00000000 index:0x0 flags: 0x0() raw: 00000000 00000000 00000000 fffffbff 00000000 00000100 00000200 00000000 raw: 039c4000 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 22 Comm: sh Not tainted 4.17.0-07461-g1d40a5ea01d5 #13 Fix by calling pgtable_page_dtor() in our __pte_free_tlb() code path, so that the PG_table flag is cleared before we free the pte page. Note that I had to change the type of pte_free() to be static from extern. Otherwise you get a lot of warnings like this: ./arch/m68k/include/asm/mcf_pgalloc.h:80:2: warning: ‘pgtable_page_dtor’ is static but used in inline function ‘pte_free’ which is not static pgtable_page_dtor(page); ^ And making it static is consistent with our use of this in the other m68k pgalloc definitions of pte_free(). Signed-off-by: Greg Ungerer <[email protected]> CC: Matthew Wilcox <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]>
Crash dump shows following instructions crash> bt PID: 0 TASK: ffffffffbe412480 CPU: 0 COMMAND: "swapper/0" #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1 #1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2 #2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c #3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a #4 [ffff891ee00039e0] no_context at ffffffffbd074643 #5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e #6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64 #7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a #8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8 #9 [ffff891ee0003b50] page_fault at ffffffffbda01925 [exception RIP: qlt_schedule_sess_for_deletion+15] RIP: ffffffffc02e526f RSP: ffff891ee0003c08 RFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffc0307847 RDX: 00000000000020e6 RSI: ffff891edbc377c8 RDI: 0000000000000000 RBP: ffff891ee0003c18 R8: ffffffffc02f0b20 R9: 0000000000000250 R10: 0000000000000258 R11: 000000000000b780 R12: ffff891ed9b43000 R13: 00000000000000f0 R14: 0000000000000006 R15: ffff891edbc377c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx] #11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx] #12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx] #13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx] #14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59 #15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02 #16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90 #17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984 #18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5 #19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18 --- <IRQ stack> --- #20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e [exception RIP: unknown or invalid address] RIP: 000000000000001f RSP: 0000000000000000 RFLAGS: fff3b8c2091ebb3f RAX: ffffbba5a0000200 RBX: 0000be8cdfa8f9fa RCX: 0000000000000018 RDX: 0000000000000101 RSI: 000000000000015d RDI: 0000000000000193 RBP: 0000000000000083 R8: ffffffffbe403e38 R9: 0000000000000002 R10: 0000000000000000 R11: ffffffffbe56b820 R12: ffff891ee001cf00 R13: ffffffffbd11c0a4 R14: ffffffffbe403d60 R15: 0000000000000001 ORIG_RAX: ffff891ee0022ac0 CS: 0000 SS: ffffffffffffffb9 bt: WARNING: possibly bogus exception frame #21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd #22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907 #23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3 #24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42 #25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3 #26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa #27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca #28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675 #29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb #30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5 Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login") Cc: <[email protected]> # v4.17+ Signed-off-by: Chuck Anderson <[email protected]> Signed-off-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
sykzaller triggered several panics similar to the below: [...] [ 248.851531] BUG: KASAN: use-after-free in _copy_to_user+0x5c/0x90 [ 248.857656] Read of size 985 at addr ffff8808017ffff2 by task a.out/1425 [...] [ 248.865902] CPU: 1 PID: 1425 Comm: a.out Not tainted 4.18.0-rc4+ #13 [ 248.865903] Hardware name: Supermicro SYS-5039MS-H12TRF/X11SSE-F, BIOS 2.1a 03/08/2018 [ 248.865905] Call Trace: [ 248.865910] dump_stack+0xd6/0x185 [ 248.865911] ? show_regs_print_info+0xb/0xb [ 248.865913] ? printk+0x9c/0xc3 [ 248.865915] ? kmsg_dump_rewind_nolock+0xe4/0xe4 [ 248.865919] print_address_description+0x6f/0x270 [ 248.865920] kasan_report+0x25b/0x380 [ 248.865922] ? _copy_to_user+0x5c/0x90 [ 248.865924] check_memory_region+0x137/0x190 [ 248.865925] kasan_check_read+0x11/0x20 [ 248.865927] _copy_to_user+0x5c/0x90 [ 248.865930] bpf_test_finish.isra.8+0x4f/0xc0 [ 248.865932] bpf_prog_test_run_skb+0x6a0/0xba0 [...] After scrubbing the BPF prog a bit from the noise, turns out it called bpf_skb_change_head() for the lwt_xmit prog with headroom of 2. Nothing wrong in that, however, this was run with repeat >> 0 in bpf_prog_test_run_skb() and the same skb thus keeps changing until the pskb_expand_head() called from skb_cow() keeps bailing out in atomic alloc context with -ENOMEM. So upon return we'll basically have 0 headroom left yet blindly do the __skb_push() of 14 bytes and keep copying data from there in bpf_test_finish() out of bounds. Fix to check if we have enough headroom and if pskb_expand_head() fails, bail out with error. Another bug independent of this fix (but related in triggering above) is that BPF_PROG_TEST_RUN should be reworked to reset the skb/xdp buffer to it's original state from input as otherwise repeating the same test in a loop won't work for benchmarking when underlying input buffer is getting changed by the prog each time and reused for the next run leading to unexpected results. Fixes: 1cf1cae ("bpf: introduce BPF_PROG_TEST_RUN command") Reported-by: [email protected] Reported-by: [email protected] Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
rdma_counter_init() may fail for a device. In such case while calculating total sum, ignore NULL hstats. This fixes below observed call trace. BUG: kernel NULL pointer dereference, address: 00000000000000a0 PGD 8000001009b30067 P4D 8000001009b30067 PUD 10549c9067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 55 PID: 20887 Comm: cat Kdump: loaded Not tainted 5.2.0-rc6-jdc+ #13 RIP: 0010:rdma_counter_get_hwstat_value+0xf2/0x150 [ib_core] Call Trace: show_hw_stats+0x5e/0x130 [ib_core] dev_attr_show+0x15/0x50 sysfs_kf_seq_show+0xc6/0x1a0 seq_read+0x132/0x370 vfs_read+0x89/0x140 ksys_read+0x5c/0xd0 do_syscall_64+0x5a/0x240 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: f34a55e ("RDMA/core: Get sum value of all counters when perform a sysfs stat read") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Parav Pandit <[email protected]> Reviewed-by: Mark Zhang <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
A deadlock with this stacktrace was observed. The loop thread does a GFP_KERNEL allocation, it calls into dm-bufio shrinker and the shrinker depends on I/O completion in the dm-bufio subsystem. In order to fix the deadlock (and other similar ones), we set the flag PF_MEMALLOC_NOIO at loop thread entry. PID: 474 TASK: ffff8813e11f4600 CPU: 10 COMMAND: "kswapd0" #0 [ffff8813dedfb938] __schedule at ffffffff8173f405 #1 [ffff8813dedfb990] schedule at ffffffff8173fa27 #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186 #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8 #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81 #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio] #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio] #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio] #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778 #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f #13 [ffff8813dedfbec0] kthread at ffffffff810a8428 #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242 PID: 14127 TASK: ffff881455749c00 CPU: 11 COMMAND: "loop1" #0 [ffff88272f5af228] __schedule at ffffffff8173f405 #1 [ffff88272f5af280] schedule at ffffffff8173fa27 #2 [ffff88272f5af2a0] schedule_preempt_disabled at ffffffff8173fd5e #3 [ffff88272f5af2b0] __mutex_lock_slowpath at ffffffff81741fb5 #4 [ffff88272f5af330] mutex_lock at ffffffff81742133 #5 [ffff88272f5af350] dm_bufio_shrink_count at ffffffffa03865f9 [dm_bufio] #6 [ffff88272f5af380] shrink_slab at ffffffff811a86bd #7 [ffff88272f5af470] shrink_zone at ffffffff811ad778 #8 [ffff88272f5af500] do_try_to_free_pages at ffffffff811adb34 #9 [ffff88272f5af590] try_to_free_pages at ffffffff811adef8 #10 [ffff88272f5af610] __alloc_pages_nodemask at ffffffff811a09c3 #11 [ffff88272f5af710] alloc_pages_current at ffffffff811e8b71 #12 [ffff88272f5af760] new_slab at ffffffff811f4523 #13 [ffff88272f5af7b0] __slab_alloc at ffffffff8173a1b5 #14 [ffff88272f5af880] kmem_cache_alloc at ffffffff811f484b #15 [ffff88272f5af8d0] do_blockdev_direct_IO at ffffffff812535b3 #16 [ffff88272f5afb00] __blockdev_direct_IO at ffffffff81255dc3 #17 [ffff88272f5afb30] xfs_vm_direct_IO at ffffffffa01fe3fc [xfs] #18 [ffff88272f5afb90] generic_file_read_iter at ffffffff81198994 #19 [ffff88272f5afc50] __dta_xfs_file_read_iter_2398 at ffffffffa020c970 [xfs] #20 [ffff88272f5afcc0] lo_rw_aio at ffffffffa0377042 [loop] #21 [ffff88272f5afd70] loop_queue_work at ffffffffa0377c3b [loop] #22 [ffff88272f5afe60] kthread_worker_fn at ffffffff810a8a0c #23 [ffff88272f5afec0] kthread at ffffffff810a8428 #24 [ffff88272f5aff50] ret_from_fork at ffffffff81745242 Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
Once implicit MR is being called to be released by ib_umem_notifier_release() its leaves were marked as "dying". However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is called, it skips running the mr_leaf_free_action (i.e. umem_odp->work) when those leaves were marked as "dying". As such ib_umem_release() for the leaves won't be called and their MRs will be leaked as well. When an application exits/killed without calling dereg_mr we might hit the above flow. This fatal scenario is reported by WARN_ON() upon mlx5_ib_dealloc_ucontext() as ibcontext->per_mm_list is not empty, the call trace can be seen below. Originally the "dying" mark as part of ib_umem_notifier_release() was introduced to prevent pagefault_mr() from returning a success response once this happened. However, we already have today the completion mechanism so no need for that in those flows any more. Even in case a success response will be returned the firmware will not find the pages and an error will be returned in the following call as a released mm will cause ib_umem_odp_map_dma_pages() to permanently fail mmget_not_zero(). Fix the above issue by dropping the "dying" from the above flows. The other flows that are using "dying" are still needed it for their synchronization purposes. WARNING: CPU: 1 PID: 7218 at drivers/infiniband/hw/mlx5/main.c:2004 mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib] CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G E 5.2.0-rc6+ #13 Call Trace: uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs] ib_uverbs_close+0x1f/0x80 [ib_uverbs] __fput+0xbe/0x250 task_work_run+0x88/0xa0 do_exit+0x2cb/0xc30 ? __fput+0x14b/0x250 do_group_exit+0x39/0xb0 get_signal+0x191/0x920 ? _raw_spin_unlock_bh+0xa/0x20 ? inet_csk_accept+0x229/0x2f0 do_signal+0x36/0x5e0 ? put_unused_fd+0x5b/0x70 ? __sys_accept4+0x1a6/0x1e0 ? inet_hash+0x35/0x40 ? release_sock+0x43/0x90 ? _raw_spin_unlock_bh+0xa/0x20 ? inet_listen+0x9f/0x120 exit_to_usermode_loop+0x5c/0xc6 do_syscall_64+0x182/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 81713d3 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yishai Hadas <[email protected]> Reviewed-by: Artemy Kovalyov <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
Right now, igc_ptp_reset() is called from igc_reset(), which is called from igc_probe() before igc_ptp_init() has a chance to run. It is detected as an attempt to use an spinlock without registering its key first. See log below. To avoid this problem, simplify the initialization: igc_ptp_init() is only called from igc_probe(), and igc_ptp_reset() is only called from igc_reset(). [ 2.736332] INFO: trying to register non-static key. [ 2.736902] input: HDA Intel PCH Front Headphone as /devices/pci0000:00/0000:00:1f.3/sound/card0/input10 [ 2.737513] the code is fine but needs lockdep annotation. [ 2.737513] turning off the locking correctness validator. [ 2.737515] CPU: 8 PID: 239 Comm: systemd-udevd Tainted: G E 5.8.0-rc7+ #13 [ 2.737515] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA/Z390 AORUS ULTRA-CF, BIOS F7 03/14/2019 [ 2.737516] Call Trace: [ 2.737521] dump_stack+0x78/0xa0 [ 2.737524] register_lock_class+0x6b1/0x6f0 [ 2.737526] ? lockdep_hardirqs_on_prepare+0xca/0x160 [ 2.739177] ? _raw_spin_unlock_irq+0x24/0x50 [ 2.739179] ? trace_hardirqs_on+0x1c/0xf0 [ 2.740820] __lock_acquire+0x56/0x1ff0 [ 2.740823] ? __schedule+0x30c/0x970 [ 2.740825] lock_acquire+0x97/0x3e0 [ 2.740830] ? igc_ptp_reset+0x35/0xf0 [igc] [ 2.740833] ? schedule_hrtimeout_range_clock+0xb7/0x120 [ 2.742507] _raw_spin_lock_irqsave+0x3a/0x50 [ 2.742512] ? igc_ptp_reset+0x35/0xf0 [igc] [ 2.742515] igc_ptp_reset+0x35/0xf0 [igc] [ 2.742519] igc_reset+0x96/0xd0 [igc] [ 2.744148] igc_probe+0x68f/0x7d0 [igc] [ 2.745796] local_pci_probe+0x3d/0x70 [ 2.745799] pci_device_probe+0xd1/0x190 [ 2.745802] really_probe+0x15a/0x3f0 [ 2.759936] driver_probe_device+0xe1/0x150 [ 2.759937] device_driver_attach+0xa8/0xb0 [ 2.761786] __driver_attach+0x89/0x150 [ 2.761786] ? device_driver_attach+0xb0/0xb0 [ 2.761787] ? device_driver_attach+0xb0/0xb0 [ 2.761788] bus_for_each_dev+0x66/0x90 [ 2.765012] bus_add_driver+0x12e/0x1f0 [ 2.765716] driver_register+0x8b/0xe0 [ 2.766418] ? 0xffffffffc0230000 [ 2.767119] do_one_initcall+0x5a/0x310 [ 2.767826] ? kmem_cache_alloc_trace+0xe9/0x200 [ 2.768528] do_init_module+0x5c/0x260 [ 2.769206] __do_sys_finit_module+0x93/0xe0 [ 2.770048] do_syscall_64+0x46/0xa0 [ 2.770716] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 2.771396] RIP: 0033:0x7f83534589e0 [ 2.772073] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 2e 2e 2e 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 80 24 0d 00 f7 d8 64 89 01 48 [ 2.772074] RSP: 002b:00007ffd31d0ed18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 2.774854] RAX: ffffffffffffffda RBX: 000055d52816aba0 RCX: 00007f83534589e0 [ 2.774855] RDX: 0000000000000000 RSI: 00007f83535b982f RDI: 0000000000000006 [ 2.774855] RBP: 00007ffd31d0ed60 R08: 0000000000000000 R09: 00007ffd31d0ed30 [ 2.774856] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000 [ 2.774856] R13: 0000000000020000 R14: 00007f83535b982f R15: 000055d527f5e120 Fixes: 5f29580 ("igc: Add basic skeleton for PTP") Signed-off-by: Vinicius Costa Gomes <[email protected]> Reviewed-by: Andre Guedes <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
After commit 92cc68e ("drm/vblank: Use spin_(un)lock_irq() in drm_crtc_vblank_on()") omapdrm locking is broken: WARNING: inconsistent lock state 5.8.0-rc2-00483-g92cc68e35863 #13 Tainted: G W -------------------------------- inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes: ea98222c (&dev->event_lock#2){?.+.}-{2:2}, at: drm_handle_vblank+0x4c/0x520 [drm] {HARDIRQ-ON-W} state was registered at: trace_hardirqs_on+0x9c/0x1ec _raw_spin_unlock_irq+0x20/0x58 omap_crtc_atomic_enable+0x54/0xa0 [omapdrm] drm_atomic_helper_commit_modeset_enables+0x218/0x270 [drm_kms_helper] omap_atomic_commit_tail+0x48/0xc4 [omapdrm] commit_tail+0x9c/0x190 [drm_kms_helper] drm_atomic_helper_commit+0x154/0x188 [drm_kms_helper] drm_client_modeset_commit_atomic+0x228/0x268 [drm] drm_client_modeset_commit_locked+0x60/0x1d0 [drm] drm_client_modeset_commit+0x24/0x40 [drm] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xa8 [drm_kms_helper] drm_fb_helper_set_par+0x2c/0x5c [drm_kms_helper] drm_fb_helper_hotplug_event.part.0+0xa0/0xbc [drm_kms_helper] drm_kms_helper_hotplug_event+0x24/0x30 [drm_kms_helper] output_poll_execute+0x1a8/0x1c0 [drm_kms_helper] process_one_work+0x268/0x800 worker_thread+0x30/0x4e0 kthread+0x164/0x190 ret_from_fork+0x14/0x20 The reason for this is that omapdrm calls drm_crtc_vblank_on() while holding event_lock taken with spin_lock_irq(). It is not clear why drm_crtc_vblank_on() and drm_crtc_vblank_get() are called while holding event_lock. I don't see any problem with moving those calls outside the lock, which is what this patch does. Signed-off-by: Tomi Valkeinen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Laurent Pinchart <[email protected]>
…s metrics" test Linux 5.9 introduced perf test case "Parse and process metrics" and on s390 this test case always dumps core: [root@t35lp67 perf]# ./perf test -vvvv -F 67 67: Parse and process metrics : --- start --- metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC parsing metric: inst_retired.any / cpu_clk_unhalted.thread Segmentation fault (core dumped) [root@t35lp67 perf]# I debugged this core dump and gdb shows this call chain: (gdb) where #0 0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6 #1 0x000003ffabc293de in strcasestr () from /lib64/libc.so.6 #2 0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any", n=<optimized out>) at util/metricgroup.c:368 #3 find_metric (map=<optimized out>, map=<optimized out>, metric=0x1e6ea20 "inst_retired.any") at util/metricgroup.c:765 #4 __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0, metric_no_group=<optimized out>, m=<optimized out>) at util/metricgroup.c:844 #5 resolve_metric (ids=0x0, map=0x0, metric_list=0x0, metric_no_group=<optimized out>) at util/metricgroup.c:881 #6 metricgroup__add_metric (metric=<optimized out>, metric_no_group=metric_no_group@entry=false, events=<optimized out>, events@entry=0x3ffd84fb878, metric_list=0x0, metric_list@entry=0x3ffd84fb868, map=0x0) at util/metricgroup.c:943 #7 0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>, metric_list=0x3ffd84fb868, events=0x3ffd84fb878, metric_no_group=<optimized out>, list=<optimized out>) at util/metricgroup.c:988 #8 parse_groups (perf_evlist=perf_evlist@entry=0x1e70260, str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>, metric_no_merge=<optimized out>, fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>, metric_events=0x3ffd84fba58, map=0x1) at util/metricgroup.c:1040 #9 0x0000000001103eb2 in metricgroup__parse_groups_test( evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>, str=str@entry=0x12f34b2 "IPC", metric_no_group=metric_no_group@entry=false, metric_no_merge=metric_no_merge@entry=false, metric_events=0x3ffd84fba58) at util/metricgroup.c:1082 #10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0, ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC", vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:159 #11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:189 #12 test_ipc () at tests/parse-metric.c:208 ..... ..... omitted many more lines This test case was added with commit 218ca91 ("perf tests: Add parse metric test for frontend metric"). When I compile with make DEBUG=y it works fine and I do not get a core dump. It turned out that the above listed function call chain worked on a struct pmu_event array which requires a trailing element with zeroes which was missing. The marco map_for_each_event() loops over that array tests for members metric_expr/metric_name/metric_group being non-NULL. Adding this element fixes the issue. Output after: [root@t35lp46 perf]# ./perf test 67 67: Parse and process metrics : Ok [root@t35lp46 perf]# Committer notes: As Ian remarks, this is not s390 specific: <quote Ian> This also shows up with address sanitizer on all architectures (perhaps change the patch title) and perhaps add a "Fixes: <commit>" tag. ================================================================= ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp 0x7ffd24327c58 READ of size 8 at 0x55c93b4d59e8 thread T0 #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2 #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9 #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9 #3 0x55c93a1528db in metricgroup__add_metric tools/perf/util/metricgroup.c:943:9 #4 0x55c93a151996 in metricgroup__add_metric_list tools/perf/util/metricgroup.c:988:9 #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8 #6 0x55c93a1513e1 in metricgroup__parse_groups_test tools/perf/util/metricgroup.c:1082:9 #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8 #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9 #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2 #10 0x55c93a00f1e8 in test__parse_metric tools/perf/tests/parse-metric.c:345:2 #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9 #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9 #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4 #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9 #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11 #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8 #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2 #18 0x55c939e45f7e in main tools/perf/perf.c:539:3 0x55c93b4d59e8 is located 0 bytes to the right of global variable 'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25' (0x55c93b4d54a0) of size 1352 SUMMARY: AddressSanitizer: global-buffer-overflow tools/perf/util/metricgroup.c:764:2 in find_metric Shadow bytes around the buggy address: 0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692af0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9 0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc </quote> I'm also adding the missing "Fixes" tag and setting just .name to NULL, as doing it that way is more compact (the compiler will zero out everything else) and the table iterators look for .name being NULL as the sentinel marking the end of the table. Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric") Signed-off-by: Thomas Richter <[email protected]> Reviewed-by: Sumanth Korikkar <[email protected]> Acked-by: Ian Rogers <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
[ Upstream commit 93c660c ] ASAN reports an use-after-free in btf_dump_name_dups: ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928 READ of size 2 at 0xffff927006db thread T0 #0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614) #1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127 #2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143 #3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212 #4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525 #5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552 #6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567 #7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912 #8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798 #9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282 #10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236 #11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875 #12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062 #13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697 #14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308 #15 0xaaaab5d65990 (test_progs+0x185990) 0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0) freed by thread T0 here: #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4) #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191 #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163 #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106 #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157 #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519 #6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032 #7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232 #8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875 #9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062 #10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697 #11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308 #12 0xaaaab5d65990 (test_progs+0x185990) previously allocated by thread T0 here: #0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4) #1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191 #2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163 #3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106 #4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157 #5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519 #6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070 #7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102 #8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162 #9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875 #10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062 #11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697 #12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308 #13 0xaaaab5d65990 (test_progs+0x185990) The reason is that the key stored in hash table name_map is a string address, and the string memory is allocated by realloc() function, when the memory is resized by realloc() later, the old memory may be freed, so the address stored in name_map references to a freed memory, causing use-after-free. Fix it by storing duplicated string address in name_map. Fixes: 919d2b1 ("libbpf: Allow modification of BTF and add btf__add_str API") Signed-off-by: Xu Kuohai <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
…g the sock [ Upstream commit 3cf7203 ] There is a race condition in vxlan that when deleting a vxlan device during receiving packets, there is a possibility that the sock is released after getting vxlan_sock vs from sk_user_data. Then in later vxlan_ecn_decapsulate(), vxlan_get_sk_family() we will got NULL pointer dereference. e.g. #0 [ffffa25ec6978a38] machine_kexec at ffffffff8c669757 #1 [ffffa25ec6978a90] __crash_kexec at ffffffff8c7c0a4d #2 [ffffa25ec6978b58] crash_kexec at ffffffff8c7c1c48 #3 [ffffa25ec6978b60] oops_end at ffffffff8c627f2b #4 [ffffa25ec6978b80] page_fault_oops at ffffffff8c678fcb #5 [ffffa25ec6978bd8] exc_page_fault at ffffffff8d109542 #6 [ffffa25ec6978c00] asm_exc_page_fault at ffffffff8d200b62 [exception RIP: vxlan_ecn_decapsulate+0x3b] RIP: ffffffffc1014e7b RSP: ffffa25ec6978cb0 RFLAGS: 00010246 RAX: 0000000000000008 RBX: ffff8aa000888000 RCX: 0000000000000000 RDX: 000000000000000e RSI: ffff8a9fc7ab803e RDI: ffff8a9fd1168700 RBP: ffff8a9fc7ab803e R8: 0000000000700000 R9: 00000000000010ae R10: ffff8a9fcb748980 R11: 0000000000000000 R12: ffff8a9fd1168700 R13: ffff8aa000888000 R14: 00000000002a0000 R15: 00000000000010ae ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffa25ec6978ce8] vxlan_rcv at ffffffffc10189cd [vxlan] #8 [ffffa25ec6978d90] udp_queue_rcv_one_skb at ffffffff8cfb6507 #9 [ffffa25ec6978dc0] udp_unicast_rcv_skb at ffffffff8cfb6e45 #10 [ffffa25ec6978dc8] __udp4_lib_rcv at ffffffff8cfb8807 #11 [ffffa25ec6978e20] ip_protocol_deliver_rcu at ffffffff8cf76951 #12 [ffffa25ec6978e48] ip_local_deliver at ffffffff8cf76bde #13 [ffffa25ec6978ea0] __netif_receive_skb_one_core at ffffffff8cecde9b #14 [ffffa25ec6978ec8] process_backlog at ffffffff8cece139 #15 [ffffa25ec6978f00] __napi_poll at ffffffff8ceced1a #16 [ffffa25ec6978f28] net_rx_action at ffffffff8cecf1f3 #17 [ffffa25ec6978fa0] __softirqentry_text_start at ffffffff8d4000ca #18 [ffffa25ec6978ff0] do_softirq at ffffffff8c6fbdc3 Reproducer: https://github.com/Mellanox/ovs-tests/blob/master/test-ovs-vxlan-remove-tunnel-during-traffic.sh Fix this by waiting for all sk_user_data reader to finish before releasing the sock. Reported-by: Jianlin Shi <[email protected]> Suggested-by: Jakub Sitnicki <[email protected]> Fixes: 6a93cc9 ("udp-tunnel: Add a few more UDP tunnel APIs") Signed-off-by: Hangbin Liu <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Fix an error detected by memory sanitizer:
```
==4033==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x55fb0fbedfc7 in read_alias_info tools/perf/util/pmu.c:457:6
#1 0x55fb0fbea339 in check_info_data tools/perf/util/pmu.c:1434:2
#2 0x55fb0fbea339 in perf_pmu__check_alias tools/perf/util/pmu.c:1504:9
#3 0x55fb0fbdca85 in parse_events_add_pmu tools/perf/util/parse-events.c:1429:32
#4 0x55fb0f965230 in parse_events_parse tools/perf/util/parse-events.y:299:6
#5 0x55fb0fbdf6b2 in parse_events__scanner tools/perf/util/parse-events.c:1822:8
#6 0x55fb0fbdf8c1 in __parse_events tools/perf/util/parse-events.c:2094:8
#7 0x55fb0fa8ffa9 in parse_events tools/perf/util/parse-events.h:41:9
#8 0x55fb0fa8ffa9 in test_event tools/perf/tests/parse-events.c:2393:8
#9 0x55fb0fa8f458 in test__pmu_events tools/perf/tests/parse-events.c:2551:15
#10 0x55fb0fa6d93f in run_test tools/perf/tests/builtin-test.c:242:9
#11 0x55fb0fa6d93f in test_and_print tools/perf/tests/builtin-test.c:271:8
#12 0x55fb0fa6d082 in __cmd_test tools/perf/tests/builtin-test.c:442:5
#13 0x55fb0fa6d082 in cmd_test tools/perf/tests/builtin-test.c:564:9
#14 0x55fb0f942720 in run_builtin tools/perf/perf.c:322:11
#15 0x55fb0f942486 in handle_internal_command tools/perf/perf.c:375:8
#16 0x55fb0f941dab in run_argv tools/perf/perf.c:419:2
#17 0x55fb0f941dab in main tools/perf/perf.c:535:3
```
Fixes: 7b723db ("perf pmu: Be lazy about loading event info files from sysfs")
Signed-off-by: Ian Rogers <[email protected]>
Cc: James Clark <[email protected]>
Cc: Kan Liang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Mike Christie <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
[ Upstream commit fe2640b ] In remove_phb_dynamic() we use &phb->io_resource, after we've called device_unregister(&host_bridge->dev). But the unregister may have freed phb, because pcibios_free_controller_deferred() is the release function for the host_bridge. If there are no outstanding references when we call device_unregister() then phb will be freed out from under us. This has gone mainly unnoticed, but with slub_debug and page_poison enabled it can lead to a crash: PID: 7574 TASK: c0000000d492cb80 CPU: 13 COMMAND: "drmgr" #0 [c0000000e4f075a0] crash_kexec at c00000000027d7dc #1 [c0000000e4f075d0] oops_end at c000000000029608 #2 [c0000000e4f07650] __bad_page_fault at c0000000000904b4 #3 [c0000000e4f076c0] do_bad_slb_fault at c00000000009a5a8 #4 [c0000000e4f076f0] data_access_slb_common_virt at c000000000008b30 Data SLB Access [380] exception frame: R0: c000000000167250 R1: c0000000e4f07a00 R2: c000000002a46100 R3: c000000002b39ce8 R4: 00000000000000c0 R5: 00000000000000a9 R6: 3894674d000000c0 R7: 0000000000000000 R8: 00000000000000ff R9: 0000000000000100 R10: 6b6b6b6b6b6b6b6b R11: 0000000000008000 R12: c00000000023da80 R13: c0000009ffd38b00 R14: 0000000000000000 R15: 000000011c87f0f0 R16: 0000000000000006 R17: 0000000000000003 R18: 0000000000000002 R19: 0000000000000004 R20: 0000000000000005 R21: 000000011c87ede8 R22: 000000011c87c5a8 R23: 000000011c87d3a0 R24: 0000000000000000 R25: 0000000000000001 R26: c0000000e4f07cc8 R27: c00000004d1cc400 R28: c0080000031d00e8 R29: c00000004d23d800 R30: c00000004d1d2400 R31: c00000004d1d2540 NIP: c000000000167258 MSR: 8000000000009033 OR3: c000000000e9f474 CTR: 0000000000000000 LR: c000000000167250 XER: 0000000020040003 CCR: 0000000024088420 MQ: 0000000000000000 DAR: 6b6b6b6b6b6b6ba3 DSISR: c0000000e4f07920 Syscall Result: fffffffffffffff2 [NIP : release_resource+56] [LR : release_resource+48] #5 [c0000000e4f07a00] release_resource at c000000000167258 (unreliable) #6 [c0000000e4f07a30] remove_phb_dynamic at c000000000105648 #7 [c0000000e4f07ab0] dlpar_remove_slot at c0080000031a09e8 [rpadlpar_io] #8 [c0000000e4f07b50] remove_slot_store at c0080000031a0b9c [rpadlpar_io] #9 [c0000000e4f07be0] kobj_attr_store at c000000000817d8c #10 [c0000000e4f07c00] sysfs_kf_write at c00000000063e504 #11 [c0000000e4f07c20] kernfs_fop_write_iter at c00000000063d868 #12 [c0000000e4f07c70] new_sync_write at c00000000054339c #13 [c0000000e4f07d10] vfs_write at c000000000546624 #14 [c0000000e4f07d60] ksys_write at c0000000005469f4 #15 [c0000000e4f07db0] system_call_exception at c000000000030840 #16 [c0000000e4f07e10] system_call_vectored_common at c00000000000c168 To avoid it, we can take a reference to the host_bridge->dev until we're done using phb. Then when we drop the reference the phb will be freed. Fixes: 2dd9c11 ("powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)") Reported-by: David Dai <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Tested-by: Sachin Sant <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
commit 66f133c upstream. When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check that even slots in the tsk->kmap_ctrl.pteval are unmapped. The slots are initialized with 0 value, but the check is done with pte_none. 0 pte however does not necessarily mean that pte_none will return true. e.g. on xtensa it returns false, resulting in the following runtime warnings: WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108 CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_out+0x51/0x108 __schedule+0x71a/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0 CPU: 0 PID: 101 Comm: touch Tainted: G W 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13 Call Trace: dump_stack+0xc/0x40 __warn+0x8f/0x174 warn_slowpath_fmt+0x48/0xac __kmap_local_sched_in+0x50/0xe0 finish_task_switch$isra$0+0x1ce/0x2f8 __schedule+0x86e/0x9c4 preempt_schedule_irq+0xa0/0xe0 common_exception_return+0x5c/0x93 do_wp_page+0x30e/0x330 handle_mm_fault+0xa70/0xc3c do_page_fault+0x1d8/0x3c4 common_exception+0x7f/0x7f Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0. Link: https://lkml.kernel.org/r/[email protected] Fixes: 5fbda3e ("sched: highmem: Store local kmaps in task struct") Signed-off-by: Max Filippov <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) #3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) #4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) #5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) #6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) #7 btrfs_sync_file (fs/btrfs/file.c:1933:8) #8 vfs_fsync_range (fs/sync.c:188:9) #9 vfs_fsync (fs/sync.c:202:9) #10 do_fsync (fs/sync.c:212:9) #11 __do_sys_fdatasync (fs/sync.c:225:9) #12 __se_sys_fdatasync (fs/sync.c:223:1) #13 __x64_sys_fdatasync (fs/sync.c:223:1) #14 do_syscall_x64 (arch/x86/entry/common.c:52:14) #15 do_syscall_64 (arch/x86/entry/common.c:83:7) #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: [email protected] # 6.1+ Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: David Sterba <[email protected]>
The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Reviewed-by: Heming Zhao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Running 'stress-ng --binderfs 16 --timeout 300' under KASAN-enabled kernel, I've noticed the following: BUG: KASAN: slab-use-after-free in binderfs_evict_inode+0x1de/0x2d0 Write of size 8 at addr ffff88807379bc08 by task stress-ng-binde/1699 CPU: 0 UID: 0 PID: 1699 Comm: stress-ng-binde Not tainted 6.14.0-rc7-g586de92313fc-dirty #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x1c2/0x2a0 ? __pfx_dump_stack_lvl+0x10/0x10 ? __pfx__printk+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? __virt_addr_valid+0x18c/0x540 ? __virt_addr_valid+0x469/0x540 print_report+0x155/0x840 ? __virt_addr_valid+0x18c/0x540 ? __virt_addr_valid+0x469/0x540 ? __phys_addr+0xba/0x170 ? binderfs_evict_inode+0x1de/0x2d0 kasan_report+0x147/0x180 ? binderfs_evict_inode+0x1de/0x2d0 binderfs_evict_inode+0x1de/0x2d0 ? __pfx_binderfs_evict_inode+0x10/0x10 evict+0x524/0x9f0 ? __pfx_lock_release+0x10/0x10 ? __pfx_evict+0x10/0x10 ? do_raw_spin_unlock+0x4d/0x210 ? _raw_spin_unlock+0x28/0x50 ? iput+0x697/0x9b0 __dentry_kill+0x209/0x660 ? shrink_kill+0x8d/0x2c0 shrink_kill+0xa9/0x2c0 shrink_dentry_list+0x2e0/0x5e0 shrink_dcache_parent+0xa2/0x2c0 ? __pfx_shrink_dcache_parent+0x10/0x10 ? __pfx_lock_release+0x10/0x10 ? __pfx_do_raw_spin_lock+0x10/0x10 do_one_tree+0x23/0xe0 shrink_dcache_for_umount+0xa0/0x170 generic_shutdown_super+0x67/0x390 kill_litter_super+0x76/0xb0 binderfs_kill_super+0x44/0x90 deactivate_locked_super+0xb9/0x130 cleanup_mnt+0x422/0x4c0 ? lockdep_hardirqs_on+0x9d/0x150 task_work_run+0x1d2/0x260 ? __pfx_task_work_run+0x10/0x10 resume_user_mode_work+0x52/0x60 syscall_exit_to_user_mode+0x9a/0x120 do_syscall_64+0x103/0x210 ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0xcac57b Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 RSP: 002b:00007ffecf4226a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 RAX: 0000000000000000 RBX: 00007ffecf422720 RCX: 0000000000cac57b RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffecf422850 RBP: 00007ffecf422850 R08: 0000000028d06ab1 R09: 7fffffffffffffff R10: 3fffffffffffffff R11: 0000000000000246 R12: 00007ffecf422718 R13: 00007ffecf422710 R14: 00007f478f87b658 R15: 00007ffecf422830 </TASK> Allocated by task 1705: kasan_save_track+0x3e/0x80 __kasan_kmalloc+0x8f/0xa0 __kmalloc_cache_noprof+0x213/0x3e0 binderfs_binder_device_create+0x183/0xa80 binder_ctl_ioctl+0x138/0x190 __x64_sys_ioctl+0x120/0x1b0 do_syscall_64+0xf6/0x210 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 1705: kasan_save_track+0x3e/0x80 kasan_save_free_info+0x46/0x50 __kasan_slab_free+0x62/0x70 kfree+0x194/0x440 evict+0x524/0x9f0 do_unlinkat+0x390/0x5b0 __x64_sys_unlink+0x47/0x50 do_syscall_64+0xf6/0x210 entry_SYSCALL_64_after_hwframe+0x77/0x7f This 'stress-ng' workload causes the concurrent deletions from 'binder_devices' and so requires full-featured synchronization to prevent list corruption. I've found this issue independently but pretty sure that syzbot did the same, so Reported-by: and Closes: should be applicable here as well. Cc: [email protected] Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=353d7b75658a95aa955a Fixes: e77aff5 ("binderfs: fix use-after-free in binder_devices") Signed-off-by: Dmitry Antipov <[email protected]> Acked-by: Carlos Llamas <[email protected]> Signed-off-by: Carlos Llamas <[email protected]> Reviewed-by: Alice Ryhl <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
The OP-TEE driver registers the function notif_callback() for FF-A notifications. However, this function is called in an atomic context leading to errors like this when processing asynchronous notifications: | BUG: sleeping function called from invalid context at kernel/locking/mutex.c:258 | in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 9, name: kworker/0:0 | preempt_count: 1, expected: 0 | RCU nest depth: 0, expected: 0 | CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.14.0-00019-g657536ebe0aa #13 | Hardware name: linux,dummy-virt (DT) | Workqueue: ffa_pcpu_irq_notification notif_pcpu_irq_work_fn | Call trace: | show_stack+0x18/0x24 (C) | dump_stack_lvl+0x78/0x90 | dump_stack+0x18/0x24 | __might_resched+0x114/0x170 | __might_sleep+0x48/0x98 | mutex_lock+0x24/0x80 | optee_get_msg_arg+0x7c/0x21c | simple_call_with_arg+0x50/0xc0 | optee_do_bottom_half+0x14/0x20 | notif_callback+0x3c/0x48 | handle_notif_callbacks+0x9c/0xe0 | notif_get_and_handle+0x40/0x88 | generic_exec_single+0x80/0xc0 | smp_call_function_single+0xfc/0x1a0 | notif_pcpu_irq_work_fn+0x2c/0x38 | process_one_work+0x14c/0x2b4 | worker_thread+0x2e4/0x3e0 | kthread+0x13c/0x210 | ret_from_fork+0x10/0x20 Fix this by adding work queue to process the notification in a non-atomic context. Fixes: d0476a5 ("optee: ffa_abi: add asynchronous notifications") Cc: [email protected] Reviewed-by: Sumit Garg <[email protected]> Tested-by: Sudeep Holla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Wiklander <[email protected]>
Without the change `perf `hangs up on charaster devices. On my system
it's enough to run system-wide sampler for a few seconds to get the
hangup:
$ perf record -a -g --call-graph=dwarf
$ perf report
# hung
`strace` shows that hangup happens on reading on a character device
`/dev/dri/renderD128`
$ strace -y -f -p 2780484
strace: Process 2780484 attached
pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached
It's call trace descends into `elfutils`:
$ gdb -p 2780484
(gdb) bt
#0 0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
at ../sysdeps/unix/sysv/linux/pread64.c:25
#1 0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
#2 0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#3 0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#4 0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#5 0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
at util/dso.h:537
#6 0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
#7 frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
#8 0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#9 0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
#12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
best_effort=best_effort@entry=false) at util/thread.h:152
#13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
#14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
max_stack=127, symbols=true) at util/machine.c:2920
#15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
at util/machine.c:2970
#16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
#17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
#18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
#19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
#20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
#21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
#22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
#23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
file_path=0x105ffbf0 "perf.data") at util/session.c:1419
#24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
--Type <RET> for more, q to quit, c to continue without paging--
quit
prog=prog@entry=0x7fff9df81220) at util/session.c:2132
#25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
at util/session.c:2181
#26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
#27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
#28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
#29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
#30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
at perf.c:351
#31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
#32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
#33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556
The hangup happens because nothing in` perf` or `elfutils` checks if a
mapped file is easily readable.
The change conservatively skips all non-regular files.
Signed-off-by: Sergei Trofimovich <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
While configuring a vxlan tunnel in a system with a i40e NIC driver, I observe the following deadlock: WARNING: possible recursive locking detected 6.16.0-rc2.net-next-6.16_92d87230d899+ #13 Tainted: G E -------------------------------------------- kworker/u256:4/1125 is trying to acquire lock: ffff88921ab9c8c8 (&utn->lock){+.+.}-{4:4}, at: i40e_udp_tunnel_set_port (/home/pabeni/net-next/include/net/udp_tunnel.h:343 /home/pabeni/net-next/drivers/net/ethernet/intel/i40e/i40e_main.c:13013) i40e but task is already holding lock: ffff88921ab9c8c8 (&utn->lock){+.+.}-{4:4}, at: udp_tunnel_nic_device_sync_work (/home/pabeni/net-next/net/ipv4/udp_tunnel_nic.c:739) udp_tunnel other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&utn->lock); lock(&utn->lock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/u256:4/1125: #0: ffff8892910ca158 ((wq_completion)udp_tunnel_nic){+.+.}-{0:0}, at: process_one_work (/home/pabeni/net-next/kernel/workqueue.c:3213) #1: ffffc900244efd30 ((work_completion)(&utn->work)){+.+.}-{0:0}, at: process_one_work (/home/pabeni/net-next/kernel/workqueue.c:3214) #2: ffffffff9a14e290 (rtnl_mutex){+.+.}-{4:4}, at: udp_tunnel_nic_device_sync_work (/home/pabeni/net-next/net/ipv4/udp_tunnel_nic.c:737) udp_tunnel #3: ffff88921ab9c8c8 (&utn->lock){+.+.}-{4:4}, at: udp_tunnel_nic_device_sync_work (/home/pabeni/net-next/net/ipv4/udp_tunnel_nic.c:739) udp_tunnel stack backtrace: Hardware name: Dell Inc. PowerEdge R7525/0YHMCJ, BIOS 2.2.5 04/08/2021 i Call Trace: <TASK> dump_stack_lvl (/home/pabeni/net-next/lib/dump_stack.c:123) print_deadlock_bug (/home/pabeni/net-next/kernel/locking/lockdep.c:3047) validate_chain (/home/pabeni/net-next/kernel/locking/lockdep.c:3901) __lock_acquire (/home/pabeni/net-next/kernel/locking/lockdep.c:5240) lock_acquire.part.0 (/home/pabeni/net-next/kernel/locking/lockdep.c:473 /home/pabeni/net-next/kernel/locking/lockdep.c:5873) __mutex_lock (/home/pabeni/net-next/kernel/locking/mutex.c:604 /home/pabeni/net-next/kernel/locking/mutex.c:747) i40e_udp_tunnel_set_port (/home/pabeni/net-next/include/net/udp_tunnel.h:343 /home/pabeni/net-next/drivers/net/ethernet/intel/i40e/i40e_main.c:13013) i40e udp_tunnel_nic_device_sync_by_port (/home/pabeni/net-next/net/ipv4/udp_tunnel_nic.c:230 /home/pabeni/net-next/net/ipv4/udp_tunnel_nic.c:249) udp_tunnel __udp_tunnel_nic_device_sync.part.0 (/home/pabeni/net-next/net/ipv4/udp_tunnel_nic.c:292) udp_tunnel udp_tunnel_nic_device_sync_work (/home/pabeni/net-next/net/ipv4/udp_tunnel_nic.c:742) udp_tunnel process_one_work (/home/pabeni/net-next/kernel/workqueue.c:3243) worker_thread (/home/pabeni/net-next/kernel/workqueue.c:3315 /home/pabeni/net-next/kernel/workqueue.c:3402) kthread (/home/pabeni/net-next/kernel/kthread.c:464) AFAICS all the existing callsites of udp_tunnel_nic_set_port_priv() are already under the utn lock scope, avoid (re-)acquiring it in such a function. Fixes: 1ead750 ("udp_tunnel: remove rtnl_lock dependency") Signed-off-by: Paolo Abeni <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://patch.msgid.link/95a827621ec78c12d1564ec3209e549774f9657d.1750675978.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <[email protected]>
Symbolize stack traces by creating a live machine. Add this
functionality to dump_stack and switch dump_stack users to use
it. Switch TUI to use it. Add stack traces to the child test function
which can be useful to diagnose blocked code.
Example output:
```
$ perf test -vv PERF_RECORD_
...
7: PERF_RECORD_* events & perf_sample fields:
7: PERF_RECORD_* events & perf_sample fields : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
: 7: PERF_RECORD_* events & perf_sample fields:
---- unexpected signal (2) ----
#0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
#1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
#2 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
#3 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
#4 0x7fc12fef1393 in __nanosleep nanosleep.c:26
#5 0x7fc12ff02d68 in __sleep sleep.c:55
#6 0x55788c63196b in test__PERF_RECORD perf-record.c:0
#7 0x55788c620fb0 in run_test_child builtin-test.c:0
#8 0x55788c5bd18d in start_command run-command.c:127
#9 0x55788c621ef3 in __cmd_test builtin-test.c:0
#10 0x55788c6225bf in cmd_test ??:0
#11 0x55788c5afbd0 in run_builtin perf.c:0
#12 0x55788c5afeeb in handle_internal_command perf.c:0
#13 0x55788c52b383 in main ??:0
#14 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
#15 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
#16 0x55788c52b9d1 in _start ??:0
---- unexpected signal (2) ----
#0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
#1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
#2 0x7fc12fea3a14 in pthread_sigmask@GLIBC_2.2.5 pthread_sigmask.c:45
#3 0x7fc12fe49fd9 in __GI___sigprocmask sigprocmask.c:26
#4 0x7fc12ff2601b in __longjmp_chk longjmp.c:36
#5 0x55788c6210c0 in print_test_result.isra.0 builtin-test.c:0
#6 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
#7 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
#8 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
#9 0x7fc12fef1393 in __nanosleep nanosleep.c:26
#10 0x7fc12ff02d68 in __sleep sleep.c:55
#11 0x55788c63196b in test__PERF_RECORD perf-record.c:0
#12 0x55788c620fb0 in run_test_child builtin-test.c:0
#13 0x55788c5bd18d in start_command run-command.c:127
#14 0x55788c621ef3 in __cmd_test builtin-test.c:0
#15 0x55788c6225bf in cmd_test ??:0
#16 0x55788c5afbd0 in run_builtin perf.c:0
#17 0x55788c5afeeb in handle_internal_command perf.c:0
#18 0x55788c52b383 in main ??:0
#19 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
#20 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
#21 0x55788c52b9d1 in _start ??:0
7: PERF_RECORD_* events & perf_sample fields : Skip (permissions)
```
Signed-off-by: Ian Rogers <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Calling perf top with branch filters enabled on Intel CPU's
with branch counters logging (A.K.A LBR event logging [1]) support
results in a segfault.
$ perf top -e '{cpu_core/cpu-cycles/,cpu_core/event=0xc6,umask=0x3,frontend=0x11,name=frontend_retired_dsb_miss/}' -j any,counter
...
Thread 27 "perf" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffafff76c0 (LWP 949003)]
perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
653 *width = env->cpu_pmu_caps ? env->br_cntr_width :
(gdb) bt
#0 perf_env__find_br_cntr_info (env=0xf66dc0 <perf_env>, nr=0x0, width=0x7fffafff62c0) at util/env.c:653
#1 0x00000000005b1599 in symbol__account_br_cntr (branch=0x7fffcc3db580, evsel=0xfea2d0, offset=12, br_cntr=8) at util/annotate.c:345
#2 0x00000000005b17fb in symbol__account_cycles (addr=5658172, start=5658160, sym=0x7fffcc0ee420, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:389
#3 0x00000000005b1976 in addr_map_symbol__account_cycles (ams=0x7fffcd7b01d0, start=0x7fffcd7b02b0, cycles=539, evsel=0xfea2d0, br_cntr=8) at util/annotate.c:422
#4 0x000000000068d57f in hist__account_cycles (bs=0x110d288, al=0x7fffafff6540, sample=0x7fffafff6760, nonany_branch_mode=false, total_cycles=0x0, evsel=0xfea2d0) at util/hist.c:2850
#5 0x0000000000446216 in hist_iter__top_callback (iter=0x7fffafff6590, al=0x7fffafff6540, single=true, arg=0x7fffffff9e00) at builtin-top.c:737
#6 0x0000000000689787 in hist_entry_iter__add (iter=0x7fffafff6590, al=0x7fffafff6540, max_stack_depth=127, arg=0x7fffffff9e00) at util/hist.c:1359
#7 0x0000000000446710 in perf_event__process_sample (tool=0x7fffffff9e00, event=0x110d250, evsel=0xfea2d0, sample=0x7fffafff6760, machine=0x108c968) at builtin-top.c:845
#8 0x0000000000447735 in deliver_event (qe=0x7fffffffa120, qevent=0x10fc200) at builtin-top.c:1211
#9 0x000000000064ccae in do_flush (oe=0x7fffffffa120, show_progress=false) at util/ordered-events.c:245
#10 0x000000000064d005 in __ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP, timestamp=0) at util/ordered-events.c:324
#11 0x000000000064d0ef in ordered_events__flush (oe=0x7fffffffa120, how=OE_FLUSH__TOP) at util/ordered-events.c:342
#12 0x00000000004472a9 in process_thread (arg=0x7fffffff9e00) at builtin-top.c:1120
#13 0x00007ffff6e7dba8 in start_thread (arg=<optimized out>) at pthread_create.c:448
#14 0x00007ffff6f01b8c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
The cause is that perf_env__find_br_cntr_info tries to access a
null pointer pmu_caps in the perf_env struct. A similar issue exists
for homogeneous core systems which use the cpu_pmu_caps structure.
Fix this by populating cpu_pmu_caps and pmu_caps structures with
values from sysfs when calling perf top with branch stack sampling
enabled.
[1], LBR event logging introduced here:
https://lore.kernel.org/all/[email protected]/
Reviewed-by: Ian Rogers <[email protected]>
Signed-off-by: Thomas Falcon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
The commit 86bc882 ("staging: vchiq_arm: Create keep-alive thread during probe") introduced a regression for certain configurations, which doesn't have a VCHIQ user. This results in a unused and hanging keep-alive thread: INFO: task vchiq-keep/0:85 blocked for more than 120 seconds. Not tainted 6.12.34-v8-+ #13 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:vchiq-keep/0 state:D stack:0 pid:85 tgid:85 ppid:2 Call trace: __switch_to+0x188/0x230 __schedule+0xa54/0xb28 schedule+0x80/0x120 schedule_preempt_disabled+0x30/0x50 kthread+0xd4/0x1a0 ret_from_fork+0x10/0x20 Fixes: 86bc882 ("staging: vchiq_arm: Create keep-alive thread during probe") Reported-by: Maíra Canal <[email protected]> Closes: https://lore.kernel.org/linux-staging/[email protected]/ Cc: [email protected] Signed-off-by: Stefan Wahren <[email protected]> Reviewed-by: Maíra Canal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit eedf3e3 ] ACPICA commit 1c28da2242783579d59767617121035dafba18c3 This was originally done in NetBSD: NetBSD/src@b69d1ac and is the correct alternative to the smattering of `memcpy`s I previously contributed to this repository. This also sidesteps the newly strict checks added in UBSAN: llvm/llvm-project@7926744 Before this change we see the following UBSAN stack trace in Fuchsia: #0 0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e #1.2 0x000021982bc4af3c in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x41f3c #1.1 0x000021982bc4af3c in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x41f3c #1 0x000021982bc4af3c in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:395 <libclang_rt.asan.so>+0x41f3c #2 0x000021982bc4bb6f in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x42b6f #3 0x000021982bc4b723 in __ubsan_handle_type_mismatch_v1 compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x42723 #4 0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e #5 0x000021afcfdf2089 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:355 <platform-bus-x86.so>+0x6b2089 #6 0x000021afcfded169 in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:137 <platform-bus-x86.so>+0x6ad169 #7 0x000021afcfe2d24a in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:237 <platform-bus-x86.so>+0x6ed24a #8 0x000021afcfde66b7 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x6a66b7 #9 0x000021afcfdf6979 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x6b6979 #10 0x000021afcfdf708f in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x6b708f #11 0x000021afcfa95dcf in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x355dcf #12 0x000021afcfaa8278 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:84 <platform-bus-x86.so>+0x368278 #13 0x000021afcfbddb87 in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x49db87 #14 0x000021afcf99091d in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:95 <platform-bus-x86.so>+0x25091d #15 0x000021afcf9c1d4e in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:60 <platform-bus-x86.so>+0x281d4e #16 0x000021afcf9e33ad in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:77 <platform-bus-x86.so>+0x2a33ad #17 0x000021afcf9e313e in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:76:19), false, false, std::__2::allocator<std::byte>, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:183 <platform-bus-x86.so>+0x2a313e #18 0x000021afcfbab4c7 in fit::internal::function_base<16UL, false, void(), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <platform-bus-x86.so>+0x46b4c7 #19 0x000021afcfbab342 in fit::function_impl<16UL, false, void(), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/function.h:315 <platform-bus-x86.so>+0x46b342 #20 0x000021afcfcd98c3 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../sdk/lib/async/task.cc:24 <platform-bus-x86.so>+0x5998c3 #21 0x00002290f9924616 in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:789 <libdriver_runtime.so>+0x10a616 #22 0x00002290f9924323 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:788:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0x10a323 #23 0x00002290f9904b76 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xeab76 #24 0x00002290f9904831 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:471 <libdriver_runtime.so>+0xea831 #25 0x00002290f98d5adc in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:74 <libdriver_runtime.so>+0xbbadc #26 0x00002290f98e1e58 in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1248 <libdriver_runtime.so>+0xc7e58 #27 0x00002290f98e4159 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1308 <libdriver_runtime.so>+0xca159 #28 0x00002290f9918414 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:353 <libdriver_runtime.so>+0xfe414 #29 0x00002290f991812d in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:351:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xfe12d #30 0x00002290f9906fc7 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xecfc7 #31 0x00002290f9906c66 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:315 <libdriver_runtime.so>+0xecc66 #32 0x00002290f98e73d9 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:543 <libdriver_runtime.so>+0xcd3d9 #33 0x00002290f98e700d in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1442 <libdriver_runtime.so>+0xcd00d #34 0x00002290f9918983 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xfe983 #35 0x00002290f9918b9e in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xfeb9e #36 0x00002290f99bf509 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async-loop/loop.c:394 <libdriver_runtime.so>+0x1a5509 #37 0x00002290f99b9958 in async_loop_run_once(async_loop_t*, zx_time_t) ../../sdk/lib/async-loop/loop.c:343 <libdriver_runtime.so>+0x19f958 #38 0x00002290f99b9247 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../sdk/lib/async-loop/loop.c:301 <libdriver_runtime.so>+0x19f247 #39 0x00002290f99ba962 in async_loop_run_thread(void*) ../../sdk/lib/async-loop/loop.c:860 <libdriver_runtime.so>+0x1a0962 #40 0x000041afd176ef30 in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:63 <libc.so>+0x84f30 #41 0x000041afd18a448d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x1ba48d Link: acpica/acpica@1c28da22 Signed-off-by: Rafael J. Wysocki <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Tamir Duberstein <[email protected]> [ rjw: Pick up the tag from Tamir ] Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
commit 228af5a upstream. The commit 86bc882 ("staging: vchiq_arm: Create keep-alive thread during probe") introduced a regression for certain configurations, which doesn't have a VCHIQ user. This results in a unused and hanging keep-alive thread: INFO: task vchiq-keep/0:85 blocked for more than 120 seconds. Not tainted 6.12.34-v8-+ #13 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:vchiq-keep/0 state:D stack:0 pid:85 tgid:85 ppid:2 Call trace: __switch_to+0x188/0x230 __schedule+0xa54/0xb28 schedule+0x80/0x120 schedule_preempt_disabled+0x30/0x50 kthread+0xd4/0x1a0 ret_from_fork+0x10/0x20 Fixes: 86bc882 ("staging: vchiq_arm: Create keep-alive thread during probe") Reported-by: Maíra Canal <[email protected]> Closes: https://lore.kernel.org/linux-staging/[email protected]/ Cc: [email protected] Signed-off-by: Stefan Wahren <[email protected]> Reviewed-by: Maíra Canal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
Patch series "mm: remove nth_page()", v2. As discussed recently with Linus, nth_page() is just nasty and we would like to remove it. To recap, the reason we currently need nth_page() within a folio is because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the memmap is allocated per memory section. While buddy allocations cannot cross memory section boundaries, hugetlb and dax folios can. So crossing a memory section means that "page++" could do the wrong thing. Instead, nth_page() on these problematic configs always goes from page->pfn, to the go from (++pfn)->page, which is rather nasty. Likely, many people have no idea when nth_page() is required and when it might be dropped. We refer to such problematic PFN ranges and "non-contiguous pages". If we only deal with "contiguous pages", there is not need for nth_page(). Besides that "obvious" folio case, we might end up using nth_page() within CMA allocations (again, could span memory sections), and in one corner case (kfence) when processing memblock allocations (again, could span memory sections). So let's handle all that, add sanity checks, and remove nth_page(). Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups Patch #6 -> #13 : disallow folios to have non-contiguous pages Patch #14 -> #20 : remove nth_page() usage within folios Patch #22 : disallow CMA allocations of non-contiguous pages Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry Patch #34 : sanity-check + remove nth_page() usage in unpin_user_page_range_dirty_lock() Patch #35 : remove nth_page() in kfence Patch #36 : adjust stale comment regarding nth_page Patch #37 : mm: remove nth_page() A lot of this is inspired from the discussion at [1] between Linus, Jason and me, so cudos to them. This patch (of 37): In an ideal world, we wouldn't have to deal with SPARSEMEM without SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is considered too costly and consequently not supported. However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP, let's forbid the user to disable VMEMMAP: just like we already do for arm64, s390 and x86. So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without SPARSEMEM_VMEMMAP. This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone for loongarch, powerpc, riscv and sparc. All architectures only enable SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big downside to using the VMEMMAP (quite the contrary). This is a preparation for not supporting (1) folio sizes that exceed a single memory section (2) CMA allocations of non-contiguous page ranges in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit possible impact as much as possible (e.g., gigantic hugetlb page allocations suddenly fails). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1] Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Zi Yan <[email protected]> Acked-by: Mike Rapoport (Microsoft) <[email protected]> Acked-by: SeongJae Park <[email protected]> Reviewed-by: Wei Yang <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Cc: Huacai Chen <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexandre Ghiti <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Alexandru Elisei <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Alex Willamson <[email protected]> Cc: Bart van Assche <[email protected]> Cc: Borislav Betkov <[email protected]> Cc: Brendan Jackman <[email protected]> Cc: Brett Creeley <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter (Ampere) <[email protected]> Cc: Damien Le Maol <[email protected]> Cc: Dave Airlie <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Dmitriy Vyukov <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Inki Dae <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: John Hubbard <[email protected]> Cc: Jonas Lahtinen <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Lars Persson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Marco Elver <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Maxim Levitky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Niklas Cassel <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Pavel Begunkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Robin Murohy <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Shameerali Kolothum Thodi <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleinxer <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Ulf Hansson <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yishai Hadas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
…CAN XL step 3/3" Vincent Mailhol <[email protected]> says: In November last year, I sent an RFC to introduce CAN XL [1]. That RFC, despite positive feedback, was put on hold due to some unanswered question concerning the PWM encoding [2]. While stuck, some small preparation work was done in parallel in [3] by refactoring the struct can_priv and doing some trivial clean-up and renaming. Initially, [3] received zero feedback but was eventually merged after splitting it in smaller parts and resending it. Finally, in July this year, we clarified the remaining mysteries about PWM calculation, thus unlocking the series. Summer being a bit busy because of some personal matters brings us to now. After doing all the refactoring and adding all the CAN XL features, the final result is more than 30 patches, definitively too much for a single series. So I am splitting the remaining changes three: - can: rework the CAN MTU logic [4] - can: netlink: preparation before introduction of CAN XL (this series) - CAN XL (will come right after the two preparation series get merged) And thus, this series continues and finishes the preparation work done in [3] and [4]. It contains all the refactoring needed to smoothly introduce CAN XL. The goal is to: - split the functions in smaller pieces: CAN XL will introduce a fair amount of code. And some functions which are already fairly long (86 lines for can_validate(), 215 lines for can_changelink()) would grow to disproportionate sizes if the CAN XL logic were to be inlined in those functions. - repurpose the existing code to handle both CAN FD and CAN XL: a huge part of CAN XL simply reuses the CAN FD logic. All the existing CAN FD logic is made more generic to handle both CAN FD and XL. In more details: - Patch #1 moves struct data_bittiming_params from dev.h to bittiming.h and patch #2 makes can_get_relative_tdco() FD agnostic before also moving it to bittiming.h. - Patch #3 adds some comments to netlink.h tagging which IFLA symbols are FD specific. - Patches #4 to #6 are refactoring can_validate() and can_validate_bittiming(). - Patches #7 to #11 are refactoring can_changelink() and can_tdc_changelink(). - Patches #12 and #13 are refactoring can_get_size() and can_tdc_get_size(). - Patches #14 to #17 are refactoring can_fill_info() and can_tdc_fill_info(). - Patch #18 makes can_calc_tdco() FD agnostic. - Patch #19 adds can_get_ctrlmode_str() which converts control mode flags into strings. This is done in preparation of patch #20. - Patch #20 is the final patch and improves the user experience by providing detailed error messages whenever invalid parameters are provided. All those error messages came into handy when debugging the upcoming CAN XL patches. Aside from the last patch, the other changes do not impact any of the existing functionalities. The follow up series which introduces CAN XL is nearly completed but will be sent only once this one is approved: one thing at a time, I do not want to overwhelm people (including myself). [1] https://lore.kernel.org/linux-can/[email protected]/ [2] https://lore.kernel.org/linux-can/[email protected]/ [3] https://lore.kernel.org/linux-can/[email protected]/ [4] https://lore.kernel.org/linux-can/[email protected]/ Link: https://patch.msgid.link/[email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
Petr Machata says: ==================== selftests: Mark auto-deferring functions clearly selftests/net/lib.sh contains a suite of iproute2 wrappers that automatically schedule the corresponding cleanup through defer. The fact they do so is however not immediately obvious, one needs to know which functions are handling the deferral behind the scenes, and which expect the caller to handle cleanups themselves. A convention for these auto-deferring functions would help both writing and patch review. This patchset does so by marking these functions with an adf_ prefix. We already have a few such functions: forwarding/lib.sh has adf_mcd_start() and a few selftests add private helpers that conform to this convention. Patches #1 to #8 gradually convert individual functions, one per patch. Patch #9 renames an auto-deferring private helpers named dfr_* to adf_*. The plan is not to retro-rename all private helpers, but I happened to know about this one. Patches #10 to #12 introduce several autodefer helpers for commonly used forwarding/lib.sh functions, and opportunistically convert straightforward instances of 'action; defer counteraction' to the new helpers. Patch #13 adds some README verbiage to pitch defer and the adf_* convention. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
The test starts a workload and then opens events. If the events fail
to open, for example because of perf_event_paranoid, the gopipe of the
workload is leaked and the file descriptor leak check fails when the
test exits. To avoid this cancel the workload when opening the events
fails.
Before:
```
$ perf test -vv 7
7: PERF_RECORD_* events & perf_sample fields:
--- start ---
test child forked, pid 1189568
Using CPUID GenuineIntel-6-B7-1
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000 (cpu_atom/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
exclude_kernel 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0x400000000 (cpu_core/PERF_COUNT_HW_CPU_CYCLES/)
disabled 1
exclude_kernel 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8 = 3
Attempt to add: software/cpu-clock/
..after resolving event: software/config=0/
cpu-clock -> software/cpu-clock/
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0x9 (PERF_COUNT_SW_DUMMY)
sample_type IP|TID|TIME|CPU
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
{ wakeup_events, wakeup_watermark } 1
------------------------------------------------------------
sys_perf_event_open: pid 1189569 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -13
perf_evlist__open: Permission denied
---- end(-2) ----
Leak of file descriptor 6 that opened: 'pipe:[14200347]'
---- unexpected signal (6) ----
iFailed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
Failed to read build ID for //anon
#0 0x565358f6666e in child_test_sig_handler builtin-test.c:311
#1 0x7f29ce849df0 in __restore_rt libc_sigaction.c:0
#2 0x7f29ce89e95c in __pthread_kill_implementation pthread_kill.c:44
#3 0x7f29ce849cc2 in raise raise.c:27
#4 0x7f29ce8324ac in abort abort.c:81
#5 0x565358f662d4 in check_leaks builtin-test.c:226
#6 0x565358f6682e in run_test_child builtin-test.c:344
#7 0x565358ef7121 in start_command run-command.c:128
#8 0x565358f67273 in start_test builtin-test.c:545
#9 0x565358f6771d in __cmd_test builtin-test.c:647
#10 0x565358f682bd in cmd_test builtin-test.c:849
#11 0x565358ee5ded in run_builtin perf.c:349
#12 0x565358ee6085 in handle_internal_command perf.c:401
#13 0x565358ee61de in run_argv perf.c:448
#14 0x565358ee6527 in main perf.c:555
#15 0x7f29ce833ca8 in __libc_start_call_main libc_start_call_main.h:74
#16 0x7f29ce833d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
#17 0x565358e391c1 in _start perf[851c1]
7: PERF_RECORD_* events & perf_sample fields : FAILED!
```
After:
```
$ perf test 7
7: PERF_RECORD_* events & perf_sample fields : Skip (permissions)
```
Fixes: 16d00fe ("perf tests: Move test__PERF_RECORD into separate object")
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Athira Rajeev <[email protected]>
Cc: Chun-Tse Shao <[email protected]>
Cc: Howard Chu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
… 'T'
When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.
The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.
Stack trace as below:
Perf: Segmentation fault
-------- backtrace --------
#0 0x55d365 in ui__signal_backtrace setup.c:0
#1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
#2 0x570f08 in arch__is perf[570f08]
#3 0x562186 in annotate_get_insn_location perf[562186]
#4 0x562626 in __hist_entry__get_data_type annotate.c:0
#5 0x56476d in annotation_line__write perf[56476d]
#6 0x54e2db in annotate_browser__write annotate.c:0
#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
#8 0x54dc9e in annotate_browser__refresh annotate.c:0
#9 0x54c03d in __ui_browser__refresh browser.c:0
#10 0x54ccf8 in ui_browser__run perf[54ccf8]
#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
#12 0x552293 in do_annotate hists.c:0
#13 0x55941c in evsel__hists_browse hists.c:0
#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
#15 0x42ff02 in cmd_report perf[42ff02]
#16 0x494008 in run_builtin perf.c:0
#17 0x494305 in handle_internal_command perf.c:0
#18 0x410547 in main perf[410547]
#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
#21 0x410b75 in _start perf[410b75]
Fixes: 1d4374a ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <[email protected]>
Tested-by: Namhyung Kim <[email protected]>
Signed-off-by: Tianyou Li <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
When using perf record with the `--overwrite` option, a segmentation fault
occurs if an event fails to open. For example:
perf record -e cycles-ct -F 1000 -a --overwrite
Error:
cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
perf: Segmentation fault
#0 0x6466b6 in dump_stack debug.c:366
#1 0x646729 in sighandler_dump_stack debug.c:378
#2 0x453fd1 in sigsegv_handler builtin-record.c:722
#3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090]
#4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862
#5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943
#6 0x458090 in record__synthesize builtin-record.c:2075
#7 0x45a85a in __cmd_record builtin-record.c:2888
#8 0x45deb6 in cmd_record builtin-record.c:4374
#9 0x4e5e33 in run_builtin perf.c:349
#10 0x4e60bf in handle_internal_command perf.c:401
#11 0x4e6215 in run_argv perf.c:448
#12 0x4e653a in main perf.c:555
#13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72]
#14 0x43a3ee in _start ??:0
The --overwrite option implies --tail-synthesize, which collects non-sample
events reflecting the system status when recording finishes. However, when
evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist
is not initialized and remains NULL. The code unconditionally calls
record__synthesize() in the error path, which iterates through the NULL
evlist pointer and causes a segfault.
To fix it, move the record__synthesize() call inside the error check block, so
it's only called when there was no error during recording, ensuring that evlist
is properly initialized.
Fixes: 4ea648a ("perf record: Add --tail-synthesize option")
Signed-off-by: Shuai Xue <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
I have removed a custom API for accessing EDID data, using instead a DRM-layer internal structure. It seemed pointless to me (otherwise please RFC). Tegra driver seems to do this way.. I added also an extra check to avoid calling a potential null helper function.
With this changes, the adi_axi_hdmi driver works also with the mainline version of the ADV7511(provided that the DT is changed accordingly), and it shouldn't break anything with the ADV7511 in the analogdevicesinc tree.
I don't know if there is any plan to merge the ADV7511 driver in the analogdeviceinc tree with the mainline version, but IMHO my changes should make sense either way.
BTW at this point the adi_axi_hdmi should work with the mainline tree without touching anything else (except for my one-line-patch fixing VDMA not setting the INTERLEAVE bit in cap_mask, that wouldn't even necessary if we would drop support for old VDMA driver). So, I'm wondering whether at this point it would be possible pushing adi_axi_hdmi driver for mainlining :)