Merge rc3 #272
Closed
The NVIDIA Tegra XUSB pad controller provides a set of pads, each with a set of lanes that are used for PCIe, SATA and USB. Signed-off-by: Thierry Reding <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Stephen Warren <[email protected]>
This is an old version of the binding that isn't flexible enough to describe all aspects of the XUSB pad controller. Specifically with the addition of XUSB support (for SuperSpeed USB) the existing binding is no longer suitable. Signed-off-by: Thierry Reding <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Stephen Warren <[email protected]>
Extend the binding to cover the set of features found in Tegra210. Signed-off-by: Thierry Reding <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Stephen Warren <[email protected]>
Add a new driver for the XUSB pad controller found on NVIDIA Tegra SoCs. This hardware block used to be exposed as a pin controller, but it turns out that this isn't a good fit. The new driver and DT binding much more accurately describe the hardware and are more flexible in supporting new SoC generations. Signed-off-by: Thierry Reding <[email protected]>
Add support for the XUSB pad controller found on Tegra210 SoCs. The hardware is roughly the same, but some of the registers have been moved around and the number and type of supported pads has changed. Signed-off-by: Thierry Reding <[email protected]>
Add device-tree binding documentation for the XUSB controller present on Tegra124 and later SoCs. This controller supports USB 3.0 via an xHCI compliant interface. Based on work by Andrew Bresticker <[email protected]>. Cc: Rob Herring <[email protected]> Cc: Pawel Moll <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Ian Campbell <[email protected]> Cc: Kumar Gala <[email protected]> Cc: Mathias Nyman <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Thierry Reding <[email protected]> Acked-by: Stephen Warren <[email protected]>
Extend the Tegra XUSB controller device tree binding with Tegra210 support. Signed-off-by: Thierry Reding <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Stephen Warren <[email protected]>
Add support for the on-chip XUSB controller present on Tegra SoCs. This controller, when loaded with external firmware, exposes an interface compliant with xHCI. This driver loads the firmware, starts the controller, and is able to service host-specific messages sent by the controller's firmware. The controller also supports USB device mode as well as powergating of the SuperSpeed and host-controller logic when not in use, but support for these is not yet implemented. Based on work by: Ajay Gupta <[email protected]> Bharath Yadav <[email protected]> Andrew Bresticker <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Mathias Nyman <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
Parameterize more parts of the driver and add support for Tegra210. Cc: Greg Kroah-Hartman <[email protected]> Cc: Mathias Nyman <[email protected]> Signed-off-by: Thierry Reding <[email protected]>
On Tegra210, hardware control of the SATA and XUSB pad PLLs must be done during the UPHY enable sequence rather than the PLLE enable sequence. Export functions to do this so that hardware control can be enabled from the XUSB padctl driver. Signed-off-by: Andrew Bresticker <[email protected]> Signed-off-by: Rhyland Klein <[email protected]>
Add new bindings used for USB support by the Tegra XUSB pad controller. This includes additional PHY types, USB-specific pinconfig properties, etc. Signed-off-by: Andrew Bresticker <[email protected]> Acked-by: Linus Walleij <[email protected]> Reviewed-by: Stephen Warren <[email protected]>
MAXIM Semiconductor's PMIC, MAX77620/MAX20024 has 8 GPIO pins. It also supports interrupts from these pins. Add GPIO driver for these pins to control via GPIO APIs. Signed-off-by: Laxman Dewangan <[email protected]> Reviewed-by: Linus Walleij <[email protected]>
Maxim Semiconductor's PMIC MAX77620/MAX20024 has 8 GPIO pins which act as GPIO as well as special function mode. Add DT binding document to support these pins in GPIO mode via GPIO framework. Signed-off-by: Laxman Dewangan <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Linus Walleij <[email protected]>
MAXIM Semiconductor's PMIC, MAX77620/MAX20024 has 8 GPIO pins which also act as the special function in alternate mode. Also there is configuration like push-pull, open drain, FPS timing etc for these pins. Add pin control driver to configure these parameters through pin control APIs. Signed-off-by: Laxman Dewangan <[email protected]> Reviewed-by: Linus Walleij <[email protected]>
Maxim Semiconductor's PMIC MAX77620/MAX20024 has 8 GPIO pins which act as GPIO as well as special function mode. Add DT binding document to configure pins in function mode as well as pin configuration parameters. Signed-off-by: Laxman Dewangan <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Linus Walleij <[email protected]>
MAX77620/MAX20024 are Power Management ICs from Maxim. They support RTC, multiple GPIOs, multiple DCDC and LDOs, watchdog, clock etc. Add an MFD driver to provide common support for accessing the device; additional drivers are developed in their respective subsystems in order to use the functionality of the device. Signed-off-by: Laxman Dewangan <[email protected]> Signed-off-by: Mallikarjun Kasoju <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]>
The Maxim PMICs MAX77620 and MAX20024 are power management ICs which support RTC, GPIO, DCDC/LDO regulators, interrupt, watchdog etc. Add DT binding document for the different functionality of this device. Signed-off-by: Laxman Dewangan <[email protected]> Acked-by: Rob Herring <[email protected]>
Enable multi-master mode in the I2C_CNFG register based on hardware features.
The single/multi-master mode bit was introduced with Tegra210,
whereas multi-master mode is enabled by default in hardware for T124 and
earlier Tegra SoCs. Enabling this bit doesn't explicitly start
treating the bus as having multiple masters, but will start
checking for arbitration lost and reporting when it occurs.
The Tegra210 I2C controller supports single/multi master mode.
Add chipdata for Tegra210 and its compatibility string so that
Tegra210 will select data that enables multi master mode correctly.
Apply the prerequisites below for a multi-master bus if the "multi-master"
DT property is present.
1. Keep the 1st level clock always enabled.
2. Disable 2nd level clock gating (SLCG, which
is supported from T124 SoC and later chips)
Signed-off-by: Shardar Shariff Md <[email protected]>
Add support for the Tegra210 Audio DMA controller that is used for transferring data between system memory and the Audio sub-system. The driver only supports cyclic transfers because this is being solely used for audio. This driver is based upon the work by Dara Ramesh <[email protected]>. Signed-off-by: Jon Hunter <[email protected]>
hauke pushed a commit to hauke/linux that referenced this pull request on Jun 12, 2016
This file isn't getting removed while we unbind a device from thermal zone. And this causes following messages when the device is registered again:
WARNING: CPU: 0 PID: 2228 at /home/viresh/linux/fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x70()
sysfs: cannot create duplicate filename '/devices/virtual/thermal/thermal_zone0/cdev0_weight'
Modules linked in: cpufreq_dt(+) [last unloaded: cpufreq_dt]
CPU: 0 PID: 2228 Comm: insmod Not tainted 4.2.0-rc3-00059-g44fffd9473eb #272
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[<c00153e8>] (unwind_backtrace) from [<c0012368>] (show_stack+0x10/0x14)
[<c0012368>] (show_stack) from [<c053a684>] (dump_stack+0x84/0xc4)
[<c053a684>] (dump_stack) from [<c002284c>] (warn_slowpath_common+0x80/0xb0)
[<c002284c>] (warn_slowpath_common) from [<c00228ac>] (warn_slowpath_fmt+0x30/0x40)
[<c00228ac>] (warn_slowpath_fmt) from [<c012d524>] (sysfs_warn_dup+0x60/0x70)
[<c012d524>] (sysfs_warn_dup) from [<c012d244>] (sysfs_add_file_mode_ns+0x13c/0x190)
[<c012d244>] (sysfs_add_file_mode_ns) from [<c012d2d4>] (sysfs_create_file_ns+0x3c/0x48)
[<c012d2d4>] (sysfs_create_file_ns) from [<c03c04a8>] (thermal_zone_bind_cooling_device+0x260/0x358)
[<c03c04a8>] (thermal_zone_bind_cooling_device) from [<c03c2e70>] (of_thermal_bind+0x88/0xb4)
[<c03c2e70>] (of_thermal_bind) from [<c03c10d0>] (__thermal_cooling_device_register+0x17c/0x2e0)
[<c03c10d0>] (__thermal_cooling_device_register) from [<c03c3f50>] (__cpufreq_cooling_register+0x3a0/0x51c)
[<c03c3f50>] (__cpufreq_cooling_register) from [<bf00505c>] (cpufreq_ready+0x44/0x88 [cpufreq_dt])
[<bf00505c>] (cpufreq_ready [cpufreq_dt]) from [<c03d6c30>] (cpufreq_add_dev+0x4a0/0x7dc)
[<c03d6c30>] (cpufreq_add_dev) from [<c02cd3ec>] (subsys_interface_register+0x94/0xd8)
[<c02cd3ec>] (subsys_interface_register) from [<c03d785c>] (cpufreq_register_driver+0x10c/0x1f0)
[<c03d785c>] (cpufreq_register_driver) from [<bf0057d4>] (dt_cpufreq_probe+0x60/0x8c [cpufreq_dt])
[<bf0057d4>] (dt_cpufreq_probe [cpufreq_dt]) from [<c02d03e4>] (platform_drv_probe+0x44/0xa4)
[<c02d03e4>] (platform_drv_probe) from [<c02cead8>] (driver_probe_device+0x174/0x2b4)
[<c02cead8>] (driver_probe_device) from [<c02ceca4>] (__driver_attach+0x8c/0x90)
[<c02ceca4>] (__driver_attach) from [<c02cd078>] (bus_for_each_dev+0x68/0x9c)
[<c02cd078>] (bus_for_each_dev) from [<c02ce2f0>] (bus_add_driver+0x19c/0x214)
[<c02ce2f0>] (bus_add_driver) from [<c02cf490>] (driver_register+0x78/0xf8)
[<c02cf490>] (driver_register) from [<c0009710>] (do_one_initcall+0x8c/0x1d4)
[<c0009710>] (do_one_initcall) from [<c05396b0>] (do_init_module+0x5c/0x1b8)
[<c05396b0>] (do_init_module) from [<c0086490>] (load_module+0xd34/0xed8)
[<c0086490>] (load_module) from [<c0086704>] (SyS_init_module+0xd0/0x120)
[<c0086704>] (SyS_init_module) from [<c000f480>] (ret_fast_syscall+0x0/0x3c)
---[ end trace 3be0e7b7dc6e3c4f ]---
Fixes: db91651 ("thermal: export weight to sysfs")
Acked-by: Javi Merino <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
Signed-off-by: Eduardo Valentin <[email protected]>
laijs pushed a commit to laijs/linux that referenced this pull request on Feb 13, 2017
lkl: add read_persistent_clock() to override clock value
fengguang pushed a commit to 0day-ci/linux that referenced this pull request on Jun 1, 2017
split __bpf_prog_run() interpreter into stack allocation and execution parts. The code section shrinks which helps interpreter performance in some cases.
   text    data     bss     dec     hex filename
  26350   10328     624   37302    91b6 kernel/bpf/core.o.before
  25777   10328     624   36729    8f79 kernel/bpf/core.o.after
Very short programs got slower (due to extra function call):
Before:
test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 7 PASS
test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 8 PASS
test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 7 PASS
test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 11 PASS
test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 7 PASS
After:
test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 11 PASS
test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 11 PASS
test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 11 PASS
test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 14 PASS
test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 10 PASS
Longer programs got faster:
Before:
test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 20286 20513 PASS
test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 31853 31768 PASS
test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9815 PASS
test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 6 PASS
test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13959 PASS
test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 210 PASS
test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 21724 PASS
test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19118 PASS
After:
test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 19008 18827 PASS
test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 29238 28450 PASS
test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9485 PASS
test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 12 PASS
test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13257 PASS
test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 213 PASS
test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 19389 PASS
test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19583 PASS
For real world production programs the difference is noise. This patch is first step towards reducing interpreter stack consumption.
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request on Aug 3, 2017
When adding or removing memory via DLPAR, we need to check the device offline status before trying to online/offline the memory. This is needed because calls to device_online() and device_offline() will return non-zero for memory that is already online and offline respectively.
This update resolves two scenarios. First, for kernels built with auto-online memory enabled, memory will be onlined as part of calls to add_memory(). After adding the memory the pseries dlpar code tries to online it and fails since the memory is already online. The dlpar code then tries to remove the memory which produces the oops message below because the memory is not offline.
The second scenario occurs when removing memory that is already offline, i.e. marking memory offline (via sysfs) and then trying to remove that memory. This doesn't work because offlining the already offline memory does not succeed and the dlpar code then fails the dlpar remove operation.
The fix for both scenarios is to check the device.offline status before making the calls to device_online() or device_offline().
kernel BUG at mm/memory_hotplug.c:2189!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048 NUMA pSeries
CPU: 0 PID: 5 Comm: kworker/u129:0 Not tainted 4.12.0-rc3 #272
Workqueue: pseries hotplug workque .pseries_hp_work_fn
task: c0000003f9c89200 task.stack: c0000003f9d10000
NIP: c0000000002ca428 LR: c0000000002ca3cc CTR: c000000000ba16a0
REGS: c0000003f9d13630 TRAP: 0700 Not tainted (4.12.0-rc3)
MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 22002024 XER: 0000000a
CFAR: c0000000002ca3d0 SOFTE: 1
GPR00: c0000000002ca3cc c0000003f9d138b0 c000000001bb0200 0000000000000001
GPR04: c0000003fb143c80 c0000003fef21630 0000000000000003 0000000000000002
GPR08: 0000000000000003 0000000000000003 0000000000000003 00000000000031b1
GPR12: 0000000028002042 c00000000fd80000 c000000000118ae0 c0000003fb170180
GPR16: 0000000000000000 0000000000000004 0000000000000010 c0000003ffff79c8
GPR20: c0000003ffff7b68 c0000003f728ff84 0000000000000002 0000000000000010
GPR24: 0000000000000002 c0000003f728ff80 0000000000000002 0000000000000001
GPR28: c0000003fb143c38 0000000000000002 0000000010000000 0000000020000000
NIP [c0000000002ca428] .remove_memory+0xb8/0xc0
LR [c0000000002ca3cc] .remove_memory+0x5c/0xc0
Call Trace:
[c0000003f9d138b0] [c0000000002ca3cc] .remove_memory+0x5c/0xc0 (unreliable)
[c0000003f9d13940] [c0000000000938a4] .dlpar_add_lmb+0x384/0x400
[c0000003f9d13a30] [c00000000009456c] .dlpar_memory+0x5dc/0xca0
[c0000003f9d13af0] [c00000000008ce84] .handle_dlpar_errorlog+0x74/0xe0
[c0000003f9d13b70] [c00000000008cf1c] .pseries_hp_work_fn+0x2c/0x90
[c0000003f9d13bf0] [c000000000110a5c] .process_one_work+0x17c/0x460
[c0000003f9d13c90] [c000000000110dc8] .worker_thread+0x88/0x500
[c0000003f9d13d70] [c000000000118c3c] .kthread+0x15c/0x1a0
[c0000003f9d13e30] [c00000000000ba18] .ret_from_kernel_thread+0x58/0xc0
Instruction dump:
7fe3fb78 4bd7c845 60000000 7fa3eb78 4bfdd3c9 38210090 e8010010 eba1ffe8
ebc1fff0 ebe1fff8 7c0803a6 4bfdc2ac <0fe00000> 00000000 7c0802a6 fb01ffc0
Fixes: 943db62 ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
Signed-off-by: Nathan Fontenot <[email protected]>
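The guard described in this commit message can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the pseries code: `MemoryBlock`, `online_if_needed()` and `offline_if_needed()` are hypothetical names, and the `device_online()` / `device_offline()` stand-ins only mimic the behavior the message describes (a non-zero return when the device is already in the requested state).

```python
class MemoryBlock:
    """Hypothetical stand-in for a memory block device with an `offline` flag."""

    def __init__(self, offline: bool) -> None:
        self.offline = offline


def device_online(dev: MemoryBlock) -> int:
    """Model only: returns non-zero when the device is already online."""
    if not dev.offline:
        return 1
    dev.offline = False
    return 0


def device_offline(dev: MemoryBlock) -> int:
    """Model only: returns non-zero when the device is already offline."""
    if dev.offline:
        return 1
    dev.offline = True
    return 0


def online_if_needed(dev: MemoryBlock) -> int:
    """Check the offline status first, so an already-online block is not an error."""
    if not dev.offline:
        return 0
    return device_online(dev)


def offline_if_needed(dev: MemoryBlock) -> int:
    """Check the offline status first, so an already-offline block is not an error."""
    if dev.offline:
        return 0
    return device_offline(dev)


# Scenario 1: the block was auto-onlined while it was being added.
auto_onlined = MemoryBlock(offline=False)
print(device_online(auto_onlined), online_if_needed(auto_onlined))

# Scenario 2: the block was already marked offline before the remove request.
already_offline = MemoryBlock(offline=True)
print(device_offline(already_offline), offline_if_needed(already_offline))
```

In both scenarios the unguarded call reports a failure while the guarded helper reports success, which is the distinction the fix relies on.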
torvalds pushed a commit that referenced this pull request on Feb 23, 2018
I recently noticed a crash on arm64 when feeding a bogus index into BPF tail call helper. The crash would not occur when the interpreter is used, but only in case of JIT. Output looks as follows:
[ 347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
[...]
[ 347.043065] [fffb850e96492510] address between user and kernel address ranges
[ 347.050205] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 347.190829] x13: 0000000000000000 x12: 0000000000000000
[ 347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
[ 347.201427] x9 : 0000000000000000 x8 : 0000000000000000
[ 347.206726] x7 : 0000000000000000 x6 : 001c991738000000
[ 347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
[ 347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
[ 347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
[ 347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
[ 347.235221] Call trace:
[ 347.237656] 0xffff000002f3a4fc
[ 347.240784] bpf_test_run+0x78/0xf8
[ 347.244260] bpf_prog_test_run_skb+0x148/0x230
[ 347.248694] SyS_bpf+0x77c/0x1110
[ 347.251999] el0_svc_naked+0x30/0x34
[ 347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
[...]
In this case the index used in BPF r3 is the same as in r1 at the time of the call, meaning we fed a pointer as index; here, it had the value 0xffff808fd7cf0500 which sits in x2. While I found tail calls to be working in general (also for hitting the error cases), I noticed the following in the code emission:
# bpftool p d j i 988
[...]
38: ldr w10, [x1,x10]
3c: cmp w2, w10
40: b.ge 0x000000000000007c <-- signed cmp
44: mov x10, #0x20 // #32
48: cmp x26, x10
4c: b.gt 0x000000000000007c
50: add x26, x26, #0x1
54: mov x10, #0x110 // #272
58: add x10, x1, x10
5c: lsl x11, x2, #3
60: ldr x11, [x10,x11] <-- faulting insn (f86b694b)
64: cbz x11, 0x000000000000007c
[...]
Meaning, the tests passed because commit ddb5599 ("arm64: bpf: implement bpf_tail_call() helper") was using signed compares instead of unsigned which as a result had the test wrongly passing. Change this but also the tail call count test both into unsigned and cap the index as u32. Latter we did as well in 90caccd ("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here, too. Tested on HiSilicon Hi1616. Result after patch:
# bpftool p d j i 268
[...]
38: ldr w10, [x1,x10]
3c: add w2, w2, #0x0
40: cmp w2, w10
44: b.cs 0x0000000000000080
48: mov x10, #0x20 // #32
4c: cmp x26, x10
50: b.hi 0x0000000000000080
54: add x26, x26, #0x1
58: mov x10, #0x110 // #272
5c: add x10, x1, x10
60: lsl x11, x2, #3
64: ldr x11, [x10,x11]
68: cbz x11, 0x0000000000000080
[...]
Fixes: ddb5599 ("arm64: bpf: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
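The effect of the signed compare can be reproduced outside the JIT. The following sketch models only the 32-bit bounds check on the index, using the x2 value from the oops; `max_entries = 4` and all helper names are assumptions made for illustration, and the two functions stand in for the `b.ge` (signed) and `b.cs` (unsigned) branches respectively.

```python
MASK32 = 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def passes_signed_check(index: int, max_entries: int) -> bool:
    """Model of `cmp w2, w10; b.ge out`: execution continues if index < max_entries (signed)."""
    return as_signed32(index) < as_signed32(max_entries)


def passes_unsigned_check(index: int, max_entries: int) -> bool:
    """Model of `cmp w2, w10; b.cs out`: execution continues if index < max_entries (unsigned)."""
    return (index & MASK32) < (max_entries & MASK32)


bogus_index = 0xFFFF808FD7CF0500  # x2 from the register dump: a pointer used as index
max_entries = 4                   # hypothetical map size, not taken from the report

print("signed check lets the index through:  ", passes_signed_check(bogus_index, max_entries))
print("unsigned check lets the index through:", passes_unsigned_check(bogus_index, max_entries))
```

The low 32 bits of the pointer have the top bit set, so the signed compare sees a negative number and lets it through, while the unsigned compare rejects it. Capping the index as u32, as the message describes, additionally keeps the upper 32 bits of x2 out of the later address computation.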
Noltari pushed a commit to Noltari/linux that referenced this pull request on Mar 11, 2018
[ upstream commit 16338a9 ]
I recently noticed a crash on arm64 when feeding a bogus index into BPF tail call helper. The crash would not occur when the interpreter is used, but only in case of JIT. Output looks as follows:
[ 347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
[...]
[ 347.043065] [fffb850e96492510] address between user and kernel address ranges
[ 347.050205] Internal error: Oops: 96000004 [#1] SMP
[...]
[ 347.190829] x13: 0000000000000000 x12: 0000000000000000
[ 347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
[ 347.201427] x9 : 0000000000000000 x8 : 0000000000000000
[ 347.206726] x7 : 0000000000000000 x6 : 001c991738000000
[ 347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
[ 347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
[ 347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
[ 347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
[ 347.235221] Call trace:
[ 347.237656] 0xffff000002f3a4fc
[ 347.240784] bpf_test_run+0x78/0xf8
[ 347.244260] bpf_prog_test_run_skb+0x148/0x230
[ 347.248694] SyS_bpf+0x77c/0x1110
[ 347.251999] el0_svc_naked+0x30/0x34
[ 347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
[...]
In this case the index used in BPF r3 is the same as in r1 at the time of the call, meaning we fed a pointer as index; here, it had the value 0xffff808fd7cf0500 which sits in x2. While I found tail calls to be working in general (also for hitting the error cases), I noticed the following in the code emission:
# bpftool p d j i 988
[...]
38: ldr w10, [x1,x10]
3c: cmp w2, w10
40: b.ge 0x000000000000007c <-- signed cmp
44: mov x10, #0x20 // #32
48: cmp x26, x10
4c: b.gt 0x000000000000007c
50: add x26, x26, #0x1
54: mov x10, #0x110 // #272
58: add x10, x1, x10
5c: lsl x11, x2, #3
60: ldr x11, [x10,x11] <-- faulting insn (f86b694b)
64: cbz x11, 0x000000000000007c
[...]
Meaning, the tests passed because commit ddb5599 ("arm64: bpf: implement bpf_tail_call() helper") was using signed compares instead of unsigned which as a result had the test wrongly passing. Change this but also the tail call count test both into unsigned and cap the index as u32. Latter we did as well in 90caccd ("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here, too. Tested on HiSilicon Hi1616. Result after patch:
# bpftool p d j i 268
[...]
38: ldr w10, [x1,x10]
3c: add w2, w2, #0x0
40: cmp w2, w10
44: b.cs 0x0000000000000080
48: mov x10, #0x20 // #32
4c: cmp x26, x10
50: b.hi 0x0000000000000080
54: add x26, x26, #0x1
58: mov x10, #0x110 // #272
5c: add x10, x1, x10
60: lsl x11, x2, #3
64: ldr x11, [x10,x11]
68: cbz x11, 0x0000000000000080
[...]
Fixes: ddb5599 ("arm64: bpf: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w referenced this pull request in frank-w/BPI-Router-Linux on Mar 11, 2018
heftig referenced this pull request in zen-kernel/zen-kernel on Mar 11, 2018
nbdd0121 pushed a commit to nbdd0121/linux that referenced this pull request on May 17, 2021
rust: Run rustdoc for the target triple
fengguang pushed a commit to 0day-ci/linux that referenced this pull request on Aug 11, 2021
…CKOPT
Add verifier ctx test to call bpf_get_netns_cookie from cgroup/setsockopt.
#269/p pass ctx or null check, 1: ctx Did not run the program (not supported) OK
#270/p pass ctx or null check, 2: null Did not run the program (not supported) OK
#271/p pass ctx or null check, 3: 1 OK
#272/p pass ctx or null check, 4: ctx - const OK
#273/p pass ctx or null check, 5: null (connect) Did not run the program (not supported) OK
#274/p pass ctx or null check, 6: null (bind) Did not run the program (not supported) OK
#275/p pass ctx or null check, 7: ctx (bind) Did not run the program (not supported) OK
#276/p pass ctx or null check, 8: null (bind) OK
#277/p pass ctx or null check, 9: ctx (cgroup/setsockopt) Did not run the program (not supported) OK
#278/p pass ctx or null check, 10: null (cgroup/setsockopt) Did not run the program (not supported) OK
Signed-off-by: Stanislav Fomichev <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request on Nov 27, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.
With the Generic Entry patch[1] applied and audit enabled, the
perf bench basic syscall benchmark on kunpeng920 shows roughly a 1%
performance uplift. The change also aligns the implementation with
x86 and RISC-V.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
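The Change column can be rechecked from the two measurement columns. The sketch below is only an arithmetic aid; `pct_change` is a hypothetical helper, and which column serves as the baseline is an assumption, since the table does not say.

```python
def pct_change(before: float, after: float, baseline: float) -> float:
    """Percentage difference between two measurements, relative to `baseline`."""
    return (after - before) / baseline * 100.0


usecs_before, usecs_after = 0.224157, 0.221146
ops_before, ops_after = 4_461_157, 4_501_409

# usecs/op, once relative to the unpatched value and once relative to the patched value.
usecs_vs_before = pct_change(usecs_before, usecs_after, usecs_before)
usecs_vs_after = pct_change(usecs_before, usecs_after, usecs_after)

# ops/sec relative to the unpatched value.
ops_vs_before = pct_change(ops_before, ops_after, ops_before)

print(f"usecs/op vs. unpatched: {usecs_vs_before:.2f}%")
print(f"usecs/op vs. patched:   {usecs_vs_after:.2f}%")
print(f"ops/sec vs. unpatched:  {ops_vs_before:.2f}%")
```

Under these assumptions the usecs/op drop comes out at about 1.34% against the unpatched value and about 1.36% against the patched value, so the table's time rows appear to use the patched value as the baseline; the ops/sec gain rounds to 0.9% either way.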
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
[1]: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jinjie Ruan <[email protected]>
cynthia0928 pushed a commit to andestech/linux that referenced this pull request on Dec 1, 2025
…ice_prep_slave_sg() (#272)
Badly aligned test cases in mmc_test failed because the DMA transfer could not complete. This was caused by an incorrect SrcBurstSize setting. This patch fixes the issue by calculating the required burst size in bytes based on the product of dma_slave_config's maxburst and addr_width. Then, depending on the configured SrcWidth, a suitable SrcBurstSize value is selected to ensure proper burst transfers by the ATCDMAC300 DMA controller.
bugzilla: http://e-andes.andestech.com/bugzilla5/show_bug.cgi?id=33198
Reviewed-on: https://gitea.andestech.com/RD-SW/linux/pulls/272
Reviewed-by: Locus Wei-Han Chen <[email protected]>
Reviewed-by: randolph <[email protected]>
Reviewed-by: Tim Shih-Ting OuYang <[email protected]>
Co-authored-by: CL Wang <[email protected]>
Co-committed-by: CL Wang <[email protected]>
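The burst-size calculation described here can be sketched as follows. This is an illustration under stated assumptions rather than the driver's code: the function names are hypothetical, and the power-of-two register encoding in `encode_power_of_two()` is an assumption about how a SrcBurstSize field might be encoded, not something the commit message states.

```python
def required_burst_bytes(maxburst: int, addr_width: int) -> int:
    """Burst size in bytes: maxburst transfers of addr_width bytes each."""
    return maxburst * addr_width


def src_burst_transfers(maxburst: int, addr_width: int, src_width_bytes: int) -> int:
    """Number of source-width transfers that together move the required bytes."""
    burst_bytes = required_burst_bytes(maxburst, addr_width)
    if burst_bytes % src_width_bytes != 0:
        raise ValueError("burst size is not a multiple of the source width")
    return burst_bytes // src_width_bytes


def encode_power_of_two(transfers: int) -> int:
    """Hypothetical field encoding: log2 of the transfer count (assumption)."""
    if transfers <= 0 or transfers & (transfers - 1):
        raise ValueError("transfer count must be a power of two")
    return transfers.bit_length() - 1


# Same 32-byte burst, expressed for a 4-byte and for a 1-byte source width.
for width in (4, 1):
    count = src_burst_transfers(maxburst=8, addr_width=4, src_width_bytes=width)
    print(width, count, encode_power_of_two(count))
```

The point of the sketch is that the same byte count maps to different transfer counts depending on the configured source width, which is why the burst setting has to be derived from both values.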
intel-lab-lkp
pushed a commit
to intel-lab-lkp/linux
that referenced
this pull request
Dec 1, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs
but rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments()
and syscall_set_arguments() in sync.
With the Generic Entry patch [1] applied and audit turned on, the
perf bench basic syscall benchmark on kunpeng920 shows roughly
a 1% performance uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_get_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
[1]: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jinjie Ruan <[email protected]>
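For readers who want to see the two styles side by side, here is a minimal user-space sketch of the change the commit message describes. The struct is a hypothetical stand-in that keeps only the fields the example needs, and the function names are invented for illustration. It is not the kernel's actual definition or the patch itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct pt_regs; only the fields used here. */
struct mock_pt_regs {
	unsigned long regs[31];
	unsigned long orig_x0;
};

/* memcpy() style: first argument from orig_x0, the other five copied in bulk. */
static void get_args_memcpy(const struct mock_pt_regs *regs,
			    unsigned long *args)
{
	args[0] = regs->orig_x0;
	args++;
	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
}

/*
 * Direct-assignment style: six plain stores. Once this is inlined, the
 * compiler can drop any store whose result the caller never reads.
 */
static void get_args_direct(const struct mock_pt_regs *regs,
			    unsigned long *args)
{
	args[0] = regs->orig_x0;
	args[1] = regs->regs[1];
	args[2] = regs->regs[2];
	args[3] = regs->regs[3];
	args[4] = regs->regs[4];
	args[5] = regs->regs[5];
}

/* Sanity check: both variants must fill args[] identically. */
static void check_variants_agree(const struct mock_pt_regs *regs)
{
	unsigned long a[6], b[6];

	get_args_memcpy(regs, a);
	get_args_direct(regs, b);
	assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Both variants produce the same six values. The difference lies in what the optimizer can do with each form, which is what the Before and After listings illustrate.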
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 17, 2025
Do not use memcpy() to extract syscall arguments from struct pt_regs but
rather just perform direct assignments.
Update syscall_set_arguments() too to keep syscall_get_arguments() and
syscall_set_arguments() in sync.
With the Generic Entry patch [1] applied and audit turned on, the perf bench
basic syscall benchmark on kunpeng920 shows roughly a 1% performance
uplift.
| Metric | W/O this patch | With this patch | Change |
| ---------- | -------------- | --------------- | --------- |
| Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% |
| usecs/op | 0.224157 | 0.221146 | ↓1.36% |
| ops/sec | 4,461,157 | 4,501,409 | ↑0.9% |
Disassembly shows that using direct assignment causes
syscall_get_arguments() to be inlined and cuts the instruction count by
five or six compared to memcpy(). Because __audit_syscall_entry() only
uses four syscall arguments, the compiler has also elided the copy of
regs->regs[4] and regs->regs[5].
Before:
<syscall_get_arguments.constprop.0>:
aa0103e2 mov x2, x1
91002003 add x3, x0, #0x8
f9408804 ldr x4, [x0, #272]
f8008444 str x4, [x2], #8
a9409404 ldp x4, x5, [x0, #8]
a9009424 stp x4, x5, [x1, #8]
a9418400 ldp x0, x1, [x0, #24]
a9010440 stp x0, x1, [x2, #16]
f9401060 ldr x0, [x3, #32]
f9001040 str x0, [x2, #32]
d65f03c0 ret
d503201f nop
After:
a9408e82 ldp x2, x3, [x20, #8]
2a1603e0 mov w0, w22
f9400e84 ldr x4, [x20, #24]
f9408a81 ldr x1, [x20, #272]
9401c4ba bl ffff800080215ca8 <__audit_syscall_entry>
This also aligns the implementation with x86 and RISC-V.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/ [1]
Signed-off-by: Jinjie Ruan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Charlie Jenkins <[email protected]>
Cc: Christian Zankel <[email protected]>
Cc: "Dmitry V. Levin" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Maciej W. Rozycki <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Russell King (Oracle) <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 19, 2025
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 20, 2025
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 21, 2025
ioworker0
pushed a commit
to ioworker0/linux
that referenced
this pull request
Dec 23, 2025