
Conversation

@stefpopa
Contributor

The AD5691R/AD5692R/AD5693/AD5693R are a family of one channel DACs with 12-bit,
14-bit and 16-bit precision respectively. The devices have either no built-in reference,
or built-in 2.5V reference.

These devices are pretty similar to AD5671R/AD5675R/AD5694/AD5694R/AD5695R/AD5696/AD5696R,
except that they have only one channel. Another difference is that they use a write control
register(addr 0x04) for setting the power down modes and the internal reference instead of
sepparate registers for each function.

Datasheet:
http://www.analog.com/media/en/technical-documentation/data-sheets/AD5693R_5692R_5691R_5693.pdf

Signed-off-by: Stefan Popa [email protected]

{
struct ad5686_state *st = iio_priv(indio_dev);
unsigned int shift = (st->chip_info->num_channels > 1) ?
(chan->channel * 2) : 13;
Contributor

nitpick: indentation may need adjustment here

Contributor Author

I agree

const struct iio_chan_spec *chan, unsigned int mode)
{
struct ad5686_state *st = iio_priv(indio_dev);
unsigned int shift = (st->chip_info->num_channels > 1) ?
Contributor

i'm not sure if using the number of channels as a differentiating parameter is the best idea [from a general design perspective];
that can be problematic in future cases, where some more DACs would need to be added and have 1 channel, and (chan->channel * 2) as shift

maybe, a new set of static int ad569x_xxx_powerdown_mode functions would be needed ;
and assigned to the new devices you are adding ;

Contributor

hmm ; maybe static int ad5693_xxx_powerdown_mode would be a better name ;

Contributor Author

@stefpopa stefpopa Jan 25, 2018

I could define a new struct iio_chan_spec_ext_info ad5693_ext_info[] with different implementations for the read_dac_powerdown and write_dac_powerdown functions and pass it to the .ext_info field of AD5686_CHANNEL.

Just add a field to chip_info struct
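
A minimal sketch of that suggestion (the regmap_type field and the enum are the names the V2 changelog below settles on; the other members are placeholders, not the driver's exact layout):

/* sketch only */
struct ad5686_chip_info {
	u16 int_vref_mv;
	const struct iio_chan_spec *channels;
	unsigned int num_channels;
	enum ad5686_regmap_type regmap_type;
};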

Contributor Author

Ah, ok! Thanks!


/* Set all the power down mode for all channels to 1K pulldown */
st->pwr_down_mode = 0x55;
if (st->chip_info->num_channels == 1)
Contributor

same here ; i would not use the number of channels as a differentiating parameter ;
maybe a wrapper function would be needed here ;

also, i'm a bit curious ; you are adding DACs with one channel, but this code handles cases for 1, 4 & 8 channels ;
can you detail this a bit ? :)

The 8 channel handling should probably have been part of the patch that adds the 8 channel DACs

Contributor Author

You are right about the 8 channel case. It should be included in a different commit.

Contributor

would it make sense to move the st->pwr_down_mode field from the ad5686_state struct to the chip_info struct ?

you could initialize it [same as the regmap info] and then use that directly via st->chip_info->pwr_down_mode

The field is changed at runtime.

Contributor

oh, my bad ; i did not check that in depth ;
then, would it make sense to add the initial pwr_down_mode on chip_info ?
and then we could have:
st->pwr_down_mode = st->chip_info->pwr_down_mode;
at init time ?

@lclausen-adi lclausen-adi Jan 25, 2018

The way this field works is that every channel occupies 2 bits. And for each of those channels we set those 2 bits to b01 when initializing it. If you wanted something that doesn't use if (...) else if (...) you could do

st->pwr_down_mode = 0x55555555;
st->pwr_down_mode &= (1 << (indio_dev->num_channels*2)) - 1; /* Mask out unused channels */

Contributor

ah
thanks for the clarification :)
if (...) else if (...) should be fine in this case

@lclausen-adi lclausen-adi left a comment

Typo in the commit message: "sepparate"

@stefpopa stefpopa force-pushed the rpi-4.9.y-ad5693-support branch from 9ac528a to 3b81b60 Compare January 25, 2018 17:21
@stefpopa
Contributor Author

changelog V1 -> V2:

  • Fixed typo in the commit message
  • Added a different commit for initializing the power down mode for 8 channel DACs
  • Added a regmap_type field to the ad5686_chip_info struct. This field is used as a differentiating parameter instead of the num_channels variable for power-down mode or internal voltage reference operations.

	st->pwr_down_mask &= ~(0x3 << shift);

	ret = ad5686_write(st, AD5686_CMD_POWERDOWN_DAC, 0,
			   st->pwr_down_mask & st->pwr_down_mode, 0);

In general I'm OK with this patch. But what I'd do differently is, rather than having the regmap-specific shift in all those different places, keep pwr_down_mask and pwr_down_mode as they are, and only here, where we write the value out to the register, apply the shift.

val = st->pwr_down_mask & st->pwr_down_mode;
if (st->chip_info->regmap_type == AD5693_REGMAP)
    val <<= 13; /* AD5693 register layout is slightly different */

The other thing you need to take care of here is to not clear the REF bit if it was set in probe. I'd keep a copy of whether REF is enabled or not in the state struct and then set val here accordingly.
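
A sketch of the state flag being suggested here (the V3 changelog below ends up calling it use_internal_vref; the rest of the struct is elided):

/* sketch: remember at probe time whether the internal reference was enabled,
 * so the write path can re-assert the REF bit instead of clearing it */
struct ad5686_state {
	/* ... existing members ... */
	bool use_internal_vref;
};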

Contributor Author

You are right, I missed the fact that the REF bit will be overwritten.
Regarding the pwr_down_mode variable, indeed ad5686_write_dac_powerdown is the only place where we actually write a value to the register. However, shouldn't we be able to get and set the power mode independently for each channel?

Sorry, what I meant was keep the code handling pwr_down_mode and pwr_down_mask as it is (with the shift of chan->channel * 2).

But only here in the write function apply the extra shift of 13 for AD5693.

Or do you mean we should also write the register in ad5686_set_powerdown_mode()?

Contributor Author

What I mean is that by doing operations such as echo three_state > out_voltage0_powerdown_mode and cat out_voltage0_powerdown_mode we will only write/read the pwr_down_mode variable without reading or writing the actual register value. If we keep the shift of chan->channel * 2 won't we get wrong readings for AD5693?

Think of pwr_down_mode and pwr_down_mask as purely virtual storage not associated with any hardware register at all. Each channel gets two bits in these fields: channel 0 starts at offset 0, channel 1 starts at offset 2, and so on. Kind of like an array. If you do echo ... > ...powerdown_mode; cat ...powerdown_mode this will work fine even if there is no hardware attached to the driver at all.

Now when you want to write the register you have to translate the virtual value to the physical value that the register expects. For the AD5686 this will be trivial since both are the same and for the AD5693 the translation is still quite simple. The AD5693 only has one channel so the bits associated with that channel will be at offset 0 in pwr_down_mode. Now the register expects the bits to be at offset 13. So all you need to do is shift the virtual value by 13 to get to the physical value.
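
Put as code, the translation described above happens only at write time; roughly (the shift value is taken from the discussion, surrounding code is elided):

/* pwr_down_mode/pwr_down_mask keep the "virtual" 2-bits-per-channel layout;
 * only the value that goes on the wire is remapped */
val = st->pwr_down_mask & st->pwr_down_mode;
if (st->chip_info->regmap_type == AD5693_REGMAP)
	val <<= 13;	/* AD5693: the single channel's bits sit at offset 13 */
/* ... then write val out (the command/register used also differs per chip) */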

Contributor Author

I got it now! Thanks! :)

cmd = AD5686_CMD_CONTROL_REG;
shift = 12;
}
else {

Extra newline

@stefpopa stefpopa force-pushed the rpi-4.9.y-ad5693-support branch 2 times, most recently from d3f98c6 to 078d595 Compare January 26, 2018 10:28
@stefpopa
Contributor Author

changelog V2 -> V3:

  • Initialize the pwr_down_mode variable with a for loop that runs through each channel (see the sketch below).
  • Make sure that the REF bit is not overwritten for devices which use the same register for setting both the power-down mode and the internal reference. This is done by checking use_internal_vref, which is set at start up.
  • regmap_type is checked only in ad5686_write_dac_powerdown() before the register is written.
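
A rough sketch of that per-channel initialization (the b01 "1K pulldown" encoding comes from the discussion above; the loop itself is an assumption, not the exact hunk):

/* set every channel's 2-bit field to b01 (1K pulldown), replacing the
 * hard-coded 0x55-style constants */
st->pwr_down_mode = 0;
for (i = 0; i < st->chip_info->num_channels; i++)
	st->pwr_down_mode |= (1 << (i * 2));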

@stefpopa
Contributor Author

Can you please take a look over this new commit as well? I also added support for the AD5691R/AD5692R/AD5693/AD5693R DACs.

st->pwr_down_mask & st->pwr_down_mode, 0);
if (st->chip_info->regmap_type == AD5693_REGMAP) {
val = ((st->pwr_down_mask & st->pwr_down_mode) << 13) |
(st->use_internal_vref == true ? 0 : (1 << 12));

Upstream won't like this. Try to run coccicheck (https://bottest.wiki.kernel.org/coccicheck) on your patches it will report potential issues.

make C=2 CHECK="scripts/coccicheck" drivers/iio/dac/

Should be

val = (st->pwr_down_mask & st->pwr_down_mode) << 13;
if (!st->use_internal_vref)
   val |= BIT(12); // Even better do a #define for this flag
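
The #define hinted at in that comment could look roughly like this (the V4 changelog below confirms the AD5693_REF_BIT_MSK name; the bit position is taken from the code above):

#define AD5693_REF_BIT_MSK	BIT(12)

val = (st->pwr_down_mask & st->pwr_down_mode) << 13;
if (!st->use_internal_vref)
	val |= AD5693_REF_BIT_MSK;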

* @pwr_down_mask: power down mask
* @pwr_down_mode: current power down mode
* @use_internal_vref: set to true if the internal reference voltage should be
* used.

Nitpick: I'd say 'is used' instead of 'should be used'

AD5693_REGMAP
AD5693_REGMAP,
AD5683_REGMAP,
};

I'd keep these in alphabetical order.
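
In other words (the enum name comes from the V4 changelog; AD5686_REGMAP is assumed to be the pre-existing entry):

enum ad5686_regmap_type {
	AD5683_REGMAP,
	AD5686_REGMAP,
	AD5693_REGMAP,
};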

		break;
	case AD5683_REGMAP:
		val = ((st->pwr_down_mask & st->pwr_down_mode) << 17) |
		      (st->use_internal_vref == true ? 0 : (1 << 16));

crazy hardware people always changing stuff around, just because they can...

But there is still a bit of a pattern here. The code is a copy&pasted version of the AD5693 case. Maybe you can handle this in a more generic way. Like

switch (type) {
case AD5683_REGMAP:
    shift = 17;
    break;
case AD5693_REGMAP:
    shift = 13;
    break;
}

	if (st->chip_info->regmap_type == AD5686_REGMAP)
		cmd = AD5686_CMD_READBACK_ENABLE;
	else if (st->chip_info->regmap_type == AD5683_REGMAP)
		cmd = AD5686_CMD_READBACK_ENABLE_V2;

This makes me wonder, how is readback implemented on AD5693? Does reading back the DAC value work?

OK, the part has no SPI interface, so that answers the question.

Contributor Author

:)

But still, do we have to change the i2c read function for AD5693? Looking at the datasheet it looks different from the multi-channel chips.

Contributor Author

The read operation is similar. Also, I have tested this.

@lclausen-adi lclausen-adi left a comment

Just one general comment.

Instead of

if (st->chip_info->regmap_type == A_REGMAP)
   ...
else if (st->chip_info->regmap_type == B_REGMAP)
   ...
else if (st->chip_info->regmap_type == C_REGMAP)
   ...

Try to use

switch (st->chip_info->regmap_type) {
case A_REGMAP:
   ...
   break;
case B_REGMAP:
   ...
   break;
case C_REGMAP:
   ...
   break;
}

In my opinion that makes it easier to follow.

@stefpopa stefpopa force-pushed the rpi-4.9.y-ad5693-support branch 2 times, most recently from d07ea83 to cac5e63 Compare January 29, 2018 10:40
@stefpopa
Contributor Author

changelog V3 -> V4:

  • Used switch(case) instead of if/else
  • Made ad5686_write_dac_powerdown() function more generic
  • Defined AD5693_REF_BIT_MSK and AD5683_REF_BIT_MSK
  • ad5686_regmap_type constants are defined in alphabetical order

@stefpopa stefpopa force-pushed the rpi-4.9.y-ad5693-support branch from cac5e63 to 3770577 Compare January 29, 2018 12:15
@stefpopa
Contributor Author

Any remarks? Is it OK to merge?

@stefpopa stefpopa force-pushed the rpi-4.9.y-ad5693-support branch 3 times, most recently from 1c68579 to 5b80f63 Compare January 30, 2018 16:53
@stefpopa
Contributor Author

changelog V4 -> V5:

  • This pull request includes only the commit which adds support for AD5691R/AD5692R/AD5693/AD5693R.
  • Support for the AD5683R DAC family will be handled in a separate pull request.

Changes from the first version:

  • Used switch(case) instead of if/else
  • Made ad5686_write_dac_powerdown() function more generic
  • Defined AD5693_REF_BIT_MSK
  • Make sure that the REF bit is not overwritten for devices which use the same register for setting both the power-down mode and the internal reference. This is done by checking use_internal_vref, which is set at start up.
  • Added a regmap_type field to the ad5686_chip_info struct. This field is used as a differentiating parameter instead of the num_channels variable for power-down mode or internal voltage reference operations.
  • regmap_type is checked only in ad5686_write_dac_powerdown() before the register is written.

The AD5691R/AD5692R/AD5693/AD5693R are a family of one channel DACs with
12-bit, 14-bit and 16-bit precision respectively. The devices have either
no built-in reference, or built-in 2.5V reference.

These devices are pretty similar to AD5671R/AD5675R and
AD5694/AD5694R/AD5695R/AD5696/AD5696R, except that they have one channel.
Another difference is that they use a write control register(addr 0x04) for
setting the power down modes and the internal reference instead of separate
registers for each function.

Datasheet:
	http://www.analog.com/media/en/technical-documentation/data-sheets/AD5693R_5692R_5691R_5693.pdf

Signed-off-by: Stefan Popa <[email protected]>
@stefpopa stefpopa force-pushed the rpi-4.9.y-ad5693-support branch from 5b80f63 to 520f537 Compare January 31, 2018 08:46
@commodo commodo merged commit 718a652 into rpi-4.9.y Feb 1, 2018
@stefpopa stefpopa deleted the rpi-4.9.y-ad5693-support branch February 1, 2018 10:52
pcercuei pushed a commit that referenced this pull request Jun 21, 2023
The quota assign ioctl can currently run in parallel with a quota disable
ioctl call. The assign ioctl uses the quota root, while the disable ioctl
frees that root, and therefore we can have a use-after-free triggered in
the assign ioctl, leading to a trace like the following when KASAN is
enabled:

  [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0
  [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736
  [672.724][T736]
  [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37
  [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  [672.727][T736] Call Trace:
  [672.728][T736]  <TASK>
  [672.728][T736]  dump_stack_lvl+0xd9/0x150
  [672.725][T736]  print_report+0xc1/0x5e0
  [672.720][T736]  ? __virt_addr_valid+0x61/0x2e0
  [672.727][T736]  ? __phys_addr+0xc9/0x150
  [672.725][T736]  ? btrfs_search_slot+0x2962/0x2db0
  [672.722][T736]  kasan_report+0xc0/0xf0
  [672.729][T736]  ? btrfs_search_slot+0x2962/0x2db0
  [672.724][T736]  btrfs_search_slot+0x2962/0x2db0
  [672.723][T736]  ? fs_reclaim_acquire+0xba/0x160
  [672.722][T736]  ? split_leaf+0x13d0/0x13d0
  [672.726][T736]  ? rcu_is_watching+0x12/0xb0
  [672.723][T736]  ? kmem_cache_alloc+0x338/0x3c0
  [672.722][T736]  update_qgroup_status_item+0xf7/0x320
  [672.724][T736]  ? add_qgroup_rb+0x3d0/0x3d0
  [672.739][T736]  ? do_raw_spin_lock+0x12d/0x2b0
  [672.730][T736]  ? spin_bug+0x1d0/0x1d0
  [672.737][T736]  btrfs_run_qgroups+0x5de/0x840
  [672.730][T736]  ? btrfs_qgroup_rescan_worker+0xa70/0xa70
  [672.738][T736]  ? __del_qgroup_relation+0x4ba/0xe00
  [672.738][T736]  btrfs_ioctl+0x3d58/0x5d80
  [672.735][T736]  ? tomoyo_path_number_perm+0x16a/0x550
  [672.737][T736]  ? tomoyo_execute_permission+0x4a0/0x4a0
  [672.731][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
  [672.737][T736]  ? __sanitizer_cov_trace_switch+0x54/0x90
  [672.734][T736]  ? do_vfs_ioctl+0x132/0x1660
  [672.730][T736]  ? vfs_fileattr_set+0xc40/0xc40
  [672.730][T736]  ? _raw_spin_unlock_irq+0x2e/0x50
  [672.732][T736]  ? sigprocmask+0xf2/0x340
  [672.737][T736]  ? __fget_files+0x26a/0x480
  [672.732][T736]  ? bpf_lsm_file_ioctl+0x9/0x10
  [672.738][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
  [672.736][T736]  __x64_sys_ioctl+0x198/0x210
  [672.736][T736]  do_syscall_64+0x39/0xb0
  [672.731][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.739][T736] RIP: 0033:0x4556ad
  [672.742][T736]  </TASK>
  [672.743][T736]
  [672.748][T736] Allocated by task 27677:
  [672.743][T736]  kasan_save_stack+0x22/0x40
  [672.741][T736]  kasan_set_track+0x25/0x30
  [672.741][T736]  __kasan_kmalloc+0xa4/0xb0
  [672.749][T736]  btrfs_alloc_root+0x48/0x90
  [672.746][T736]  btrfs_create_tree+0x146/0xa20
  [672.744][T736]  btrfs_quota_enable+0x461/0x1d20
  [672.743][T736]  btrfs_ioctl+0x4a1c/0x5d80
  [672.747][T736]  __x64_sys_ioctl+0x198/0x210
  [672.749][T736]  do_syscall_64+0x39/0xb0
  [672.744][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.756][T736]
  [672.757][T736] Freed by task 27677:
  [672.759][T736]  kasan_save_stack+0x22/0x40
  [672.759][T736]  kasan_set_track+0x25/0x30
  [672.756][T736]  kasan_save_free_info+0x2e/0x50
  [672.751][T736]  ____kasan_slab_free+0x162/0x1c0
  [672.758][T736]  slab_free_freelist_hook+0x89/0x1c0
  [672.752][T736]  __kmem_cache_free+0xaf/0x2e0
  [672.752][T736]  btrfs_put_root+0x1ff/0x2b0
  [672.759][T736]  btrfs_quota_disable+0x80a/0xbc0
  [672.752][T736]  btrfs_ioctl+0x3e5f/0x5d80
  [672.756][T736]  __x64_sys_ioctl+0x198/0x210
  [672.753][T736]  do_syscall_64+0x39/0xb0
  [672.765][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.769][T736]
  [672.768][T736] The buggy address belongs to the object at ffff888022ec0000
  [672.768][T736]  which belongs to the cache kmalloc-4k of size 4096
  [672.769][T736] The buggy address is located 520 bytes inside of
  [672.769][T736]  freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
  [672.760][T736]
  [672.764][T736] The buggy address belongs to the physical page:
  [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
  [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
  [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
  [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
  [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
  [672.771][T736] page dumped because: kasan: bad access detected
  [672.778][T736] page_owner tracks the page as allocated
  [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
  [672.779][T736]  get_page_from_freelist+0x119c/0x2d50
  [672.779][T736]  __alloc_pages+0x1cb/0x4a0
  [672.776][T736]  alloc_pages+0x1aa/0x270
  [672.773][T736]  allocate_slab+0x260/0x390
  [672.771][T736]  ___slab_alloc+0xa9a/0x13e0
  [672.778][T736]  __slab_alloc.constprop.0+0x56/0xb0
  [672.771][T736]  __kmem_cache_alloc_node+0x136/0x320
  [672.789][T736]  __kmalloc+0x4e/0x1a0
  [672.783][T736]  tomoyo_realpath_from_path+0xc3/0x600
  [672.781][T736]  tomoyo_path_perm+0x22f/0x420
  [672.782][T736]  tomoyo_path_unlink+0x92/0xd0
  [672.780][T736]  security_path_unlink+0xdb/0x150
  [672.788][T736]  do_unlinkat+0x377/0x680
  [672.788][T736]  __x64_sys_unlink+0xca/0x110
  [672.789][T736]  do_syscall_64+0x39/0xb0
  [672.783][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.784][T736] page last free stack trace:
  [672.787][T736]  free_pcp_prepare+0x4e5/0x920
  [672.787][T736]  free_unref_page+0x1d/0x4e0
  [672.784][T736]  __unfreeze_partials+0x17c/0x1a0
  [672.797][T736]  qlist_free_all+0x6a/0x180
  [672.796][T736]  kasan_quarantine_reduce+0x189/0x1d0
  [672.797][T736]  __kasan_slab_alloc+0x64/0x90
  [672.793][T736]  kmem_cache_alloc+0x17c/0x3c0
  [672.799][T736]  getname_flags.part.0+0x50/0x4e0
  [672.799][T736]  getname_flags+0x9e/0xe0
  [672.792][T736]  vfs_fstatat+0x77/0xb0
  [672.791][T736]  __do_sys_newlstat+0x84/0x100
  [672.798][T736]  do_syscall_64+0x39/0xb0
  [672.796][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.790][T736]
  [672.791][T736] Memory state around the buggy address:
  [672.799][T736]  ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.805][T736]  ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.809][T736]                       ^
  [672.809][T736]  ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.809][T736]  ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex
before calling btrfs_run_qgroups(), which is what all qgroup ioctls should
call.

Reported-by: butt3rflyh4ck <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/
CC: [email protected] # 5.10+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
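
The shape of the fix described in the commit message above is roughly the following (a sketch assuming the existing fs_info->qgroup_ioctl_lock mutex; not the exact upstream hunk):

/* in the qgroup assign ioctl: serialize against quota disable, which
 * frees the quota root that btrfs_run_qgroups() dereferences */
mutex_lock(&fs_info->qgroup_ioctl_lock);
err = btrfs_run_qgroups(trans);
mutex_unlock(&fs_info->qgroup_ioctl_lock);
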
btogorean pushed a commit that referenced this pull request Aug 25, 2023
[ Upstream commit 4d42ecd ]

indx_read is called when we have some NTFS directory operations that
need more information from the index buffers. This adds a sanity check
to make sure the returned index buffer length is legit, or we may have
some out-of-bound memory accesses.

[  560.897595] BUG: KASAN: slab-out-of-bounds in hdr_find_e.isra.0+0x10c/0x320
[  560.898321] Read of size 2 at addr ffff888009497238 by task exp/245
[  560.898760]
[  560.899129] CPU: 0 PID: 245 Comm: exp Not tainted 6.0.0-rc6 #37
[  560.899505] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  560.900170] Call Trace:
[  560.900407]  <TASK>
[  560.900732]  dump_stack_lvl+0x49/0x63
[  560.901108]  print_report.cold+0xf5/0x689
[  560.901395]  ? hdr_find_e.isra.0+0x10c/0x320
[  560.901716]  kasan_report+0xa7/0x130
[  560.901950]  ? hdr_find_e.isra.0+0x10c/0x320
[  560.902208]  __asan_load2+0x68/0x90
[  560.902427]  hdr_find_e.isra.0+0x10c/0x320
[  560.902846]  ? cmp_uints+0xe0/0xe0
[  560.903363]  ? cmp_sdh+0x90/0x90
[  560.903883]  ? ntfs_bread_run+0x190/0x190
[  560.904196]  ? rwsem_down_read_slowpath+0x750/0x750
[  560.904969]  ? ntfs_fix_post_read+0xe0/0x130
[  560.905259]  ? __kasan_check_write+0x14/0x20
[  560.905599]  ? up_read+0x1a/0x90
[  560.905853]  ? indx_read+0x22c/0x380
[  560.906096]  indx_find+0x2ef/0x470
[  560.906352]  ? indx_find_buffer+0x2d0/0x2d0
[  560.906692]  ? __kasan_kmalloc+0x88/0xb0
[  560.906977]  dir_search_u+0x196/0x2f0
[  560.907220]  ? ntfs_nls_to_utf16+0x450/0x450
[  560.907464]  ? __kasan_check_write+0x14/0x20
[  560.907747]  ? mutex_lock+0x8f/0xe0
[  560.907970]  ? __mutex_lock_slowpath+0x20/0x20
[  560.908214]  ? kmem_cache_alloc+0x143/0x4b0
[  560.908459]  ntfs_lookup+0xe0/0x100
[  560.908788]  __lookup_slow+0x116/0x220
[  560.909050]  ? lookup_fast+0x1b0/0x1b0
[  560.909309]  ? lookup_fast+0x13f/0x1b0
[  560.909601]  walk_component+0x187/0x230
[  560.909944]  link_path_walk.part.0+0x3f0/0x660
[  560.910285]  ? handle_lookup_down+0x90/0x90
[  560.910618]  ? path_init+0x642/0x6e0
[  560.911084]  ? percpu_counter_add_batch+0x6e/0xf0
[  560.912559]  ? __alloc_file+0x114/0x170
[  560.913008]  path_openat+0x19c/0x1d10
[  560.913419]  ? getname_flags+0x73/0x2b0
[  560.913815]  ? kasan_save_stack+0x3a/0x50
[  560.914125]  ? kasan_save_stack+0x26/0x50
[  560.914542]  ? __kasan_slab_alloc+0x6d/0x90
[  560.914924]  ? kmem_cache_alloc+0x143/0x4b0
[  560.915339]  ? getname_flags+0x73/0x2b0
[  560.915647]  ? getname+0x12/0x20
[  560.916114]  ? __x64_sys_open+0x4c/0x60
[  560.916460]  ? path_lookupat.isra.0+0x230/0x230
[  560.916867]  ? __isolate_free_page+0x2e0/0x2e0
[  560.917194]  do_filp_open+0x15c/0x1f0
[  560.917448]  ? may_open_dev+0x60/0x60
[  560.917696]  ? expand_files+0xa4/0x3a0
[  560.917923]  ? __kasan_check_write+0x14/0x20
[  560.918185]  ? _raw_spin_lock+0x88/0xdb
[  560.918409]  ? _raw_spin_lock_irqsave+0x100/0x100
[  560.918783]  ? _find_next_bit+0x4a/0x130
[  560.919026]  ? _raw_spin_unlock+0x19/0x40
[  560.919276]  ? alloc_fd+0x14b/0x2d0
[  560.919635]  do_sys_openat2+0x32a/0x4b0
[  560.920035]  ? file_open_root+0x230/0x230
[  560.920336]  ? __rcu_read_unlock+0x5b/0x280
[  560.920813]  do_sys_open+0x99/0xf0
[  560.921208]  ? filp_open+0x60/0x60
[  560.921482]  ? exit_to_user_mode_prepare+0x49/0x180
[  560.921867]  __x64_sys_open+0x4c/0x60
[  560.922128]  do_syscall_64+0x3b/0x90
[  560.922369]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  560.923030] RIP: 0033:0x7f7dff2e4469
[  560.923681] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088
[  560.924451] RSP: 002b:00007ffd41a210b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000002
[  560.925168] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7dff2e4469
[  560.925655] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffd41a211f0
[  560.926085] RBP: 00007ffd41a252a0 R08: 00007f7dff60fba0 R09: 00007ffd41a25388
[  560.926405] R10: 0000000000400b80 R11: 0000000000000206 R12: 00000000004004e0
[  560.926867] R13: 00007ffd41a25380 R14: 0000000000000000 R15: 0000000000000000
[  560.927241]  </TASK>
[  560.927491]
[  560.927755] Allocated by task 245:
[  560.928409]  kasan_save_stack+0x26/0x50
[  560.929271]  __kasan_kmalloc+0x88/0xb0
[  560.929778]  __kmalloc+0x192/0x320
[  560.930023]  indx_read+0x249/0x380
[  560.930224]  indx_find+0x2a2/0x470
[  560.930695]  dir_search_u+0x196/0x2f0
[  560.930892]  ntfs_lookup+0xe0/0x100
[  560.931115]  __lookup_slow+0x116/0x220
[  560.931323]  walk_component+0x187/0x230
[  560.931570]  link_path_walk.part.0+0x3f0/0x660
[  560.931791]  path_openat+0x19c/0x1d10
[  560.932008]  do_filp_open+0x15c/0x1f0
[  560.932226]  do_sys_openat2+0x32a/0x4b0
[  560.932413]  do_sys_open+0x99/0xf0
[  560.932709]  __x64_sys_open+0x4c/0x60
[  560.933417]  do_syscall_64+0x3b/0x90
[  560.933776]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  560.934235]
[  560.934486] The buggy address belongs to the object at ffff888009497000
[  560.934486]  which belongs to the cache kmalloc-512 of size 512
[  560.935239] The buggy address is located 56 bytes to the right of
[  560.935239]  512-byte region [ffff888009497000, ffff888009497200)
[  560.936153]
[  560.937326] The buggy address belongs to the physical page:
[  560.938228] page:0000000062a3dfae refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9496
[  560.939616] head:0000000062a3dfae order:1 compound_mapcount:0 compound_pincount:0
[  560.940219] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
[  560.942702] raw: 000fffffc0010200 ffffea0000164f80 dead000000000005 ffff888001041c80
[  560.943932] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000
[  560.944568] page dumped because: kasan: bad access detected
[  560.945735]
[  560.946112] Memory state around the buggy address:
[  560.946870]  ffff888009497100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  560.947242]  ffff888009497180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  560.947611] >ffff888009497200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  560.947915]                                         ^
[  560.948249]  ffff888009497280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  560.948687]  ffff888009497300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Signed-off-by: Edward Lo <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
btogorean pushed a commit that referenced this pull request Aug 25, 2023
[ Upstream commit 54e4570 ]

Though we already have some sanity checks while enumerating attributes,
resident attribute names aren't included. This patch checks the resident
attribute names are in the valid ranges.

[  259.209031] BUG: KASAN: slab-out-of-bounds in ni_create_attr_list+0x1e1/0x850
[  259.210770] Write of size 426 at addr ffff88800632f2b2 by task exp/255
[  259.211551]
[  259.212035] CPU: 0 PID: 255 Comm: exp Not tainted 6.0.0-rc6 #37
[  259.212955] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[  259.214387] Call Trace:
[  259.214640]  <TASK>
[  259.214895]  dump_stack_lvl+0x49/0x63
[  259.215284]  print_report.cold+0xf5/0x689
[  259.215565]  ? kasan_poison+0x3c/0x50
[  259.215778]  ? kasan_unpoison+0x28/0x60
[  259.215991]  ? ni_create_attr_list+0x1e1/0x850
[  259.216270]  kasan_report+0xa7/0x130
[  259.216481]  ? ni_create_attr_list+0x1e1/0x850
[  259.216719]  kasan_check_range+0x15a/0x1d0
[  259.216939]  memcpy+0x3c/0x70
[  259.217136]  ni_create_attr_list+0x1e1/0x850
[  259.217945]  ? __rcu_read_unlock+0x5b/0x280
[  259.218384]  ? ni_remove_attr+0x2e0/0x2e0
[  259.218712]  ? kernel_text_address+0xcf/0xe0
[  259.219064]  ? __kernel_text_address+0x12/0x40
[  259.219434]  ? arch_stack_walk+0x9e/0xf0
[  259.219668]  ? __this_cpu_preempt_check+0x13/0x20
[  259.219904]  ? sysvec_apic_timer_interrupt+0x57/0xc0
[  259.220140]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  259.220561]  ni_ins_attr_ext+0x52c/0x5c0
[  259.220984]  ? ni_create_attr_list+0x850/0x850
[  259.221532]  ? run_deallocate+0x120/0x120
[  259.221972]  ? vfs_setxattr+0x128/0x300
[  259.222688]  ? setxattr+0x126/0x140
[  259.222921]  ? path_setxattr+0x164/0x180
[  259.223431]  ? __x64_sys_setxattr+0x6d/0x80
[  259.223828]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  259.224417]  ? mi_find_attr+0x3c/0xf0
[  259.224772]  ni_insert_attr+0x1ba/0x420
[  259.225216]  ? ni_ins_attr_ext+0x5c0/0x5c0
[  259.225504]  ? ntfs_read_ea+0x119/0x450
[  259.225775]  ni_insert_resident+0xc0/0x1c0
[  259.226316]  ? ni_insert_nonresident+0x400/0x400
[  259.227001]  ? __kasan_kmalloc+0x88/0xb0
[  259.227468]  ? __kmalloc+0x192/0x320
[  259.227773]  ntfs_set_ea+0x6bf/0xb30
[  259.228216]  ? ftrace_graph_ret_addr+0x2a/0xb0
[  259.228494]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  259.228838]  ? ntfs_read_ea+0x450/0x450
[  259.229098]  ? is_bpf_text_address+0x24/0x40
[  259.229418]  ? kernel_text_address+0xcf/0xe0
[  259.229681]  ? __kernel_text_address+0x12/0x40
[  259.229948]  ? unwind_get_return_address+0x3a/0x60
[  259.230271]  ? write_profile+0x270/0x270
[  259.230537]  ? arch_stack_walk+0x9e/0xf0
[  259.230836]  ntfs_setxattr+0x114/0x5c0
[  259.231099]  ? ntfs_set_acl_ex+0x2e0/0x2e0
[  259.231529]  ? evm_protected_xattr_common+0x6d/0x100
[  259.231817]  ? posix_xattr_acl+0x13/0x80
[  259.232073]  ? evm_protect_xattr+0x1f7/0x440
[  259.232351]  __vfs_setxattr+0xda/0x120
[  259.232635]  ? xattr_resolve_name+0x180/0x180
[  259.232912]  __vfs_setxattr_noperm+0x93/0x300
[  259.233219]  __vfs_setxattr_locked+0x141/0x160
[  259.233492]  ? kasan_poison+0x3c/0x50
[  259.233744]  vfs_setxattr+0x128/0x300
[  259.234002]  ? __vfs_setxattr_locked+0x160/0x160
[  259.234837]  do_setxattr+0xb8/0x170
[  259.235567]  ? vmemdup_user+0x53/0x90
[  259.236212]  setxattr+0x126/0x140
[  259.236491]  ? do_setxattr+0x170/0x170
[  259.236791]  ? debug_smp_processor_id+0x17/0x20
[  259.237232]  ? kasan_quarantine_put+0x57/0x180
[  259.237605]  ? putname+0x80/0xa0
[  259.237870]  ? __kasan_slab_free+0x11c/0x1b0
[  259.238234]  ? putname+0x80/0xa0
[  259.238500]  ? preempt_count_sub+0x18/0xc0
[  259.238775]  ? __mnt_want_write+0xaa/0x100
[  259.238990]  ? mnt_want_write+0x8b/0x150
[  259.239290]  path_setxattr+0x164/0x180
[  259.239605]  ? setxattr+0x140/0x140
[  259.239849]  ? debug_smp_processor_id+0x17/0x20
[  259.240174]  ? fpregs_assert_state_consistent+0x67/0x80
[  259.240411]  __x64_sys_setxattr+0x6d/0x80
[  259.240715]  do_syscall_64+0x3b/0x90
[  259.240934]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  259.241697] RIP: 0033:0x7fc6b26e4469
[  259.242647] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088
[  259.244512] RSP: 002b:00007ffc3c7841f8 EFLAGS: 00000217 ORIG_RAX: 00000000000000bc
[  259.245086] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6b26e4469
[  259.246025] RDX: 00007ffc3c784380 RSI: 00007ffc3c7842e0 RDI: 00007ffc3c784238
[  259.246961] RBP: 00007ffc3c788410 R08: 0000000000000001 R09: 00007ffc3c7884f8
[  259.247775] R10: 000000000000007f R11: 0000000000000217 R12: 00000000004004e0
[  259.248534] R13: 00007ffc3c7884f0 R14: 0000000000000000 R15: 0000000000000000
[  259.249368]  </TASK>
[  259.249644]
[  259.249888] Allocated by task 255:
[  259.250283]  kasan_save_stack+0x26/0x50
[  259.250957]  __kasan_kmalloc+0x88/0xb0
[  259.251826]  __kmalloc+0x192/0x320
[  259.252745]  ni_create_attr_list+0x11e/0x850
[  259.253298]  ni_ins_attr_ext+0x52c/0x5c0
[  259.253685]  ni_insert_attr+0x1ba/0x420
[  259.253974]  ni_insert_resident+0xc0/0x1c0
[  259.254311]  ntfs_set_ea+0x6bf/0xb30
[  259.254629]  ntfs_setxattr+0x114/0x5c0
[  259.254859]  __vfs_setxattr+0xda/0x120
[  259.255155]  __vfs_setxattr_noperm+0x93/0x300
[  259.255445]  __vfs_setxattr_locked+0x141/0x160
[  259.255862]  vfs_setxattr+0x128/0x300
[  259.256251]  do_setxattr+0xb8/0x170
[  259.256522]  setxattr+0x126/0x140
[  259.256911]  path_setxattr+0x164/0x180
[  259.257308]  __x64_sys_setxattr+0x6d/0x80
[  259.257637]  do_syscall_64+0x3b/0x90
[  259.257970]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  259.258550]
[  259.258772] The buggy address belongs to the object at ffff88800632f000
[  259.258772]  which belongs to the cache kmalloc-1k of size 1024
[  259.260190] The buggy address is located 690 bytes inside of
[  259.260190]  1024-byte region [ffff88800632f000, ffff88800632f400)
[  259.261412]
[  259.261743] The buggy address belongs to the physical page:
[  259.262354] page:0000000081e8cac9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x632c
[  259.263722] head:0000000081e8cac9 order:2 compound_mapcount:0 compound_pincount:0
[  259.264284] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
[  259.265312] raw: 000fffffc0010200 ffffea0000060d00 dead000000000004 ffff888001041dc0
[  259.265772] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000
[  259.266305] page dumped because: kasan: bad access detected
[  259.266588]
[  259.266728] Memory state around the buggy address:
[  259.267225]  ffff88800632f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  259.267841]  ffff88800632f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  259.269111] >ffff88800632f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  259.269626]                    ^
[  259.270162]  ffff88800632f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  259.270810]  ffff88800632f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Signed-off-by: Edward Lo <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
nunojsa pushed a commit that referenced this pull request Aug 5, 2025
[ Upstream commit eedf3e3 ]

ACPICA commit 1c28da2242783579d59767617121035dafba18c3

This was originally done in NetBSD:
NetBSD/src@b69d1ac
and is the correct alternative to the smattering of `memcpy`s I
previously contributed to this repository.

This also sidesteps the newly strict checks added in UBSAN:
llvm/llvm-project@7926744

Before this change we see the following UBSAN stack trace in Fuchsia:

  #0    0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
  #1.2  0x000021982bc4af3c in ubsan_get_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:41 <libclang_rt.asan.so>+0x41f3c
  #1.1  0x000021982bc4af3c in maybe_print_stack_trace() compiler-rt/lib/ubsan/ubsan_diag.cpp:51 <libclang_rt.asan.so>+0x41f3c
  #1    0x000021982bc4af3c in ~scoped_report() compiler-rt/lib/ubsan/ubsan_diag.cpp:395 <libclang_rt.asan.so>+0x41f3c
  #2    0x000021982bc4bb6f in handletype_mismatch_impl() compiler-rt/lib/ubsan/ubsan_handlers.cpp:137 <libclang_rt.asan.so>+0x42b6f
  #3    0x000021982bc4b723 in __ubsan_handle_type_mismatch_v1 compiler-rt/lib/ubsan/ubsan_handlers.cpp:142 <libclang_rt.asan.so>+0x42723
  #4    0x000021afcfdeca5e in acpi_rs_get_address_common(struct acpi_resource*, union aml_resource*) ../../third_party/acpica/source/components/resources/rsaddr.c:329 <platform-bus-x86.so>+0x6aca5e
  #5    0x000021afcfdf2089 in acpi_rs_convert_aml_to_resource(struct acpi_resource*, union aml_resource*, struct acpi_rsconvert_info*) ../../third_party/acpica/source/components/resources/rsmisc.c:355 <platform-bus-x86.so>+0x6b2089
  #6    0x000021afcfded169 in acpi_rs_convert_aml_to_resources(u8*, u32, u32, u8, void**) ../../third_party/acpica/source/components/resources/rslist.c:137 <platform-bus-x86.so>+0x6ad169
  #7    0x000021afcfe2d24a in acpi_ut_walk_aml_resources(struct acpi_walk_state*, u8*, acpi_size, acpi_walk_aml_callback, void**) ../../third_party/acpica/source/components/utilities/utresrc.c:237 <platform-bus-x86.so>+0x6ed24a
  #8    0x000021afcfde66b7 in acpi_rs_create_resource_list(union acpi_operand_object*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rscreate.c:199 <platform-bus-x86.so>+0x6a66b7
  #9    0x000021afcfdf6979 in acpi_rs_get_method_data(acpi_handle, const char*, struct acpi_buffer*) ../../third_party/acpica/source/components/resources/rsutils.c:770 <platform-bus-x86.so>+0x6b6979
  #10   0x000021afcfdf708f in acpi_walk_resources(acpi_handle, char*, acpi_walk_resource_callback, void*) ../../third_party/acpica/source/components/resources/rsxface.c:731 <platform-bus-x86.so>+0x6b708f
  #11   0x000021afcfa95dcf in acpi::acpi_impl::walk_resources(acpi::acpi_impl*, acpi_handle, const char*, acpi::Acpi::resources_callable) ../../src/devices/board/lib/acpi/acpi-impl.cc:41 <platform-bus-x86.so>+0x355dcf
  #12   0x000021afcfaa8278 in acpi::device_builder::gather_resources(acpi::device_builder*, acpi::Acpi*, fidl::any_arena&, acpi::Manager*, acpi::device_builder::gather_resources_callback) ../../src/devices/board/lib/acpi/device-builder.cc:84 <platform-bus-x86.so>+0x368278
  #13   0x000021afcfbddb87 in acpi::Manager::configure_discovered_devices(acpi::Manager*) ../../src/devices/board/lib/acpi/manager.cc:75 <platform-bus-x86.so>+0x49db87
  #14   0x000021afcf99091d in publish_acpi_devices(acpi::Manager*, zx_device_t*, zx_device_t*) ../../src/devices/board/drivers/x86/acpi-nswalk.cc:95 <platform-bus-x86.so>+0x25091d
  #15   0x000021afcf9c1d4e in x86::X86::do_init(x86::X86*) ../../src/devices/board/drivers/x86/x86.cc:60 <platform-bus-x86.so>+0x281d4e
  #16   0x000021afcf9e33ad in λ(x86::X86::ddk_init::(anon class)*) ../../src/devices/board/drivers/x86/x86.cc:77 <platform-bus-x86.so>+0x2a33ad
  #17   0x000021afcf9e313e in fit::internal::target<(lambda at../../src/devices/board/drivers/x86/x86.cc:76:19), false, false, std::__2::allocator<std::byte>, void>::invoke(void*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:183 <platform-bus-x86.so>+0x2a313e
  #18   0x000021afcfbab4c7 in fit::internal::function_base<16UL, false, void(), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <platform-bus-x86.so>+0x46b4c7
  #19   0x000021afcfbab342 in fit::function_impl<16UL, false, void(), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<16UL, false, void (), std::__2::allocator<std::byte> >*) ../../sdk/lib/fit/include/lib/fit/function.h:315 <platform-bus-x86.so>+0x46b342
  #20   0x000021afcfcd98c3 in async::internal::retained_task::Handler(async_dispatcher_t*, async_task_t*, zx_status_t) ../../sdk/lib/async/task.cc:24 <platform-bus-x86.so>+0x5998c3
  #21   0x00002290f9924616 in λ(const driver_runtime::Dispatcher::post_task::(anon class)*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/dispatcher.cc:789 <libdriver_runtime.so>+0x10a616
  #22   0x00002290f9924323 in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:788:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int>::invoke(void*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0x10a323
  #23   0x00002290f9904b76 in fit::internal::function_base<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xeab76
  #24   0x00002290f9904831 in fit::callback_impl<24UL, true, void(std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request>>, int), std::__2::allocator<std::byte>>::operator()(fit::callback_impl<24UL, true, void (std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, int) ../../sdk/lib/fit/include/lib/fit/function.h:471 <libdriver_runtime.so>+0xea831
  #25   0x00002290f98d5adc in driver_runtime::callback_request::Call(driver_runtime::callback_request*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >, zx_status_t) ../../src/devices/bin/driver_runtime/callback_request.h:74 <libdriver_runtime.so>+0xbbadc
  #26   0x00002290f98e1e58 in driver_runtime::Dispatcher::dispatch_callback(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::callback_request, std::__2::default_delete<driver_runtime::callback_request> >) ../../src/devices/bin/driver_runtime/dispatcher.cc:1248 <libdriver_runtime.so>+0xc7e58
  #27   0x00002290f98e4159 in driver_runtime::Dispatcher::dispatch_callbacks(driver_runtime::Dispatcher*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:1308 <libdriver_runtime.so>+0xca159
  #28   0x00002290f9918414 in λ(const driver_runtime::Dispatcher::create_with_adder::(anon class)*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.cc:353 <libdriver_runtime.so>+0xfe414
  #29   0x00002290f991812d in fit::internal::target<(lambda at../../src/devices/bin/driver_runtime/dispatcher.cc:351:7), true, false, std::__2::allocator<std::byte>, void, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>>::invoke(void*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:128 <libdriver_runtime.so>+0xfe12d
  #30   0x00002290f9906fc7 in fit::internal::function_base<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::invoke(const fit::internal::function_base<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/internal/function.h:522 <libdriver_runtime.so>+0xecfc7
  #31   0x00002290f9906c66 in fit::function_impl<8UL, true, void(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter>>, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte>>::operator()(const fit::function_impl<8UL, true, void (std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>), std::__2::allocator<std::byte> >*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../sdk/lib/fit/include/lib/fit/function.h:315 <libdriver_runtime.so>+0xecc66
  #32   0x00002290f98e73d9 in driver_runtime::Dispatcher::event_waiter::invoke_callback(driver_runtime::Dispatcher::event_waiter*, std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, fbl::ref_ptr<driver_runtime::Dispatcher>) ../../src/devices/bin/driver_runtime/dispatcher.h:543 <libdriver_runtime.so>+0xcd3d9
  #33   0x00002290f98e700d in driver_runtime::Dispatcher::event_waiter::handle_event(std::__2::unique_ptr<driver_runtime::Dispatcher::event_waiter, std::__2::default_delete<driver_runtime::Dispatcher::event_waiter> >, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/dispatcher.cc:1442 <libdriver_runtime.so>+0xcd00d
  #34   0x00002290f9918983 in async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event(async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>*, async_dispatcher_t*, async::wait_base*, zx_status_t, zx_packet_signal_t const*) ../../src/devices/bin/driver_runtime/async_loop_owned_event_handler.h:59 <libdriver_runtime.so>+0xfe983
  #35   0x00002290f9918b9e in async::wait_method<async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>, &async_loop_owned_event_handler<driver_runtime::Dispatcher::event_waiter>::handle_event>::call_handler(async_dispatcher_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async/include/lib/async/cpp/wait.h:201 <libdriver_runtime.so>+0xfeb9e
  #36   0x00002290f99bf509 in async_loop_dispatch_wait(async_loop_t*, async_wait_t*, zx_status_t, zx_packet_signal_t const*) ../../sdk/lib/async-loop/loop.c:394 <libdriver_runtime.so>+0x1a5509
  #37   0x00002290f99b9958 in async_loop_run_once(async_loop_t*, zx_time_t) ../../sdk/lib/async-loop/loop.c:343 <libdriver_runtime.so>+0x19f958
  #38   0x00002290f99b9247 in async_loop_run(async_loop_t*, zx_time_t, _Bool) ../../sdk/lib/async-loop/loop.c:301 <libdriver_runtime.so>+0x19f247
  #39   0x00002290f99ba962 in async_loop_run_thread(void*) ../../sdk/lib/async-loop/loop.c:860 <libdriver_runtime.so>+0x1a0962
  #40   0x000041afd176ef30 in start_c11(void*) ../../zircon/third_party/ulib/musl/pthread/pthread_create.c:63 <libc.so>+0x84f30
  #41   0x000041afd18a448d in thread_trampoline(uintptr_t, uintptr_t) ../../zircon/system/ulib/runtime/thread.cc:100 <libc.so>+0x1ba48d

Link: acpica/acpica@1c28da22
Signed-off-by: Rafael J. Wysocki <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Tamir Duberstein <[email protected]>
[ rjw: Pick up the tag from Tamir ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
github-actions bot pushed a commit that referenced this pull request Aug 8, 2025
A kernel panic can be triggered by reading /proc/fs/cifs/debug_dirs.
The crash is a null-ptr-deref inside spin_lock(), caused by the use of the
uninitialized global spinlock cifs_tcp_ses_lock.

init_cifs()
 └── cifs_proc_init()
      └── // User can access /proc/fs/cifs/debug_dirs here
           └── cifs_debug_dirs_proc_show()
                └── spin_lock(&cifs_tcp_ses_lock); // Uninitialized!

KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
Mem abort info:
ESR = 0x0000000096000005
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x05: level 1 translation fault
Data abort info:
ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[dfff800000000000] address between user and kernel address ranges
Internal error: Oops: 0000000096000005 [#1] SMP
Modules linked in:
CPU: 3 UID: 0 PID: 16435 Comm: stress-ng-procf Not tainted 6.16.0-10385-g79f14b5d84c6 #37 PREEMPT
Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu1 06/11/2025
pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : do_raw_spin_lock+0x84/0x2cc
lr : _raw_spin_lock+0x24/0x34
sp : ffff8000966477e0
x29: ffff800096647860 x28: ffff800096647b88 x27: ffff0001c0c22070
x26: ffff0003eb2b60c8 x25: ffff0001c0c22018 x24: dfff800000000000
x23: ffff0000f624e000 x22: ffff0003eb2b6020 x21: ffff0000f624e768
x20: 0000000000000004 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: ffff8000804b9600 x15: ffff700012cc8f04
x14: 1ffff00012cc8f04 x13: 0000000000000004 x12: ffffffffffffffff
x11: 1ffff00012cc8f00 x10: ffff80008d9af0d2 x9 : f3f3f304f1f1f1f1
x8 : 0000000000000000 x7 : 7365733c203e6469 x6 : 20656572743c2023
x5 : ffff0000e0ce0044 x4 : ffff80008a4deb6e x3 : ffff8000804b9718
x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
do_raw_spin_lock+0x84/0x2cc (P)
_raw_spin_lock+0x24/0x34
cifs_debug_dirs_proc_show+0x1ac/0x4c0
seq_read_iter+0x3b0/0xc28
proc_reg_read_iter+0x178/0x2a8
vfs_read+0x5f8/0x88c
ksys_read+0x120/0x210
__arm64_sys_read+0x7c/0x90
invoke_syscall+0x98/0x2b8
el0_svc_common+0x130/0x23c
do_el0_svc+0x48/0x58
el0_svc+0x40/0x140
el0t_64_sync_handler+0x84/0x12c
el0t_64_sync+0x1ac/0x1b0
Code: aa0003f3 f9000feb f2fe7e69 f8386969 (38f86908)
---[ end trace 0000000000000000 ]---

The root cause is an initialization order problem. The lock is declared
as a global variable and intended to be initialized during module startup.
However, the procfs entry that uses this lock can be accessed by userspace
before the spin_lock_init() call has run. This creates a race window where
reading the proc file will attempt to use the lock before it is
initialized, leading to the crash.

For a global lock with a static lifetime, the correct and robust approach
is to use compile-time initialization.

Fixes: 844e5c0 ("smb3 client: add way to show directory leases for improved debugging")
Signed-off-by: Yunseong Kim <[email protected]>
Signed-off-by: Steve French <[email protected]>
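
For reference, the compile-time initialization argued for above typically looks like this in kernel code (an illustrative sketch, not the literal diff):

/* before: a bare global plus a runtime spin_lock_init() in init_cifs(),
 * which can race with an early read of /proc/fs/cifs/debug_dirs */
spinlock_t cifs_tcp_ses_lock;

/* after: statically initialized, valid as soon as the module is loaded */
DEFINE_SPINLOCK(cifs_tcp_ses_lock);
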
github-actions bot pushed a commit that referenced this pull request Sep 5, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to then go from (++pfn)->page, which is rather nasty.
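
Conceptually, that means something like the following on SPARSEMEM without SPARSEMEM_VMEMMAP (a simplified sketch of the idea, not a quote of the header):

/* "page + n" may leave the per-section memmap, so step via the PFN instead */
#define nth_page(page, n)	pfn_to_page(page_to_pfn(page) + (n))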

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user from disabling VMEMMAP, just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow using SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
the possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly failing).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 6, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly fails).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 9, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly fails).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 11, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly fails).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 12, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly fails).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 13, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly fails).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 16, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly fails).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 17, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly fails).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Sep 18, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges and "non-contiguous pages".  If we
only deal with "contiguous pages", there is not need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity+check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but in particular for 32bit SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user to disable VMEMMAP: just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow to use SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly fails).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Willamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Betkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Maol <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitriy Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murohy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleinxer <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
dlech pushed a commit that referenced this pull request Sep 23, 2025
Patch series "mm: remove nth_page()", v2.

As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.

To recap, the reason we currently need nth_page() within a folio is
because on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.

While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.

So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to then go from (++pfn)->page, which is rather nasty.

Likely, many people have no idea when nth_page() is required and when it
might be dropped.

We refer to such problematic PFN ranges as "non-contiguous pages".  If we
only deal with "contiguous pages", there is no need for nth_page().

Besides that "obvious" folio case, we might end up using nth_page() within
CMA allocations (again, could span memory sections), and in one corner
case (kfence) when processing memblock allocations (again, could span
memory sections).

So let's handle all that, add sanity checks, and remove nth_page().

Patch #1 -> #5   : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13  : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #22        : disallow CMA allocations of non-contiguous pages
Patch #23 -> #33 : sanity-check + remove nth_page() usage within SG entry
Patch #34        : sanity-check + remove nth_page() usage in
                   unpin_user_page_range_dirty_lock()
Patch #35        : remove nth_page() in kfence
Patch #36        : adjust stale comment regarding nth_page
Patch #37        : mm: remove nth_page()

A lot of this is inspired by the discussion at [1] between Linus, Jason
and me, so kudos to them.


This patch (of 37):

In an ideal world, we wouldn't have to deal with SPARSEMEM without
SPARSEMEM_VMEMMAP, but for 32bit in particular, SPARSEMEM_VMEMMAP is
considered too costly and consequently not supported.

However, if an architecture does support SPARSEMEM with SPARSEMEM_VMEMMAP,
let's forbid the user from disabling VMEMMAP, just like we already do for
arm64, s390 and x86.

So if SPARSEMEM_VMEMMAP is supported, don't allow using SPARSEMEM without
SPARSEMEM_VMEMMAP.

This implies that the option to not use SPARSEMEM_VMEMMAP will now be gone
for loongarch, powerpc, riscv and sparc.  All architectures only enable
SPARSEMEM_VMEMMAP with 64bit support, so there should not really be a big
downside to using the VMEMMAP (quite the contrary).

This is a preparation for not supporting

(1) folio sizes that exceed a single memory section

(2) CMA allocations of non-contiguous page ranges

in SPARSEMEM without SPARSEMEM_VMEMMAP configs, whereby we want to limit
the possible impact as much as possible (e.g., gigantic hugetlb page
allocations suddenly failing).

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-GyQ_A@mail.gmail.com/T/#u [1]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Zi Yan <[email protected]>
Acked-by: Mike Rapoport (Microsoft) <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: WANG Xuerui <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexandre Ghiti <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Alexandru Elisei <[email protected]>
Cc: Alex Dubov <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Bart van Assche <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brendan Jackman <[email protected]>
Cc: Brett Creeley <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter (Ampere) <[email protected]>
Cc: Damien Le Moal <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Doug Gilbert <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Lars Persson <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Niklas Cassel <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Shameerali Kolothum Thodi <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Ulf Hansson <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yishai Hadas <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
github-actions bot pushed a commit that referenced this pull request Nov 12, 2025
MTL was broken after the vm_max_level movement. Get it back to a
working value.

[   37.722413] xe 0000:00:02.0: [drm] Tile0: GT0: VM job timed out on non-killed execqueue
[   37.722465] WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/xe/xe_guc_submit.c:1379 guc_exec_queue_timedout_job+0x2f3/0xe00 [xe]
[   37.722559] Modules linked in: xt_REDIRECT nft_compat nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr sunrpc bnep snd_ctl_led snd_soc_sof_sdw snd_soc_intel_hda_dsp_common snd_soc_sdw_utils snd_sof_probes snd_soc_rt712_sdca regmap_sdw_mbq snd_hda_codec_intelhdmi regmap_sdw snd_soc_dmic snd_hda_intel snd_sof_pci_intel_mtl iwlmvm snd_sof_intel_hda_generic soundwire_intel snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_hda_codec_hdmi soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp binfmt_misc snd_sof mac80211 vfat snd_sof_utils fat snd_hda_ext_core snd_hda_codec snd_hda_core snd_intel_dspcfg snd_intel_sdw_acpi snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi snd_hwdep crc8 soundwire_bus libarc4 snd_soc_sdca snd_soc_core
[   37.722584]  snd_compress ac97_bus uvcvideo snd_pcm_dmaengine iwlwifi snd_seq uvc videobuf2_vmalloc snd_seq_device videobuf2_memops videobuf2_v4l2 snd_pcm processor_thermal_device_pci videobuf2_common processor_thermal_device btusb intel_uncore_frequency processor_thermal_wt_hint intel_uncore_frequency_common platform_temperature_control videodev btmtk spi_nor processor_thermal_soc_slider x86_pkg_temp_thermal btrtl snd_timer iTCO_wdt processor_thermal_rfim intel_powerclamp btbcm intel_pmc_bxt snd intel_rapl_msr processor_thermal_rapl coretemp iTCO_vendor_support mei_gsc_proxy btintel intel_rapl_common rapl intel_cstate cfg80211 bluetooth mc intel_pmc_core mtd soundcore acer_wmi mei_me intel_uncore processor_thermal_wt_req i2c_i801 spi_intel_pci pmt_telemetry platform_profile mei processor_thermal_power_floor spi_intel i2c_smbus pmt_discovery igen6_edac pcspkr rfkill wmi_bmof idma64 processor_thermal_mbox intel_hid pmt_class int3403_thermal int3400_thermal joydev int340x_thermal_zone acpi_pad sparse_keymap
[   37.722611]  intel_pmc_ssram_telemetry acpi_thermal_rel acer_wireless loop nfnetlink zram lz4hc_compress lz4_compress dm_crypt xe drm_ttm_helper drm_suballoc_helper gpu_sched drm_gpuvm drm_exec drm_gpusvm_helper i915 nvme i2c_algo_bit nvme_core drm_buddy ucsi_acpi ttm typec_ucsi typec nvme_keyring nvme_auth hkdf drm_display_helper hid_multitouch polyval_clmulni thunderbolt intel_vpu ghash_clmulni_intel cec vmd i2c_hid_acpi video intel_vsec i2c_hid wmi pinctrl_meteorlake serio_raw i2c_dev fuse
[   37.722638] CPU: 0 UID: 0 PID: 12 Comm: kworker/u88:0 Not tainted 6.18.0-rc2+ #37 PREEMPT(voluntary)
[   37.722641] Hardware name: Acer Swift SFG14-72/Coral_MTH, BIOS V1.01 11/06/2023
[   37.722643] Workqueue: gt-ordered-wq drm_sched_job_timedout [gpu_sched]
[   37.722649] RIP: 0010:guc_exec_queue_timedout_job+0x2f3/0xe00 [xe]
[   37.722722] Code: 4c 24 10 44 89 44 24 08 e8 5a 95 f1 d4 44 8b 44 24 08 8b 4c 24 10 48 c7 c7 00 b7 25 c1 48 8b 54 24 18 48 89 c6 e8 4d 59 37 d4 <0f> 0b 80 3c 24 00 0f 85 55 03 00 00 49 8b 47 58 a8 01 75 1a 49 8b
[   37.722723] RSP: 0018:ffffd468000f7d80 EFLAGS: 00010246
[   37.722725] RAX: 0000000000000000 RBX: ffff8e3d4e215c00 RCX: 0000000000000027
[   37.722726] RDX: ffff8e40ae61cfc8 RSI: 0000000000000001 RDI: ffff8e40ae61cfc0
[   37.722727] RBP: 00000000fffffffb R08: 0000000000000000 R09: ffffd468000f7c20
[   37.722727] R10: ffff8e40c09fffa8 R11: 00000000fffbffff R12: ffff8e3d44c00028
[   37.722728] R13: ffff8e3d807d4000 R14: ffff8e3d807d4018 R15: ffff8e3d95c9d600
[   37.722729] FS:  0000000000000000(0000) GS:ffff8e4116110000(0000) knlGS:0000000000000000
[   37.722729] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.722730] CR2: 00007ff1f3e02720 CR3: 0000000113c8d005 CR4: 0000000000f70ef0
[   37.722731] PKRU: 55555554
[   37.722731] Call Trace:
[   37.722734]  <TASK>
[   37.722735]  ? __pfx_autoremove_wake_function+0x10/0x10
[   37.722740]  drm_sched_job_timedout+0x81/0x170 [gpu_sched]

Fixes: 50292f9 ("drm/xe: Move 'vm_max_level' flag back to platform descriptor")
Cc: Lucas De Marchi <[email protected]>
Cc: Gustavo Sousa <[email protected]>
Cc: Matt Roper <[email protected]>
Reviewed-by: Lucas De Marchi <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Rodrigo Vivi <[email protected]>