Skip to content

actually checked bitshifts #570

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lolbinarycat opened this issue Apr 1, 2025 · 19 comments
Open

actually checked bitshifts #570

lolbinarycat opened this issue Apr 1, 2025 · 19 comments
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@lolbinarycat
Copy link

lolbinarycat commented Apr 1, 2025

Proposal

Problem statement

For every integer operation with a symbolic operator, there is a corresponding checked_* method that returns None if an overflow occurs...

except for bitshifts, whose checked_* functions do something entirely different.

For example, 255_u8.checked_shl(1) evaluates to Some(254).

The closest existing functions are overflowing_{shl,shr}, but those must also return a result in the overflowing case, ruling out implementations that check for overflow before doing the shift. EDIT: the overflowing_* functions ALSO don't do what you would expect.

Motivating examples or use cases

This could be used to detect overflow when parsing certain types of positional integer notation, from ascii decimal to certain VLQs.

Solution Sketch

  • (optional) Deprecate checked_shr and checked_shl and give them less confusing names.
  • Add new shift functions that return None on overflow

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@lolbinarycat lolbinarycat added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Apr 1, 2025
@lolbinarycat
Copy link
Author

Here's two possible implementations:

pub fn check_shl_overflow_reshift(n: u64, shift_amount: u32) -> Option<u64> {
    let r = n << shift_amount;
    if r >> shift_amount == n {
        Some(r)
    } else {
        None
    }
}

pub fn check_shl_overflow_rotate(n: u64, shift_amount: u32) -> Option<u64> {
    let r = n.rotate_left(shift_amount);
    if r & ((shift_amount-1) as u64) == 0 {
        Some(r)
    } else {
        None
    }
}

compare on godbolt

@programmerjake
Copy link
Member

imo check_shl_overflow_reshift is not correct, you also need to check for shift_amount being too big.

@pitaj
Copy link

pitaj commented Apr 1, 2025

There are the recently stabilized unbounded_shl and unbounded_shr APIs.

We could add a checked_ version of those, but checked_unbounded_shl is a mouthful.

@lolbinarycat
Copy link
Author

@pitaj is it more of a mouthful than overflowing_sub_unsigned, or any of the other similar method names on integers?

@pitaj
Copy link

pitaj commented Apr 1, 2025

I wasn't saying we should block this proposal based on the name, just maybe think if better names exist. One alternative I thought of is bounded_shl.

Also note on implementation, you should be able to implement checked_unbounded_shl in terms of unbounded_shl and leading_zeroes.

@lolbinarycat
Copy link
Author

bounded_shl does not seem to be a technically correct name to me, my understanding is that normal shl is considered "bounded" (ie. a large RHS will cause an overflow and possibly a panic)

@kennytm
Copy link
Member

kennytm commented Apr 2, 2025

What OP wanted is not any existing Rust stdlib variant of bit shift, which shifting bits beyond out of MSB is a well-defined and valid behavior, but a weird version like x.checked_mul(1 << b) for "shl".

(Not sure what the "weird_checked" version of "shr" looks like.)

@pitaj
Copy link

pitaj commented Apr 2, 2025

Yeah the shl behavior is pretty intuitive, but when should shr return None?

@programmerjake
Copy link
Member

programmerjake commented Apr 2, 2025

maybe name it a.checked_mul_by_two_pow(b) == a.checked_mul(<int>::checked_pow(2, b)?)

@pitaj
Copy link

pitaj commented Apr 2, 2025

What OP wanted is not any existing Rust stdlib variant of bit shift, which shifting bits beyond out of MSB is a well-defined and valid behavior, but a weird version like x.checked_mul(1 << b) for "shl".

unbounded_shl is essentially the wrapping version of this new kind of shift operation. Which brings up another question: do we want saturating or strict versions of unbounded_shl too?

@programmerjake
Copy link
Member

programmerjake commented Apr 2, 2025

unbounded_shl is essentially the wrapping version of this new kind of shift operation. Which brings up another question: do we want saturating or strict versions of unbounded_shl too?

strict_unbounded_shl is unbounded_shl -- it never panics. (edit: nvm, had my brain on backwards) I could get behind adding checked_unbounded_shl and saturating_unbounded_shl

though if we use checked_unbounded_shl as the name, it may be less efficient since it'd have to return Some(0) for 0.checked_unbounded_shl(12345) even though that's waay out of the supported range for shifting any non-zero value.

@lolbinarycat
Copy link
Author

@kennytm It's the reverse, where shifting bits off the right rerurns None.

basically, if the result would have less bits set than the input, None is returned.

@scottmcm
Copy link
Member

scottmcm commented Apr 2, 2025

Which brings up another question: do we want saturating or strict versions of unbounded_shl too?

I'd say no. To me, the point of it is that a large shift isn't an error, so it doesn't need to be checked in debug (and checking would be a breaking change, arguably) and doesn't need all the extra variants as a result.


If you want to be inspired by LLVM and its unsafe versions of shifts, they have shl n[su]w and [al]shr exact for the "this is reversible" cases. Basically, shl nuw %a, 3 is the one that's the same as mul nuw %a, 8.

@jdahlstrom
Copy link

jdahlstrom commented Apr 2, 2025

basically, if the result would have less bits set than the input, None is returned.

So essentially "n.exact_div(1 << k)" – shift if and only if n is divisible by k, or equivalently, iff n >> k << k == n. And similarly for shr: shift iff n << k >> k == n.

My intuition is that shifts are supposed to treat zeros and ones identically, so shifting out ones (or zeros, in the case of signed types) from either end is not an "overflow" per se but rather the expected semantics – rather "overflow" is defined as trying to shift a number of bits greater than the width of the type of the LHS, due to the way common hardware works. If we had exact_div, then these could be done with checked_mul and exact_div – maybe the use cases are uncommon enough that those would suffice?

@lolbinarycat
Copy link
Author

the points is that there are some usecases, such as number decoding and when using shifts to multiply by a power of two, where losing data in this way is undesirable.

maybe name it lossless_shl?

@hanna-kruppe
Copy link

hanna-kruppe commented Apr 2, 2025

The motivating use cases are not really clear to me. It's not even clear to me what semantics an "actually checked" right shift should have. Yes, the meaning of overflow checks on shifts (and hence {wrapping,checked}_sh{l,r}) is different from other arithmetic operators in somewhat confusing ways. But it's also not obvious to me that there's a really useful different meaning that's worth the confusion and decision paralysis of adding more options (that are even harder to name well).

For left shifts, the analogy to "multiplication by power of two" at least suggests a clear meaning: lhs.actually_checked_shl(amt) == 2.checked_pow(amt).and_then(|pow2| pow2.checked_mul(lhs)), at least for non-negative amt. Are there cases where one would not just write it as multiplication to begin with? I suppose there are cases where the shift amount isn't constant and is easier to keep in log2 representation the entire time. But how common is combination of (1) want to write shift rather than multiply and (2) must explicitly check for overflow in the multiplication sense and (3) want to check this at every single shift? For parsing numbers represented in a power-of-two base, there's often more efficient ways to check for out-of-range or invalid inputs.

For right shifts, it's more complicated because there's at least two different meanings an "actually checked" right shift could have: lossless in the sense of not shifting out any 1 bits, or analogous to division by a power of two (and thus, compatible with the story about left shifts). Neither option seems great:

  • For unsigned integers, right shift is equivalent to flooring division by a power of two. But unsigned integer division in Rust only cares about division by zero (which doesn't happen for shifts) and otherwise silently discards the remainder, so "checked shift as in checked_div" doesn't really mean anything. The proposed "not shifting out any 1 bits" semantics at least has something to check, but:
    1. It doesn't (yet?) have any precedent for division in Rust. (I think there's an ACP for some kind of "checked-exact (no remainder) division" operation.)
    2. The potential use cases are even less clear to me than for left shifts. When I "shift out" less significant bits, I generally mean it. When I divide by a power of two and manually rewrite it as a shift, I'd want have the shift to behave the same as regular division.
    3. If it's called some flavor of "checked" operation, it doubles down on the confusing state of "checked shifts" meaning something completely different from other checked arithmetic.
  • For signed right shifts, the "equivalence" with division by a power of two is not quite right to begin with for negative LHS. But insofar one considers it analogous to division, the same issue w.r.t. handling truncation differently applies. The "don't shift out any 1 bits" semantics side-steps this wobbliness, but if it's applied here it should be applied to unsigned shifts too for consistency, and so the downsides mentioned before apply here too.

@traviscross
Copy link

traviscross commented Apr 29, 2025

To elaborate on @scottmcm's comment, LLVM has the following:

  • shl nuw -- Shift left, returning poison if either 1) the shift amount is equal to or greater than the field width or 2) non-zero bits were shifted out.
  • shl nsw -- Shift left, returning poison if either 1) the shift amount is equal to or greater than the field width or 2) any bits shifted out disagree with the resulting sign bit.
  • shl nuw nsw -- Shift left, returning poison if either 1) the shift amount is equal to or greater than the field width or 2) non-zero bits were shifted out or 3) any bits shifted out disagree with the resulting sign bit.
  • lshr exact -- Shift right with zero fill, returning poison if either 1) the shift amount is equal to or greater than the field width or 2) non-zero bits were shifted out.
  • ashr exact -- Shift right with sign extension, returning poison if either 1) the shift amount is equal to or greater than the field width or 2) non-zero bits were shifted out.

nuw means "no unsigned wrap", and nsw means "no signed wrap". These are also present on add, sub, mul, trunc, and getelementptr.

exact is also present on udiv and sdiv.

If we were to do something here, it would seem worth doing what would allow for lowering cleanly to these.

@Amanieu
Copy link
Member

Amanieu commented Apr 30, 2025

We discussed this in the @rust-lang/libs-api meeting. We do think there is a strong motivation for adding these: even though it can be emulated with unchecked_mul/unchecked_exact_div, we feel that doing so is quite convoluted and would make code harder to read.

Regarding the names, we preferred calling these "exact" shifts just like exact_div:

impl iX/uX {
    // Shift succeeds if `shift < Self::BITS` and right shifting the result returns the original value.
    fn exact_shl(self, shift: u32) -> Option<Self>;
    unsafe fn exact_shl_unchecked(self, shift: u32) -> Self;

    // Shift succeeds if `shift < Self::BITS` and left shifting the result returns the original value.
    fn exact_shr(self, shift: u32) -> Option<Self>;
    unsafe fn exact_shr_unchecked(self, shift: u32) -> Self;
}

@Amanieu Amanieu added the ACP-accepted API Change Proposal is accepted (seconded with no objections) label Apr 30, 2025
@Amanieu
Copy link
Member

Amanieu commented Apr 30, 2025

Please open a tracking issue and open a PR to rust-lang/rust to add it as an unstable feature. You can close this ACP once the tracking issue has been created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ACP-accepted API Change Proposal is accepted (seconded with no objections) api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

9 participants