Skip to content

Reform is_power_of_two() to return false for self == 0, and move previous functionality to is_power_of_two_or_zero() #19640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 20, 2014

Conversation

aliblong
Copy link
Contributor

@aliblong aliblong commented Dec 8, 2014

The is_power_of_two() method of the UnsignedInt trait currently returns true for self == 0. Zero is not a power of two, assuming an integral exponent k >= 0. I've therefore moved this functionality to the new method is_power_of_two_or_zero() and reformed is_power_of_two() to return false for self == 0.

To illustrate the usefulness of the existence of both functions, consider HashMap. Its capacity must be zero or a power of two; conversely, it also requires a (non-zero) power of two for key and val alignment.

Also, added a small amount of documentation regarding #18604.

@Gankra
Copy link
Contributor

Gankra commented Dec 8, 2014

Hmm, 0u32.next_power_of_two() returns 0. Presumably if we're changing what we consider a power of two, we should change this behaviour?

r? @bjz

@aliblong
Copy link
Contributor Author

aliblong commented Dec 9, 2014

I've now added in the self == 0 case for next_power_of_two() to address this.

@brendanzab
Copy link
Member

Thanks for this, but unfortunately we can't add a new method to this trait without going through the RFC process first.

cc. @aturon

@mahkoh
Copy link
Contributor

mahkoh commented Dec 9, 2014

There seems to be a much more efficient implementation of next_power_of_two that does not produce 0:

pub fn next_power_of_two<T: UnsignedInt>(x: T) -> T {
    let bits = std::mem::size_of::<T>() * 8;
    let one: T = Int::one();
    one << ((bits - (x - one).leading_zeros()) % bits)
}

pub fn checked_next_power_of_two<T: UnsignedInt>(x: T) -> Option<T> {
    let npot = next_power_of_two(x);
    if npot >= x {
        Some(npot)
    } else {
        None
    }
}

#[test]
fn t() {
    assert_eq!(next_power_of_two::<u64>(0), 1);
    assert_eq!(next_power_of_two::<u64>(1), 1);
    assert_eq!(next_power_of_two::<u64>(2), 2);
    assert_eq!(next_power_of_two::<u64>(3), 4);
    assert_eq!(next_power_of_two::<u64>(4), 4);
    assert_eq!(next_power_of_two::<u64>(5), 8);
    assert_eq!(next_power_of_two::<u64>(-1), 1);
    assert_eq!(next_power_of_two::<u64>(-2), 1);
    assert_eq!(next_power_of_two::<u64>(-3), 1);
    assert_eq!(next_power_of_two::<u64>(-4), 1);
    assert_eq!(next_power_of_two::<u64>(-5), 1);
}
test b_new ... bench:   2418350 ns/iter (+/- 7130)
test b_old ... bench:   3410203 ns/iter (+/- 4517)

@aliblong
Copy link
Contributor Author

aliblong commented Dec 9, 2014

The only concern I have about this overflow-unchecked implementation is that it suffers from the same downside as the original - the self == 0 and overflow cases return the same result (1), and so the burden of distinguishing them rests on the caller.

I personally wouldn't mind having overflow-checked arithmetic being the default, with unchecked being an opt-in, but that is a much broader issue.

@Gankra
Copy link
Contributor

Gankra commented Dec 9, 2014

@aliblong
Copy link
Contributor Author

aliblong commented Dec 9, 2014

@gankro Yes, I know. I actually modified that function in the most recent commit. All I meant is that I would prefer the overflow-checked cases to be the semantic default (next_power_of_two, add) and for the non-checked cases to be opt-in (unchecked_next_power_of_two, unchecked_add).

Edit: On second thought, it's probably too cumbersome having to deal with Options from overflow checking for everyday +-*/ operations, and there's something to be said for the uniformity of having to opt into checking for ALL arithmetic.

@aturon
Copy link
Member

aturon commented Dec 15, 2014

I don't think we probably need is_power_of_two_or_zero as an extra convenience here (simpler to just write that logic yourself), but the change to the semantics of next_power_of_two seems reasonable to me.

@bjz do you agree?

self.is_power_of_two_or_zero() && !(self == Int::zero())
}

/// Returns `true` iff `self == 2^k` for some `k` or `self == 0`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iff -> `if`` (but it seems that this function will be removed anyways)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iff is strictly correct here (and used on e.g. the function directly above).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(To be clear, iff is a common shortening of "if and only if".)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess I learned something new today.

@aliblong
Copy link
Contributor Author

It would make me kind of sad to call some_val.is_power_of_two() || some_val == 0 when is_power_of_two() checks that self != 0, but perhaps the neatness of the API is more important than this minor arithmetic inefficiency.

@aturon
Copy link
Member

aturon commented Dec 16, 2014

I suspect that inlining and optimization would make that disappear in the
final code in any case.

On Mon, Dec 15, 2014 at 9:53 PM, Aaron Liblong [email protected]
wrote:

It would make me kind of sad to call some_val.is_power_of_two() ||
some_val == 0 when is_power_of_two() checks that self != 0, but perhaps
the neatness of the API is more important than this minor arithmetic
inefficiency.


Reply to this email directly or view it on GitHub
#19640 (comment).

@brendanzab
Copy link
Member

I agree with @aturon. The change in semantics for next_power_of_two is reasonable, but I would leave out the extra methods.

@aliblong
Copy link
Contributor Author

Okay, I removed the new method accordingly.

@brendanzab
Copy link
Member

Could you squash the commits?

@aliblong aliblong force-pushed the power_of_two_reform branch from 28b3fb5 to 98f7943 Compare December 17, 2014 03:47
@aliblong
Copy link
Contributor Author

All squashed up!

@Gankra
Copy link
Contributor

Gankra commented Dec 18, 2014

build failure looks spurious. However:

@aliblong I would prefer to use @mahkoh's code since we use this functionality in our collections. Admittedly on super-slow paths (allocation), but why not use the fast one? If the user cares about overflow they can use the checked variant.

Also if mahkoh's code is used, we need to update this code accordingly: http://doc.rust-lang.org/src/collections/vec.rs.html#684-692

It should just panic (capacity overflow) in the overflow case anyway, since that means you're already asking for more than int::MAX bits, which is illegal. For the record this bad code is entirely my fault (well, and I guess partly whoever didn't stop me :P)

@aliblong
Copy link
Contributor Author

Ok. Want me to push a new commit with this code, or wait for @mahkoh to submit a PR?

@Gankra
Copy link
Contributor

Gankra commented Dec 18, 2014

Push a new commit. @mahkoh has demonstrated disinterest in submitting a PR on irc.

@aliblong
Copy link
Contributor Author

Ok, will do. By the way, should I tag is_power_of_two with #[inline]? Seems like a good candidate for inlining.

@Gankra
Copy link
Contributor

Gankra commented Dec 18, 2014

@aliblong I'm actually unclear on how default impls behave cross-crate. Might as well, we spam that directive everywhere anyway. :/

@aliblong aliblong force-pushed the power_of_two_reform branch 2 times, most recently from 5deb72d to 4f9a282 Compare December 19, 2014 02:06
@aliblong
Copy link
Contributor Author

Okay, I implemented it. I was worried about a perf hit from modifying Vec in light of #19574, but I actually got a speedup (5-10%) - attributable to the next_power_of_two speedup I suppose.

self.grow_capacity(cap)
let amort_cap = new_cap.checked_next_power_of_two();
match amort_cap {
None => self.grow_capacity(new_cap),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I noted in my discussion of this method, this should really just panic, since new_cap must be larger than int::MAX. As such you can just do unwrap("capacity overflow").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't new_cap guaranteed to be less than uint::MAX given that it's the product of a checked_add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't allocate more than int::MAX contiguous memory because pointer arithmetic uses int. If you did successfully make such an object, you wouldn't be able to properly index in it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay, I had no idea. I guess that's so you can deal with negative pointer offsets?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. The underlying LLVM (OS?) API uses signed arithmetic, hands are tied.

@aliblong aliblong force-pushed the power_of_two_reform branch 3 times, most recently from fcea403 to 59701c7 Compare December 19, 2014 20:09
@aliblong
Copy link
Contributor Author

Finished implementing @gankro's suggestions

@aliblong aliblong force-pushed the power_of_two_reform branch from 59701c7 to 0a98430 Compare December 19, 2014 23:18
…nsider 0 not a power of 2.

Vec panics when attempting to reserve capacity > int::MAX (uint::MAX / 2).
@aliblong aliblong force-pushed the power_of_two_reform branch from 0a98430 to f6328b6 Compare December 19, 2014 23:22
@aliblong
Copy link
Contributor Author

Squashed the commits

bors added a commit that referenced this pull request Dec 20, 2014
The `is_power_of_two()` method of the `UnsignedInt` trait currently returns `true` for `self == 0`. Zero is not a power of two, assuming an integral exponent `k >= 0`. I've therefore moved this functionality to the new method `is_power_of_two_or_zero()` and reformed `is_power_of_two()` to return false for `self == 0`.

To illustrate the usefulness of the existence of both functions, consider `HashMap`. Its capacity must be zero or a power of two; conversely, it also requires a (non-zero) power of two for key and val alignment.

Also, added a small amount of documentation regarding #18604.
@bors bors merged commit f6328b6 into rust-lang:master Dec 20, 2014
@aliblong aliblong deleted the power_of_two_reform branch December 20, 2014 03:49
@SimonSapin
Copy link
Contributor

I think this regressed performance of Vec::reserve.

const BYTES: u64 = 100_000;

#[bench]
fn push_str_one_byte(b: &mut Bencher) {
    b.bytes = BYTES;
    b.iter(|| {
        let mut s = String::new();
        for _ in range(0, BYTES) {
            s.push_str("a")
        }
        black_box(s);
    });
}
rustc 3dcd21574 2014-11-17 (Servo snapshot)
test bench::push_str_one_byte            ... bench:    558857 ns/iter (+/- 44549) = 178 MB/s

rustc 99d6956c3 2014-12-18 (Nightly binary)
test bench::push_str_one_byte            ... bench:     85366 ns/iter (+/- 4261) = 1171 MB/s

rustc 8f51ad242 2014-12-20 (make check-stage1-collections PLEASE_BENCH=1)
test string::tests::bench_push_str_one_byte                ... bench:     39258 ns/iter (+/- 10864) = 254 MB/s

@SimonSapin
Copy link
Contributor

I don’t know if this PR is actually responsible for the results above (I haven’t bisected it), it’s just one change that seem relevant.

@Gankra
Copy link
Contributor

Gankra commented Dec 20, 2014

Ack. We did just replace a conditional set with a conditional panic. Perhaps we should replace the second expect with an unwrap_or(UINT::MAX), and let the allocator handle it.

CC @thestinger I'm a bit shaky on whose responsibility it is to "deal" with over-large allocations. Would it be sound for all of our collections code to saturate on allocation-size overflow as a sort of "poisoning" of the allocation? I believe you're opposed to us panicking for these runtime errors anyway, right?

@SimonSapin
Copy link
Contributor

We did just replace a conditional set with a conditional panic.

But we already had a conditional panic before… Is two worse than one?

SimonSapin added a commit to SimonSapin/rust that referenced this pull request Dec 20, 2014
`String::push(&mut self, ch: char)` currently has a single code path
that calls `Char::encode_utf8`.
Perhaps it could be faster for ASCII `char`s, which are represented as
a single byte in UTF-8.

This commit leaves the method unchanged,
adds a copy of it with the fast path,
and adds benchmarks to compare them.

Results show that the fast path very significantly improves the performance
of repeatedly pushing an ASCII `char`,
but does not significantly affect the performance for a non-ASCII `char`
(where the fast path is not taken).

Output of `make check-stage1-collections NO_REBUILD=1 PLEASE_BENCH=1 TESTNAME=string::tests::bench_push`

```
test string::tests::bench_push_char_one_byte                 ... bench:     59552 ns/iter (+/- 2132) = 167 MB/s
test string::tests::bench_push_char_one_byte_with_fast_path  ... bench:      6563 ns/iter (+/- 658) = 1523 MB/s
test string::tests::bench_push_char_two_bytes                ... bench:     71520 ns/iter (+/- 3541) = 279 MB/s
test string::tests::bench_push_char_two_bytes_with_slow_path ... bench:     71452 ns/iter (+/- 4202) = 279 MB/s
test string::tests::bench_push_str                           ... bench:        24 ns/iter (+/- 2)
test string::tests::bench_push_str_one_byte                  ... bench:     38910 ns/iter (+/- 2477) = 257 MB/s
```

A benchmark of pushing a one-byte-long `&str` is added for comparison,
but its performance [has varied a lot lately](
rust-lang#19640 (comment)).
(When the input is fixed, `s.push_str("x")` could be used
instead of `s.push('x')`.)
@aliblong
Copy link
Contributor Author

I can confirm that this change to reserve caused this regression. Reverting it to #19574 (but checking for amort_cap == 1 rather than amort_cap == 0 in accordance with the change to next_power_of_two) gives a 3x speedup for push_str_one_byte. No changes are seen in any of the other collections benches, as far as I can tell, which is why I didn't catch the regression in the first place.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Dec 21, 2014
`String::push(&mut self, ch: char)` currently has a single code path that calls `Char::encode_utf8`. This adds a fast path for ASCII `char`s, which are represented as a single byte in UTF-8.

Benchmarks of stage1 libcollections at the intermediate commit show that the fast path very significantly improves the performance of repeatedly pushing an ASCII `char`, but does not significantly affect the performance for a non-ASCII `char` (where the fast path is not taken).

```
bench_push_char_one_byte                  59552 ns/iter (+/- 2132) = 167 MB/s
bench_push_char_one_byte_with_fast_path    6563 ns/iter (+/- 658) = 1523 MB/s
bench_push_char_two_bytes                 71520 ns/iter (+/- 3541) = 279 MB/s
bench_push_char_two_bytes_with_slow_path  71452 ns/iter (+/- 4202) = 279 MB/s
bench_push_str_one_byte                   38910 ns/iter (+/- 2477) = 257 MB/s
```

A benchmark of pushing a one-byte-long `&str` is added for comparison, but its performance [has varied a lot lately](rust-lang#19640 (comment)). (When the input is fixed, `s.push_str("x")` could be used just as well as `s.push('x')`.)
@brson
Copy link
Contributor

brson commented Dec 22, 2014

Appears to have been a breaking change.

@Gankra
Copy link
Contributor

Gankra commented Dec 22, 2014

@brson oops! Thought that had been handled. 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants