Reform is_power_of_two() to return false for self == 0, and move previous functionality to is_power_of_two_or_zero() #19640

aliblong · 2014-12-08T18:12:37Z

The is_power_of_two() method of the UnsignedInt trait currently returns true for self == 0. Zero is not a power of two, assuming an integral exponent k >= 0. I've therefore moved this functionality to the new method is_power_of_two_or_zero() and reformed is_power_of_two() to return false for self == 0.

To illustrate the usefulness of the existence of both functions, consider HashMap. Its capacity must be zero or a power of two; conversely, it also requires a (non-zero) power of two for key and val alignment.

Also, added a small amount of documentation regarding #18604.

Gankra · 2014-12-08T20:27:54Z

Hmm, 0u32.next_power_of_two() returns 0. Presumably if we're changing what we consider a power of two, we should change this behaviour?

r? @bjz

aliblong · 2014-12-09T05:28:18Z

I've now added in the self == 0 case for next_power_of_two() to address this.

brendanzab · 2014-12-09T12:10:42Z

Thanks for this, but unfortunately we can't add a new method to this trait without going through the RFC process first.

cc. @aturon

mahkoh · 2014-12-09T16:06:54Z

There seems to be a much more efficient implementation of next_power_of_two that does not produce 0:

pub fn next_power_of_two<T: UnsignedInt>(x: T) -> T {
    let bits = std::mem::size_of::<T>() * 8;
    let one: T = Int::one();
    one << ((bits - (x - one).leading_zeros()) % bits)
}

pub fn checked_next_power_of_two<T: UnsignedInt>(x: T) -> Option<T> {
    let npot = next_power_of_two(x);
    if npot >= x {
        Some(npot)
    } else {
        None
    }
}

#[test]
fn t() {
    assert_eq!(next_power_of_two::<u64>(0), 1);
    assert_eq!(next_power_of_two::<u64>(1), 1);
    assert_eq!(next_power_of_two::<u64>(2), 2);
    assert_eq!(next_power_of_two::<u64>(3), 4);
    assert_eq!(next_power_of_two::<u64>(4), 4);
    assert_eq!(next_power_of_two::<u64>(5), 8);
    assert_eq!(next_power_of_two::<u64>(-1), 1);
    assert_eq!(next_power_of_two::<u64>(-2), 1);
    assert_eq!(next_power_of_two::<u64>(-3), 1);
    assert_eq!(next_power_of_two::<u64>(-4), 1);
    assert_eq!(next_power_of_two::<u64>(-5), 1);
}

test b_new ... bench:   2418350 ns/iter (+/- 7130)
test b_old ... bench:   3410203 ns/iter (+/- 4517)

aliblong · 2014-12-09T17:56:30Z

The only concern I have about this overflow-unchecked implementation is that it suffers from the same downside as the original - the self == 0 and overflow cases return the same result (1), and so the burden of distinguishing them rests on the caller.

I personally wouldn't mind having overflow-checked arithmetic being the default, with unchecked being an opt-in, but that is a much broader issue.

Gankra · 2014-12-09T18:00:05Z

@aliblong there is a separate checked variant: http://doc.rust-lang.org/std/num/trait.UnsignedInt.html#tymethod.checked_next_power_of_two

aliblong · 2014-12-09T19:15:44Z

@gankro Yes, I know. I actually modified that function in the most recent commit. All I meant is that I would prefer the overflow-checked cases to be the semantic default (next_power_of_two, add) and for the non-checked cases to be opt-in (unchecked_next_power_of_two, unchecked_add).

Edit: On second thought, it's probably too cumbersome having to deal with Options from overflow checking for everyday +-*/ operations, and there's something to be said for the uniformity of having to opt into checking for ALL arithmetic.

aturon · 2014-12-15T23:28:39Z

I don't think we probably need is_power_of_two_or_zero as an extra convenience here (simpler to just write that logic yourself), but the change to the semantics of next_power_of_two seems reasonable to me.

@bjz do you agree?

sinistersnare · 2014-12-16T00:34:18Z

src/libcore/num/mod.rs

+        self.is_power_of_two_or_zero() && !(self == Int::zero())
+    }
+
+    /// Returns `true` iff `self == 2^k` for some `k` or `self == 0`.


iff -> `if`` (but it seems that this function will be removed anyways)

iff is strictly correct here (and used on e.g. the function directly above).

(To be clear, iff is a common shortening of "if and only if".)

Well, I guess I learned something new today.

aliblong · 2014-12-16T05:53:09Z

It would make me kind of sad to call some_val.is_power_of_two() || some_val == 0 when is_power_of_two() checks that self != 0, but perhaps the neatness of the API is more important than this minor arithmetic inefficiency.

aturon · 2014-12-16T05:56:51Z

I suspect that inlining and optimization would make that disappear in the
final code in any case.

On Mon, Dec 15, 2014 at 9:53 PM, Aaron Liblong [email protected]
wrote:

It would make me kind of sad to call some_val.is_power_of_two() ||
some_val == 0 when is_power_of_two() checks that self != 0, but perhaps
the neatness of the API is more important than this minor arithmetic
inefficiency.

—
Reply to this email directly or view it on GitHub
#19640 (comment).

brendanzab · 2014-12-16T10:25:02Z

I agree with @aturon. The change in semantics for next_power_of_two is reasonable, but I would leave out the extra methods.

aliblong · 2014-12-16T23:30:16Z

Okay, I removed the new method accordingly.

brendanzab · 2014-12-17T03:24:08Z

Could you squash the commits?

aliblong · 2014-12-17T03:53:42Z

All squashed up!

Gankra · 2014-12-18T04:44:09Z

build failure looks spurious. However:

@aliblong I would prefer to use @mahkoh's code since we use this functionality in our collections. Admittedly on super-slow paths (allocation), but why not use the fast one? If the user cares about overflow they can use the checked variant.

Also if mahkoh's code is used, we need to update this code accordingly: http://doc.rust-lang.org/src/collections/vec.rs.html#684-692

It should just panic (capacity overflow) in the overflow case anyway, since that means you're already asking for more than int::MAX bits, which is illegal. For the record this bad code is entirely my fault (well, and I guess partly whoever didn't stop me :P)

aliblong · 2014-12-18T04:55:46Z

Ok. Want me to push a new commit with this code, or wait for @mahkoh to submit a PR?

Gankra · 2014-12-18T04:56:24Z

Push a new commit. @mahkoh has demonstrated disinterest in submitting a PR on irc.

aliblong · 2014-12-18T05:02:33Z

Ok, will do. By the way, should I tag is_power_of_two with #[inline]? Seems like a good candidate for inlining.

Gankra · 2014-12-18T05:09:55Z

@aliblong I'm actually unclear on how default impls behave cross-crate. Might as well, we spam that directive everywhere anyway. :/

aliblong · 2014-12-19T02:45:12Z

Okay, I implemented it. I was worried about a perf hit from modifying Vec in light of #19574, but I actually got a speedup (5-10%) - attributable to the next_power_of_two speedup I suppose.

Gankra · 2014-12-19T02:55:19Z

src/libcollections/vec.rs

-                    self.grow_capacity(cap)
+                    let amort_cap = new_cap.checked_next_power_of_two();
+                    match amort_cap {
+                        None => self.grow_capacity(new_cap),


As I noted in my discussion of this method, this should really just panic, since new_cap must be larger than int::MAX. As such you can just do unwrap("capacity overflow").

Isn't new_cap guaranteed to be less than uint::MAX given that it's the product of a checked_add?

You can't allocate more than int::MAX contiguous memory because pointer arithmetic uses int. If you did successfully make such an object, you wouldn't be able to properly index in it.

Oh, okay, I had no idea. I guess that's so you can deal with negative pointer offsets?

Yeah. The underlying LLVM (OS?) API uses signed arithmetic, hands are tied.

aliblong · 2014-12-19T20:25:50Z

Finished implementing @gankro's suggestions

…nsider 0 not a power of 2. Vec panics when attempting to reserve capacity > int::MAX (uint::MAX / 2).

aliblong · 2014-12-19T23:31:58Z

Squashed the commits

The `is_power_of_two()` method of the `UnsignedInt` trait currently returns `true` for `self == 0`. Zero is not a power of two, assuming an integral exponent `k >= 0`. I've therefore moved this functionality to the new method `is_power_of_two_or_zero()` and reformed `is_power_of_two()` to return false for `self == 0`. To illustrate the usefulness of the existence of both functions, consider `HashMap`. Its capacity must be zero or a power of two; conversely, it also requires a (non-zero) power of two for key and val alignment. Also, added a small amount of documentation regarding #18604.

SimonSapin · 2014-12-20T16:46:51Z

I think this regressed performance of Vec::reserve.

const BYTES: u64 = 100_000;

#[bench]
fn push_str_one_byte(b: &mut Bencher) {
    b.bytes = BYTES;
    b.iter(|| {
        let mut s = String::new();
        for _ in range(0, BYTES) {
            s.push_str("a")
        }
        black_box(s);
    });
}

rustc 3dcd21574 2014-11-17 (Servo snapshot)
test bench::push_str_one_byte            ... bench:    558857 ns/iter (+/- 44549) = 178 MB/s

rustc 99d6956c3 2014-12-18 (Nightly binary)
test bench::push_str_one_byte            ... bench:     85366 ns/iter (+/- 4261) = 1171 MB/s

rustc 8f51ad242 2014-12-20 (make check-stage1-collections PLEASE_BENCH=1)
test string::tests::bench_push_str_one_byte                ... bench:     39258 ns/iter (+/- 10864) = 254 MB/s

SimonSapin · 2014-12-20T16:55:08Z

I don’t know if this PR is actually responsible for the results above (I haven’t bisected it), it’s just one change that seem relevant.

Gankra · 2014-12-20T17:01:55Z

Ack. We did just replace a conditional set with a conditional panic. Perhaps we should replace the second expect with an unwrap_or(UINT::MAX), and let the allocator handle it.

CC @thestinger I'm a bit shaky on whose responsibility it is to "deal" with over-large allocations. Would it be sound for all of our collections code to saturate on allocation-size overflow as a sort of "poisoning" of the allocation? I believe you're opposed to us panicking for these runtime errors anyway, right?

SimonSapin · 2014-12-20T17:09:54Z

We did just replace a conditional set with a conditional panic.

But we already had a conditional panic before… Is two worse than one?

`String::push(&mut self, ch: char)` currently has a single code path that calls `Char::encode_utf8`. Perhaps it could be faster for ASCII `char`s, which are represented as a single byte in UTF-8. This commit leaves the method unchanged, adds a copy of it with the fast path, and adds benchmarks to compare them. Results show that the fast path very significantly improves the performance of repeatedly pushing an ASCII `char`, but does not significantly affect the performance for a non-ASCII `char` (where the fast path is not taken). Output of `make check-stage1-collections NO_REBUILD=1 PLEASE_BENCH=1 TESTNAME=string::tests::bench_push` ``` test string::tests::bench_push_char_one_byte ... bench: 59552 ns/iter (+/- 2132) = 167 MB/s test string::tests::bench_push_char_one_byte_with_fast_path ... bench: 6563 ns/iter (+/- 658) = 1523 MB/s test string::tests::bench_push_char_two_bytes ... bench: 71520 ns/iter (+/- 3541) = 279 MB/s test string::tests::bench_push_char_two_bytes_with_slow_path ... bench: 71452 ns/iter (+/- 4202) = 279 MB/s test string::tests::bench_push_str ... bench: 24 ns/iter (+/- 2) test string::tests::bench_push_str_one_byte ... bench: 38910 ns/iter (+/- 2477) = 257 MB/s ``` A benchmark of pushing a one-byte-long `&str` is added for comparison, but its performance [has varied a lot lately]( rust-lang#19640 (comment)). (When the input is fixed, `s.push_str("x")` could be used instead of `s.push('x')`.)

aliblong · 2014-12-21T07:29:18Z

I can confirm that this change to reserve caused this regression. Reverting it to #19574 (but checking for amort_cap == 1 rather than amort_cap == 0 in accordance with the change to next_power_of_two) gives a 3x speedup for push_str_one_byte. No changes are seen in any of the other collections benches, as far as I can tell, which is why I didn't catch the regression in the first place.

`String::push(&mut self, ch: char)` currently has a single code path that calls `Char::encode_utf8`. This adds a fast path for ASCII `char`s, which are represented as a single byte in UTF-8. Benchmarks of stage1 libcollections at the intermediate commit show that the fast path very significantly improves the performance of repeatedly pushing an ASCII `char`, but does not significantly affect the performance for a non-ASCII `char` (where the fast path is not taken). ``` bench_push_char_one_byte 59552 ns/iter (+/- 2132) = 167 MB/s bench_push_char_one_byte_with_fast_path 6563 ns/iter (+/- 658) = 1523 MB/s bench_push_char_two_bytes 71520 ns/iter (+/- 3541) = 279 MB/s bench_push_char_two_bytes_with_slow_path 71452 ns/iter (+/- 4202) = 279 MB/s bench_push_str_one_byte 38910 ns/iter (+/- 2477) = 257 MB/s ``` A benchmark of pushing a one-byte-long `&str` is added for comparison, but its performance [has varied a lot lately](rust-lang#19640 (comment)). (When the input is fixed, `s.push_str("x")` could be used just as well as `s.push('x')`.)

brson · 2014-12-22T00:15:39Z

Appears to have been a breaking change.

Gankra · 2014-12-22T03:02:38Z

@brson oops! Thought that had been handled. 😞

sinistersnare reviewed Dec 16, 2014
View reviewed changes

aliblong force-pushed the power_of_two_reform branch from 28b3fb5 to 98f7943 Compare December 17, 2014 03:47

aliblong force-pushed the power_of_two_reform branch 2 times, most recently from 5deb72d to 4f9a282 Compare December 19, 2014 02:06

Gankra reviewed Dec 19, 2014
View reviewed changes

aliblong force-pushed the power_of_two_reform branch 3 times, most recently from fcea403 to 59701c7 Compare December 19, 2014 20:09

aliblong force-pushed the power_of_two_reform branch from 59701c7 to 0a98430 Compare December 19, 2014 23:18

Reform power_of_two methods for perf increase & semantic change to co…

f6328b6

…nsider 0 not a power of 2. Vec panics when attempting to reserve capacity > int::MAX (uint::MAX / 2).

aliblong force-pushed the power_of_two_reform branch from 0a98430 to f6328b6 Compare December 19, 2014 23:22

bors merged commit f6328b6 into rust-lang:master Dec 20, 2014

aliblong deleted the power_of_two_reform branch December 20, 2014 03:49

SimonSapin mentioned this pull request Dec 20, 2014

Add an ASCII (single-byte) fast path in String::push #20079

Merged

Reform is_power_of_two() to return false for self == 0, and move previous functionality to is_power_of_two_or_zero() #19640

Reform is_power_of_two() to return false for self == 0, and move previous functionality to is_power_of_two_or_zero() #19640

Conversation

aliblong commented Dec 8, 2014

Gankra commented Dec 8, 2014

aliblong commented Dec 9, 2014

brendanzab commented Dec 9, 2014

mahkoh commented Dec 9, 2014

aliblong commented Dec 9, 2014

Gankra commented Dec 9, 2014

aliblong commented Dec 9, 2014

aturon commented Dec 15, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aliblong commented Dec 16, 2014

aturon commented Dec 16, 2014

brendanzab commented Dec 16, 2014

aliblong commented Dec 16, 2014

brendanzab commented Dec 17, 2014

aliblong commented Dec 17, 2014

Gankra commented Dec 18, 2014

aliblong commented Dec 18, 2014

Gankra commented Dec 18, 2014

aliblong commented Dec 18, 2014

Gankra commented Dec 18, 2014

aliblong commented Dec 19, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aliblong commented Dec 19, 2014

aliblong commented Dec 19, 2014

SimonSapin commented Dec 20, 2014

SimonSapin commented Dec 20, 2014

Gankra commented Dec 20, 2014

SimonSapin commented Dec 20, 2014

aliblong commented Dec 21, 2014

brson commented Dec 22, 2014

Gankra commented Dec 22, 2014