Skip to content

Use simple io::Error for &[u8] Read and Write impl #42156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

arthurprs
Copy link
Contributor

@arthurprs arthurprs commented May 22, 2017

I'm not sure this change is gonna fly because of the worse error message (see end of post), but here's the case.

I was checking https://www.reddit.com/r/rust/comments/6cjs7u/analyzing_the_performance_of_a_hash/ and I pinpointed the cause of slowness to the io::read_exact implementation that byteorder read_u32 calls.
It turns out that the optimizer isn't smart enough to avoid the creation of the result altogether and creates a relatively costly io::Error just to discard it after (edit: reason being that creating an io::Error involves a function call and the optimizer can't blindly remove that side effect).

I played a bit with the code and if the code is simple enough (simple enum creation after the inlines) it optimizes out just as well as the version with the branch+unwrap().

Playground link for before: http://play.integer32.com/?gist=04569213693e4862ef1e13a19e408cd5
Playground link for after: http://play.integer32.com/?gist=3755fc64d339c81d8cb01ea41cc0862b

code

fn write(&mut self, mut bytes: &[u8]) {
    while let Ok(n) = bytes.read_u32::<NE>() {
        self.write_u32(n);
    }

    for byte in bytes {
        let i = *byte;
        self.add_to_hash(i as usize);
    }
}

before

cmpq    $4, %rbx
jb  .LBB0_6
....
.LBB0_6:
leaq    str.0(%rip), %rdi
movl    $27, %esi
callq   _ZN3std5error205_$LT$impl$u20$core..convert..From$LT$$RF$$u27$b$u20$str$GT$$u20$for$u20$alloc..boxed..Box$LT$std..error..Error$u20$$u2b$$u20$core..marker..Send$u20$$u2b$$u20$core..marker..Sync$u20$$u2b$$u20$$u27$a$GT$$GT$4from17hf12a25e4e8752742E@PLT
movq    %rdx, %rcx
leaq    8(%rsp), %rdi
movl    $17, %esi
movq    %rax, %rdx
callq   _ZN3std2io5error5Error4_new17h3674736498de977fE@PLT
cmpb    $2, 8(%rsp)
jb  .LBB0_11

after

cmpq	$4, %rsi
jb	.LBB0_6
...
.LBB0_6:
testq	%rsi, %rsi
je	.LBB0_7

Error message changes:

println!("display -> {}", read_u32(&mut bytes).unwrap_err());
println!("debug -> {:?}", read_u32(&mut bytes).unwrap_err());

before:
display -> failed to write whole buffer
debug -> Error { repr: Custom(Custom { kind: WriteZero, error: StringError("failed to write whole buffer") }) }

after:
display -> write zero
debug -> Error { repr: Kind(WriteZero) }

@sfackler sfackler added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 22, 2017
@sfackler
Copy link
Member

This seems pretty reasonable to me I think.

@rust-lang/libs Does anyone feel like the new errors will be too confusing?

@alexcrichton
Copy link
Member

Nah this seems reasonable to me, especially because they're just an implementation on slices and not for the whole trait

@nagisa
Copy link
Member

nagisa commented May 22, 2017

I’m not following the reasoning. You say:

It turns out that the optimizer isn't smart enough to avoid the creation of the result altogether and creates a relatively costly io::Error just to discard it after.

AFAICT it only creates the Error in the failure case. That is, the Error::new() will not be called if the function is not returning the error. I can see more cleanups in the old version, but the exceptions should be “free” in terms of execution speed. The stack usage between the versions is also the same.

… I guess what you describe does happen ONCE, when the actual failure case happens and the function returns the Error (the inline(never) thus 100% preventing from optimisation taking place), but then I’m doubting that this is the right way to fix the problem?

@arthurprs
Copy link
Contributor Author

arthurprs commented May 22, 2017

The sample code uses that to stop the loop. So yes, it happens at least once. Creating the common io::Error though involves a function call which the optimizer can't just remove (side effects, etc.) so stopping the loop is really expensive compared to the rest of the function.

It's probably fair to accept this optimization as it's reading over the simplest readable type in the language.

@nagisa
Copy link
Member

nagisa commented May 22, 2017

See, my point here is not about it being a bad optimisation to make, but about whether it is the right way to optimise this. Any io::Error that has Repr::Custom in it is the target of such inefficiency. This PR fixes the problem for &[u8] (great!), then somebody comes along, puts the &[u8] into Cursor<T> and complains about exactly the same problem (ewww?).

This could also be fixed in general by fixing #24194 + a targeted #[inline] on a relevant constructor function.

@arthurprs
Copy link
Contributor Author

I confirmed that it could work if that issue is fixed [1]. We would need to mark the str/String Into<Box>> inline accordingly. Can we rely on that?

[1] choose stable&release in http://play.integer32.com/?gist=dd721e339a507e560016b8baee698a60 doesn't work on nightly.

@arielb1 arielb1 added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 23, 2017
@carols10cents
Copy link
Member

I confirmed that it could work if that issue is fixed [1]. We would need to mark the str/String Into> inline accordingly. Can we rely on that?

@nagisa do you have an answer for @arthurprs's question here?

@arthurprs
Copy link
Contributor Author

To be clearer, can we rely on llvm to optimize out the error path allocations? Maybe it considers it might have side effects or something (like normal functions).

@nagisa
Copy link
Member

nagisa commented May 29, 2017

I general you can’t rely on optimisations in any way at all. If you need some particular sequence of instructions, write them down in assembly.

That being said, getting rid of the allocations here by constructing a different variant of the error enumeration and getting rid of the allocations here by a carefully placed inline hint are pretty much equivalent as far as reliability is concerned IMO.

@arthurprs
Copy link
Contributor Author

Fair enough. I suggesting using the simple io::Error then, it was created for this exact purpose.

@arthurprs
Copy link
Contributor Author

Ahh, maybe not... it's too specific. I'll probably close and we can revisit once #24194 is resolved.

@arthurprs arthurprs closed this May 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants