Implement zero-copy parsing #114
Conversation
     /// The buffer.
-    pub buf: String,
+    pub buf: Tendril,
Isn't pos redundant with buf now, since Iobuf has its own cursor?
But the owned variant of a Tendril doesn't. Eventually I would like to replace BufferQueue with a rope, as discussed elsewhere.
Good point!
Doesn't this mean that any long string of text (i.e. one that spans multiple chunks) would have to fall back to a …? I guess that's a small price to pay for more efficient iteration.
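To make the distinction in this thread concrete, here is a minimal sketch (all names are hypothetical; this is not the actual Tendril or BufferQueue code) of a buffer that is either a shared view of an I/O chunk carrying its own cursor, or an owned string that needs an external position:

```rust
use std::rc::Rc;

/// Hypothetical buffer used only for illustration.
/// A shared chunk carries its own cursor (like an Iobuf), so no separate
/// `pos` is needed for it; the owned variant has no cursor, so the caller
/// must track a position alongside it.
enum Buffer {
    /// Read-only view into a shared input chunk, with an internal cursor.
    Shared { chunk: Rc<Vec<u8>>, start: usize, end: usize },
    /// Owned buffer with no built-in cursor.
    Owned(String),
}

impl Buffer {
    /// Pop the next byte. `pos` is consulted and advanced only for the
    /// owned variant; the shared variant advances its own `start`.
    fn pop_front(&mut self, pos: &mut usize) -> Option<u8> {
        match self {
            Buffer::Shared { chunk, start, end } => {
                if *start < *end {
                    let b = chunk[*start];
                    *start += 1;
                    Some(b)
                } else {
                    None
                }
            }
            Buffer::Owned(s) => {
                let b = s.as_bytes().get(*pos).copied();
                if b.is_some() {
                    *pos += 1;
                }
                b
            }
        }
    }
}
```

With a split like this, iterating a shared chunk never touches pos, but the owned fallback still does, which is why pos cannot simply be dropped while owned buffers remain.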
    #[inline(always)]
    fn check_len(&self) {
        if self.len() > TENDRIL_MAX_LEN as usize {
Why not just use len32?
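One reading of the suggestion, as a hedged sketch (hypothetical wrapper type and constant, not the real tendril.rs code): a len32 accessor can combine the bounds assertion with the narrowing cast, so call sites no longer need a separate check_len plus an `as usize` comparison.

```rust
/// Hypothetical buffer wrapper used only for illustration.
struct Buf {
    data: Vec<u8>,
}

/// Illustrative limit; the real constant is defined in tendril.rs.
const TENDRIL_MAX_LEN: u32 = u32::MAX - 1;

impl Buf {
    fn len(&self) -> usize {
        self.data.len()
    }

    /// Original style: a standalone check, with the comparison done in usize.
    #[inline(always)]
    fn check_len(&self) {
        assert!(self.len() <= TENDRIL_MAX_LEN as usize, "tendril too long");
    }

    /// Suggested style: return the length as u32 directly, asserting that it
    /// fits, so callers get both the check and the narrowed value in one call.
    #[inline(always)]
    fn len32(&self) -> u32 {
        let n = self.len();
        assert!(n <= TENDRIL_MAX_LEN as usize, "tendril too long");
        n as u32
    }
}
```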
Iobuf usage looks good to me (modulo comments)! I like this remix.
Now #141.
This is based on #60 but with substantial changes. The biggest difference is that we only use shared buffers for the character runs found by pop_except_from. The majority of the remaining spans are single ASCII characters, which have their own fast path. Everything else is a String, as before.

This branch also drops many of the micro-optimizations from #60. Unlike that PR, we leave the parser rules alone for the most part.
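A rough sketch of that three-way split (illustrative names only, not the actual html5ever types): character runs become shared, zero-copy slices of the input chunk, single ASCII characters are stored inline with no allocation, and everything else falls back to an owned String.

```rust
use std::rc::Rc;

/// Illustrative-only classification of how a span of input text is stored.
enum TextSpan {
    /// A run of characters kept as a shared, zero-copy view of the input chunk.
    Shared { chunk: Rc<String>, start: usize, end: usize },
    /// A single ASCII character, stored inline with no allocation.
    SingleAscii(u8),
    /// Everything else falls back to an owned, allocated string.
    Owned(String),
}

impl TextSpan {
    fn as_str(&self) -> &str {
        match self {
            TextSpan::Shared { chunk, start, end } => &chunk[*start..*end],
            TextSpan::SingleAscii(b) => {
                // Valid only for ASCII bytes; shown here just to illustrate an
                // allocation-free single-character path.
                std::str::from_utf8(std::slice::from_ref(b)).unwrap()
            }
            TextSpan::Owned(s) => s,
        }
    }
}
```

In the actual branch the shared case corresponds to the runs pulled out by pop_except_from; this sketch only shows the storage shape, not that iteration logic.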
r? @Manishearth or @SimonSapin (general review)
r? @cgaebel (iobuf usage in tendril.rs)

Depending on the specific content and the I/O chunk size, this branch speeds up tokenization by up to a few percent. I did not see any significant performance regressions with sensible chunk sizes.
I have plans for further optimizations, including following up on the rustc bugs @cgaebel identified in #60.
The branch already achieves a significant drop in allocations and memory consumption (preliminary numbers):
Webapp spec, single page: pre-zerocopy vs. zerocopy
Wikipedia (GotG from servo-static-suite): pre-zerocopy vs. zerocopy