some low-hanging rustdoc optimizations #44613

Merged
merged 2 commits into from
Oct 15, 2017

QuietMisdreavus
Member

There were a few discussions earlier today in #rust-internals about the syscall usage and overall performance of rustdoc. This PR is intended to pick some low-hanging fruit and try to rein in some of the performance issues of rustdoc.

@rust-highfive
Contributor

r? @frewsxcv

(rust_highfive has picked a reviewer for you, use r? to override)

@QuietMisdreavus
Member Author

cc @retep998 since they offered to help profile on windows - winapi was one of the inspirations for this PR

cc @bluss since they were also in the discussion and itertools was the other inspiration

-try_err!(layout::redirect(&mut redirect_out, file_name), &redir_dst);
+buf.clear();
+try_err!(layout::redirect(&mut buf, file_name), &redir_dst);
+try_err!(redirect_out.write_all(&buf), &redir_dst);
Member

Seems like a BufWriter around redirect_out is what's really wanted here and below.

Member Author

This reuses the existing allocation of the buffer. An earlier (rebased-over) commit just wrapped the functions in html/layout.rs in BufWriters before i noticed that it was being handed Vecs most of the time anyway.

Member

By reusing the same Vec it minimizes the cost of allocation. Using a BufWriter every time you write to a file means having to allocate and deallocate a BufWriter each time which isn't super cheap. Considering the whole point of this is to shave off microseconds from an operation that can be repeated hundreds of thousands of times (in the literal sense, see winapi), reusing the Vec provides a small but important performance boost.

Member

Ah, didn't know we reused this thousands of times. Vec seems fine then, though ideally we'd add a clear to BufWriter probably...

Member

@retep998 Sep 15, 2017

Right now the Vec is merely shared between the documentation on the item and the redirect to the documentation on the item, so the allocation isn't cached across items, but it's still better than a fresh allocation for writing the redirect. Also, there's no way as far as I can see to change what file a BufWriter is wrapping.

Member Author

Technically i think you could try writer.flush(); *writer.get_mut() = &mut new_file; to just swap out the inner writer, though i'm not sure if just having that flush there will save you from other weirdness. Also it requires that all the BufWriters be used with the same type.
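
The swap described in the comment above could look roughly like the sketch below. It is only an illustration: `retarget` is a hypothetical helper that does not exist in rustdoc, and it uses `std::mem::replace` on `BufWriter::get_mut` instead of a bare assignment so that the old writer is handed back to the caller. The flush comes first because any bytes still sitting in the buffer would otherwise be written to the new destination.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical helper (not part of rustdoc): point an existing `BufWriter`
/// at a new inner writer while keeping its internal buffer allocation.
/// Returns the previous inner writer.
fn retarget<W: Write>(writer: &mut BufWriter<W>, new_inner: W) -> io::Result<W> {
    // Bytes still held in the buffer belong to the old writer, so flush first.
    writer.flush()?;
    // `get_mut` yields `&mut W`, which lets the inner writer be swapped in place.
    Ok(std::mem::replace(writer.get_mut(), new_inner))
}
```

As the comment notes, this only works when the old and new writers share one type `W`, for example when every destination is a `File`.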

@carols10cents added the S-waiting-on-author label Sep 18, 2017
@alexcrichton
Member

ping @QuietMisdreavus, just want to make sure this doesn't fall off your radar!

@QuietMisdreavus
Member Author

I tried to find other things to work on by profiling rustdoc, but Visual Studio didn't think that symbols were made for rustdoc itself. Even without that, there's one more thing i'd like to try on this branch (keeping a single write buffer and handing that around when rendering all the pages). Otherwise i'd like to ask @retep998 to make sure this lowers the number of WriteFile syscalls that are done for redirect pages.

@QuietMisdreavus changed the title from "[WIP] rustdoc optimizations" to "some low-hanging rustdoc optimizations" Sep 23, 2017
@QuietMisdreavus
Member Author

So here's where this PR stands:

Right now there are two basic changes here:

  • Create the directory structure for the documentation ahead of time, instead of calling create_dir_all for every file that gets written.
  • Proxy every call to layout::render or layout::redirect through a wrapper that uses a shared buffer before writing to whatever writer was asked for. (And strip out all of the ad-hoc buffering that was happening beforehand.) This helps cut down on the number of allocations that would get made during a doc rendering, and also buffers each file write (most of the file writes were already buffered, but redirect pages were being written directly, causing ~nine syscalls per redirect page.) (This slightly cuts down on the parallelizability of the rendering process, but if truly necessary we can move the buffer into TLS instead of the SharedContext.)
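
To illustrate the second point, here is a minimal sketch of why an unbuffered `write!` reaches the destination in several pieces, and how rendering into a reusable `Vec<u8>` first collapses that into one write. Everything in it is an assumption made for illustration: `CountingWriter` is a test double standing in for a `File`, and `render_redirect` is a made-up template, not rustdoc's `layout::redirect`.

```rust
use std::io::{self, Write};

/// Test double that counts how often `write` is called on it.
/// Each call stands in for one write syscall on a real `File`.
struct CountingWriter {
    writes: usize,
    bytes: Vec<u8>,
}

impl CountingWriter {
    fn new() -> Self {
        CountingWriter { writes: 0, bytes: Vec::new() }
    }
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical stand-in for a redirect page template.
fn render_redirect<W: Write>(out: &mut W, url: &str) -> io::Result<()> {
    write!(out, "<meta http-equiv=\"refresh\" content=\"0;URL={}\">", url)?;
    write!(out, "<p>Redirecting to <a href=\"{}\">{}</a>...</p>", url, url)
}

/// Unbuffered: each formatted fragment is passed to the destination separately.
fn write_direct(dst: &mut CountingWriter, url: &str) -> io::Result<()> {
    render_redirect(dst, url)
}

/// Buffered: render into a reusable `Vec<u8>`, then hand the destination
/// the finished page with a single `write_all`.
fn write_via_buffer(dst: &mut CountingWriter, buf: &mut Vec<u8>, url: &str) -> io::Result<()> {
    buf.clear(); // drops the old contents but keeps the allocation
    render_redirect(buf, url)?;
    dst.write_all(buf)
}
```

Calling `write_direct` and `write_via_buffer` with the same URL produces identical bytes, but the direct version calls `write` once per literal fragment and argument, while the buffered version calls it once.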

I would like to look at more structural optimization opportunities, but whenever i try to attach Visual Studio to rustdoc it refuses to acknowledge any debuginfo that would let it use the source files in librustdoc. It sees the symbols for rustdoc-tool-binary, which actually includes things like function names, but it doesn't see any source info that it can use to assist a debug or profile session. As such, i'm calling this PR ready to go, and i'll defer any further work until i can figure out what's going on with that.

@QuietMisdreavus
Member Author

QuietMisdreavus commented Sep 23, 2017

travis failure was some rustdoc output tests failing:

[01:05:53] failures:
[01:05:53]     [rustdoc] rustdoc/extern-links.rs
[01:05:53]     [rustdoc] rustdoc/inline_local/glob-extern.rs
[01:05:53]     [rustdoc] rustdoc/inline_local/glob-private.rs
[01:05:53]     [rustdoc] rustdoc/issue-34025.rs

Gonna figure out what i broke.

@QuietMisdreavus
Member Author

Looks like i misjudged the checks for making sure a rendered item is empty, and wound up emitting empty files when the file shouldn't exist in the first place. I'm gonna try running these tests locally - hopefully my laptop fares better than my server, which has never been able to locally run a test in my experience >_>

@QuietMisdreavus
Member Author

Turns out, i had the control flow wrong. I assumed the buffer checks i wrote into places that checked for zero-sized writes were equivalent, but in some cases it doesn't even go through the write call, which is when that buffer would have been empty in the first place. I squashed the commits up with the fix. The tests that failed last time passed on my machine; let's see if travis agrees...

@@ -854,6 +853,35 @@ fn write_shared(cx: &Context,
}
try_err!(writeln!(&mut w, "initSearch(searchIndex);"), &dst);

// Create the directory structure ahead of time
let mut item = krate.module.clone().unwrap();
Member

Is cloning the entire crate here really necessary?

Member Author

It was originally so that i could set the name on the root module, but looking at it again, I can just set the initial path to what i need and let the crate item keep its empty name. Let me try this locally.


debug!("creating directory tree: recursing into {}", name);

try_err!(fs::create_dir_all(&dst), &dst);
Member

This ends up creating empty directories for stripped modules that don't contain any redirects.

Member Author

Would a simple check for !m.items.is_empty() suffice?

Member

No. The modules aren't empty, they just don't contain anything that is reexported anywhere else. It's the reason create_dir_all was after the buf.is_empty() check to make sure directories are only created if there is actually a file to put in it.

Member Author

Would a test for this just involve a #[doc(hidden)] module that doesn't have anything re-exported from it? I think i can recreate the check, i just need to make sure i'm doing it right.

Member

A simple crate like the following is enough to demonstrate this issue:

#[doc(hidden)]
pub mod hidden_mod {
    pub mod more_hidden {}
}

I don't recommend trying to recreate the logic though. If create_dir_all is really that expensive then I suggest adding a bool to Context to keep track of whether the current directory has been created yet.

Member Author

@QuietMisdreavus Sep 24, 2017

The repeated creation of directories adds up for big crates like libc or winapi, especially if you make documentation for these on windows, where filesystem operations tend to be more expensive. The problem with just adding a bool to Context is that Context::krate makes a clone of the Context for each item, so it only sees the state of the Context as it was when it had only passed the module itself. Putting it into SharedContext is a problem because there's no way to reset the SharedContext when Context::krate is done with it. To do it fully properly, you'd need to make it a HashSet<PathBuf> in the SharedContext, which trades your filesystem operations for a bunch of allocations. Is that a worthwhile tradeoff on platforms where checking the directory is relatively cheap?

Member Author

The logic it uses isn't that bad to recreate, unless you're worried about various HashMap lookups. I've force-pushed an update to this commit that should trim out these empty directories.

Member

Yeah, adding a bool to Context doesn't work as well as I'd thought.

The new code misses at least one check:

} else if item.name.is_some() {

We don't generate pages for items with no name, specifically re-exports. The following creates an empty directory with the new version:

#[doc(hidden)]
pub mod hidden_mod {
    pub use super::Foo;
}
pub struct Foo;

It would be good to see some benchmark numbers for this to see if the added complexity is worth it.

Also, why is this in the write_shared function?

Member Author

@retep998 ran procmon on an earlier iteration of this PR yesterday when rendering winapi documentation to verify the number of syscalls per-item went down. For that crate, anything that reduces the number of filesystem accesses per-item is worth it, simply because it has a ton of items being exported. I'll try to get some before/after numbers for this, at least on my own machine.

As for being in write_shared... that's where i decided to put it? I think it felt weird to put it into the main run function, but that does seem better, now that you point it out. I think i picked write_shared just because it was the last thing before the final docs were rendered. I'll move it into run, that's no problem for me.

Member Author

PM 094225 <WindowsBunnyDoesSupportIndexing> misdreavus: 114 seconds in nightly to 107 seconds using semi recent version of your PR

(When they render documentation on winapi)

I've pushed the latest update. I'm going to try to update htmldocck.py to allow for checking whether a directory exists. That will have to be a separate PR, though.

@QuietMisdreavus force-pushed the rustdoc-perf branch 2 times, most recently from 83c7aec to 00326b8 September 25, 2017 02:51
@QuietMisdreavus
Member Author

ping @ollie27 and @GuillaumeGomez, just wanted to make sure this doesn't fall off everyone's radar.

@GuillaumeGomez
Member

Do you have some numbers to allow us to compare? A before/after would be very appreciated. :)

@QuietMisdreavus
Member Author

This is from when @retep998 compared this branch to the latest nightly at the time:

PM 094225 <WindowsBunnyDoesSupportIndexing> misdreavus: 114 seconds in nightly to 107 seconds using semi recent version of your PR

That was when rendering winapi on windows, which was the worst-case inspiration for this PR. (The last time i tried comparing on my own system things kept going wrong, but i can give it another shot tonight.)

@retep998
Member

Keep in mind that out of that time, 17 seconds is spent on compiling winapi, and another 78 seconds is spent on unavoidable NtCreateFile/NtWriteFile/NtCloseFile calls (unless rustdoc decides to stop creating so many files). If you exclude those two things, the difference is much more significant: 19 seconds to 12 seconds.
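
The arithmetic behind those figures can be spelled out as below. This is a sketch that assumes, as the comment implies, that the 17 seconds of compilation and the 78 seconds of unavoidable file syscalls are the same in both runs. The function names are made up for illustration.

```rust
/// Seconds left once compile time and unavoidable file syscalls are subtracted.
fn remaining(total: u32, compile: u32, file_syscalls: u32) -> u32 {
    total - compile - file_syscalls
}

/// Relative reduction from `before` to `after`, as a percentage.
fn reduction_percent(before: u32, after: u32) -> f64 {
    (before - after) as f64 / before as f64 * 100.0
}
```

With the numbers quoted above, `remaining(114, 17, 78)` is 19 and `remaining(107, 17, 78)` is 12. The same 7 seconds saved is about 6% of the total run but about 37% of the time the PR could actually influence.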

@carols10cents added the S-waiting-on-review label and removed the S-waiting-on-author label Oct 2, 2017
bors added a commit that referenced this pull request Oct 3, 2017
let htmldocck.py check for directories

Since i messed this up during #44613, i wanted to codify this into the rustdoc tests to make sure that doesn't happen again.
@shepmaster
Member

Ping @rust-lang/docs. It's been over 6 days since we last heard from @GuillaumeGomez. It may be time to assign a new reviewer!

@shepmaster added the T-rustdoc label Oct 6, 2017
@QuietMisdreavus
Member Author

Might be better/faster to tag @rust-lang/dev-tools? This doesn't necessarily deal with the appearance or structure of docs so it's more a dev-tools concern than a docs one.

@GuillaumeGomez
Member

Ah right, I thought I already said it was good for me but I didn't. My bad... I'd just prefer that the @rust-lang/dev-tools take a look at it first.

@aidanhs
Member

aidanhs commented Oct 12, 2017

Hi @fitzgen, you're the lucky random person from the dev tools team I've decided to additionally assign this PR to during triage! Would you be able to take a look at this, or select a more appropriate member of your team?

@fitzgen
Member

fitzgen commented Oct 12, 2017

@michaelwoerister agreed to take a look on irc.

@michaelwoerister
Member

michaelwoerister commented Oct 12, 2017

r? @michaelwoerister (so I don't forget)

@michaelwoerister self-assigned this Oct 12, 2017
Member

@michaelwoerister left a comment

This is a review of just the first commit. My general impression is that the complications of introducing globally shared mutable state outweigh the potential performance gains here. I would suggest just making sure that file access is wrapped in BufWriters with enough capacity everywhere and testing whether that solves the problem as well.

If it is indeed memory allocation that is a bottleneck here, I would suggest implementing a pool of Vec<u8> instead of sharing a single vector in a RefCell. Rust's move semantics make these perfectly safe and the pool's get method can take care of clearing the buffer before returning it.
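
A pool along the lines suggested here might look like the following sketch. Nothing in it comes from rustdoc: `BufferPool`, `get`, and `put` are hypothetical names, and the pool is single-threaded (it uses a `RefCell`), which is an assumption about how it would be used.

```rust
use std::cell::RefCell;

/// Hypothetical pool of reusable byte buffers (not part of rustdoc).
struct BufferPool {
    free: RefCell<Vec<Vec<u8>>>,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: RefCell::new(Vec::new()) }
    }

    /// Hand out an empty buffer, reusing a returned allocation if one exists.
    fn get(&self) -> Vec<u8> {
        let mut buf = self.free.borrow_mut().pop().unwrap_or_default();
        buf.clear(); // length becomes 0, capacity is kept
        buf
    }

    /// Give a buffer back so that a later `get` can reuse its allocation.
    fn put(&self, buf: Vec<u8>) {
        self.free.borrow_mut().push(buf);
    }
}
```

Because `get` moves the `Vec<u8>` out of the pool, the caller owns it outright until it calls `put`, so two pages can never write into the same buffer at once. That is the property a single vector shared through a `RefCell` has to enforce at runtime instead.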

@@ -125,6 +125,31 @@ pub struct SharedContext {
/// Warnings for the user if rendering would differ using different markdown
/// parsers.
pub markdown_warnings: RefCell<Vec<(Span, String, Vec<html_diff::Difference>)>>,
/// Shared buffer used for rendering pages, before writing to the filesystem.
pub write_buf: RefCell<Vec<u8>>,
Member

I don't think it's a good idea to keep this mutable state around. It seems very easy to make a mistake handling it. Memory allocation is not that expensive, I would guess.

Member Author

That was the idea behind adding the render and redirect methods to SharedContext, to automatically handle the buffer for a given write.

@@ -1105,7 +1131,7 @@ impl<'a> SourceCollector<'a> {
cur.push(&fname);
href.push_str(&fname.to_string_lossy());

let mut w = BufWriter::new(File::create(&cur)?);
Member

Have you tried what happens if you just increase the capacity here? That should also decrease the number of syscalls.
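
The effect of the capacity can be sketched with a counting test double in place of the file. The sizes used here are arbitrary assumptions, and `CountingWriter` and `writes_with_capacity` are illustrative names rather than anything in rustdoc. The one real API involved is `BufWriter::with_capacity`.

```rust
use std::io::{self, BufWriter, Write};

/// Test double that counts calls to `write`, each standing in for one syscall.
struct CountingWriter {
    writes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Push `chunks` chunks of `chunk_len` bytes through a `BufWriter` with the
/// given capacity and report how many writes reached the underlying writer.
fn writes_with_capacity(capacity: usize, chunks: usize, chunk_len: usize) -> io::Result<usize> {
    let mut out = BufWriter::with_capacity(capacity, CountingWriter { writes: 0 });
    let chunk = vec![b'x'; chunk_len];
    for _ in 0..chunks {
        out.write_all(&chunk)?;
    }
    // Flush explicitly so the final partial buffer is counted as well.
    out.flush()?;
    Ok(out.get_ref().writes)
}
```

When the whole output fits inside the buffer, only the final flush reaches the underlying writer. With a buffer smaller than the output, the writer is hit each time the buffer fills up.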

if !buf.is_empty() {
    let joint_dst = this.dst.join("index.html");
    let mut dst = try_err!(File::create(&joint_dst), &joint_dst);
    try_err!(dst.write_all(&buf), &joint_dst);
Member

It looks to me like the new version of this code does not change the number of syscalls, since both versions write into a memory buffer first.

Member Author

This section (and the related one above it) managed its own buffer since render_item would bypass calling layout::render or layout::redirect if the item was stripped and didn't need a page. Since i made a global buffer i needed to add some extra handling around this section to make sure it didn't copy the buffer to another buffer before actually writing it. Mainly to save the allocation and memcpy rather than the filesystem interactions.

@@ -1552,7 +1585,7 @@ impl Context {
if let Ok(mut redirect_out) = OpenOptions::new().create_new(true)
Member

You could wrap redirect_out in a BufWriter.

Member Author

An initial version of this put the buffer inside layout::render and layout::redirect, before realizing that most of the existing calls used their own ad-hoc buffering anyway. So that initial commit was scrapped and turned into the global buffer that is up now. After having gone through all the layout:: calls in here, i don't think i'd mind just reverting the global buffer and adding BufWriters here, since that was the major culprit being addressed in this commit. This was one of the main focuses in this PR, since the raw write!() on a redirect will translate to about 9 file writes for a fairly small file, due to how they get broken up in the formatting.

@@ -1562,7 +1595,7 @@ impl Context {
let redir_name = format!("{}.{}!.html", item_type, name);
let redir_dst = self.dst.join(redir_name);
let mut redirect_out = try_err!(File::create(&redir_dst), &redir_dst);
Member

Here too.

Member

@michaelwoerister left a comment

EDIT: This is the review of the second commit.

It seems to me that the logic in fn recurse could very easily go out of sync with what directories the subsequent code expects.

If the goal is to avoid redundant calls to fs::create_dir_all(), you could also just keep an in-memory cache of directories already created, in SharedContext for example. Then you could have a method like this:

impl Context {
    fn ensure_dir_exists(&self, dir: &Path) -> Result<(), Error> {
        // `insert` returns true only the first time this path is seen.
        if self.shared.dirs_created.borrow_mut().insert(dir.to_path_buf()) {
            try_err!(fs::create_dir_all(dir), dir);
        }
        Ok(())
    }
}

That way you don't have to keep two complicated trees of decision logic in sync.

@QuietMisdreavus
Member Author

Re: ensure_dir_exists, that sounds more reasonable at this point. I think i was really anxious to avoid the allocations for the PathBufs and HashSet while writing it, but looking back, that's probably much less of a cost than the actual directory creation calls. I'll go ahead and do that.

@QuietMisdreavus
Member Author

I've force-pushed an update that massively strips down this PR:

  • The global write buffer is gone. The main culprit that that commit was intended to address was the multiple write calls when writing a redirect page, so now that commit only wraps those files in BufWriters. Everything else went through a buffer in one way or another, so i left it alone.
  • The advance directory creation logic is gone. In its place is ensure_dir, written nearly exactly like @michaelwoerister suggested. The create_dir_all calls that were taken out of the first commit have been replaced with calls to ensure_dir instead. (Except for one of them - that directory is always going to exist by the time that line comes up, so i just left that one out.)

@retep998
Member

Unfortunate to see that the shared buffer is gone, considering that a significant portion of the CPU time that wasn't spent on IO syscalls was spent on heap allocation.

impl SharedContext {
    fn ensure_dir(&self, dst: &Path) -> io::Result<()> {
        let mut dirs = self.created_dirs.borrow_mut();
        if !dirs.contains(dst) {
Member

This nicely avoids allocating the PathBuf if the directory is already present 👍
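
For reference, a self-contained sketch of this caching pattern is shown below. It is not the merged code: `DirCache` is a hypothetical stand-in for the relevant part of `SharedContext`, and unlike the snippet under review it returns a `bool` purely so the caching is observable from outside.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the part of `SharedContext` that tracks
/// which output directories have already been created.
struct DirCache {
    created_dirs: RefCell<HashSet<PathBuf>>,
}

impl DirCache {
    fn new() -> Self {
        DirCache { created_dirs: RefCell::new(HashSet::new()) }
    }

    /// Create `dst` (and its parents) unless this cache already did so.
    /// Returns `true` if the filesystem was actually touched.
    fn ensure_dir(&self, dst: &Path) -> io::Result<bool> {
        let mut dirs = self.created_dirs.borrow_mut();
        // `contains` accepts a borrowed `&Path`, so the common "already
        // created" case does not allocate a `PathBuf`.
        if dirs.contains(dst) {
            return Ok(false);
        }
        fs::create_dir_all(dst)?;
        dirs.insert(dst.to_path_buf());
        Ok(true)
    }
}
```

Checking with `contains` before calling `insert` costs a second hash lookup on the first visit to a directory, but repeat visits, which are the common case when many items share a module directory, skip both the allocation and the filesystem call.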

@michaelwoerister
Member

@QuietMisdreavus Thanks a lot for updating the PR. I think it's worth keeping things simple.

@bors r+

@bors
Collaborator

bors commented Oct 15, 2017

📌 Commit 2c9d452 has been approved by michaelwoerister

@michaelwoerister
Member

> Unfortunate to see that the shared buffer is gone, considering that a significant portion of the CPU time that wasn't spent on IO syscalls was spent on heap allocation.

There's still the option of implementing a pool of re-usable buffers which should have pretty much the same performance characteristics but without the architectural downsides.

@bors
Collaborator

bors commented Oct 15, 2017

⌛ Testing commit 2c9d452 with merge c4f489a...

bors added a commit that referenced this pull request Oct 15, 2017
some low-hanging rustdoc optimizations

There were a few discussions earlier today in #rust-internals about the syscall usage and overall performance of rustdoc. This PR is intended to pick some low-hanging fruit and try to rein in some of the performance issues of rustdoc.
@bors
Collaborator

bors commented Oct 15, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: michaelwoerister
Pushing c4f489a to master...

@bors merged commit 2c9d452 into rust-lang:master Oct 15, 2017
@QuietMisdreavus deleted the rustdoc-perf branch February 26, 2018 23:23
Labels
S-waiting-on-review, T-rustdoc