some low-hanging rustdoc optimizations #44613

QuietMisdreavus · 2017-09-15T19:54:25Z

There were a few discussions earlier today in #rust-internals about the syscall usage and overall performance of rustdoc. This PR is intended to pick some low-hanging fruit and try to rein in some of the performance issues of rustdoc.

rust-highfive · 2017-09-15T19:54:37Z

r? @frewsxcv

(rust_highfive has picked a reviewer for you, use r? to override)

QuietMisdreavus · 2017-09-15T19:55:26Z

cc @retep998 since they offered to help profile on windows - winapi was one of the inspirations for this PR

cc @bluss since they were also in the discussion and itertools was the other inspiration

Mark-Simulacrum · 2017-09-15T22:36:25Z

src/librustdoc/html/render.rs

-                    try_err!(layout::redirect(&mut redirect_out, file_name), &redir_dst);
+                    buf.clear();
+                    try_err!(layout::redirect(&mut buf, file_name), &redir_dst);
+                    try_err!(redirect_out.write_all(&buf), &redir_dst);


Seems like a BufWriter around redirect_out is what's really wanted here and below.

This reuses the existing allocation of the buffer. An earlier (rebased-over) commit just wrapped the functions in html/layout.rs in BufWriters before i noticed that it was being handed Vecs most of the time anyway.

By reusing the same Vec it minimizes the cost of allocation. Using a BufWriter every time you write to a file means having to allocate and deallocate a BufWriter each time which isn't super cheap. Considering the whole point of this is to shave off microseconds from an operation that can be repeated hundreds of thousands of times (in the literal sense, see winapi), reusing the Vec provides a small but important performance boost.

Ah, didn't know we reused this thousands of times. Vec seems fine then, though ideally we'd add a clear to BufWriter probably...

Right now the Vec is merely shared between the documentation on the item and the redirect to the documentation on the item, so the allocation isn't cached across items, but it's still better than a fresh allocation for writing the redirect. Also, there's no way as far as I can see to change what file a BufWriter is wrapping.

Technically i think you could try writer.flush(); *writer.get_mut() = &mut new_file; to just swap out the inner writer, though i'm not sure if just having that flush there will save you from other weirdness. Also it requires that all the BufWriters be used with the same type.

alexcrichton · 2017-09-21T16:46:46Z

ping @QuietMisdreavus, just want to make sure this doesn't fall off your radar!

QuietMisdreavus · 2017-09-21T16:57:19Z

I tried to find other things to work on by profiling rustdoc, but Visual Studio didn't think that symbols were made for rustdoc itself. Even without that, there's one more thing i'd like to try on this branch (keeping a single write buffer and handing that around when rendering all the pages). Otherwise i'd like to ask @retep998 to make sure this lowers the amount of WriteFile syscalls that are done for redirect pages.

QuietMisdreavus · 2017-09-23T22:19:12Z

So here's where this PR stands:

Right now there are two basic changes here:

Create the directory structure for the documentation ahead of time, instead of calling create_dir_all for every file that gets written.
Proxy every call to layout::render or layout::redirect through a wrapper that uses a shared buffer before writing to whatever writer was asked for. (And strip out all of the ad-hoc buffering that was happening beforehand.) This helps cut down on the number of allocations that would get made during a doc rendering, and also buffers each file write (most of the file writes were already buffered, but redirect pages were being written directly, causing ~nine syscalls per redirect page.) (This slightly cuts down on the parallelizability of the rendering process, but if truly necessary we can move the buffer into TLS instead of the SharedContext.)

I would like to look at more structural optimization opportunities, but whenever i try to attach Visual Studio to rustdoc it refuses to acknowledge any debuginfo that would let it use the source files in librustdoc. It sees the symbols for rustdoc-tool-binary, which actually includes things like function names, but it doesn't see any source info that it can use to assist a debug or profile session. As such, i'm calling this PR ready to go, and i'll defer any farther work until i can figure out what's going on with that.

QuietMisdreavus · 2017-09-23T23:46:24Z

travis failure was some rustdoc output tests failing:

[01:05:53] failures:
[01:05:53]     [rustdoc] rustdoc/extern-links.rs
[01:05:53]     [rustdoc] rustdoc/inline_local/glob-extern.rs
[01:05:53]     [rustdoc] rustdoc/inline_local/glob-private.rs
[01:05:53]     [rustdoc] rustdoc/issue-34025.rs

Gonna figure out what i broke.

QuietMisdreavus · 2017-09-24T00:24:10Z

Looks like i misjudged the checks for making sure a rendered item is empty, and wound up emitting empty files when the file shouldn't exist in the first place. I'm gonna try running these tests locally - hopefully my laptop fares better than my server, which has never been able to locally run a test in my experience >_>

QuietMisdreavus · 2017-09-24T01:54:44Z

Turns out, i had the control flow wrong. I assumed the buffer checks i wrote in to places that checked for zero-sized writes were equivalent, but in some cases it doesn't even go through the write call, which is when that buffer would have been empty in the first place. I squashed the commits up with the fix. The tests that failed last time passed on my machine; let's see if travis agrees...

ollie27 · 2017-09-24T17:46:54Z

src/librustdoc/html/render.rs

@@ -854,6 +853,35 @@ fn write_shared(cx: &Context,
    }
    try_err!(writeln!(&mut w, "initSearch(searchIndex);"), &dst);

+    // Create the directory structure ahead of time
+    let mut item = krate.module.clone().unwrap();


Is cloning the entire crate here really necessary?

It was originally so that i could set the name on the root module, but looking at it again, I can just set the initial path to what i need and let the crate item keep its empty name. Let me try this locally.

ollie27 · 2017-09-24T17:48:58Z

src/librustdoc/html/render.rs

+
+        debug!("creating directory tree: recursing into {}", name);
+
+        try_err!(fs::create_dir_all(&dst), &dst);


This ends up creating empty directories for stripped modules that don't contain any redirects.

Would a simple check for !m.items.is_empty() suffice?

No. The modules aren't empty, they just don't contain anything that is reexported anywhere else. It's the reason create_dir_all was after the buf.is_empty() check to make sure directories are only created if there is actually a file to put in it.

Would a test for this just involve a #[doc(hidden)] module that doesn't have anything re-exported from it? I think i can recreate the check, i just need to make sure i'm doing it right.

A simple crate like the following is enough to demonstrate this issue:

#[doc(hidden)] pub mod hidden_mod { pub mod more_hidden {} }

I don't recommend trying to recreate the logic though. If create_dir_all is really that expensive then I suggest adding a bool to Context to keep track of whether the current directory has been created yet.

The repeated creation of directories adds up for big crates like libc or winapi, especially if you make documentation for these on windows, where filesystem operations tend to be more expensive. The problem with just adding a bool to Context is that Context::krate makes a clone of the Context for each item, so it only sees the state of the Context as it was when it had only passed the module itself. Putting it into SharedContext is a problem because there's no way to reset the SharedContext when Context::krate is done with it. To do it fully properly, you'd need to make it a HashSet<PathBuf> in the SharedContext, which trades your filesystem operations for a bunch of allocations. Is that a worthwhile tradeoff on platforms where checking the directory is relatively cheap?

The logic it uses isn't that bad to recreate, unless you're worried about various HashMap lookups. I've force-pushed an update to this commit that should trim out these empty directories.

Yeah, adding a bool to Context doesn't work as well as I'd thought.

The new code misses at least one check:

rust/src/librustdoc/html/render.rs

Line 1533 in 1ed7d41

} else if item.name.is_some() {

We don't generate pages for items with no name, specifically re-exports. The following creates an empty directory with the new version:

#[doc(hidden)] pub mod hidden_mod { pub use super::Foo; } pub struct Foo;

It would be good to see some benchmark numbers for this to see if the added complexity is worth it.

Also, why is this in the write_shared function?

@retep998 ran procmon on an earlier iteration of this PR yesterday when rendering winapi documentation to verify the number of syscalls per-item went down. For that crate, anything that reduces the number of filesystem accesses per-item is worth it, simply because it has a ton of items being exported. I'll try to get some before/after numbers for this, at least on my own machine.

As for being in write_shared... that's where i decided to put it? I think it felt weird to put it into the main run function, but that does seem better, now that you point it out. I think i picked write_shared just because it was the last thing before the final docs were rendered. I'll move it into run, that's no problem for me.

PM 094225 <WindowsBunnyDoesSupportIndexing> misdreavus: 114 seconds in nightly to 107 seconds using semi recent version of your PR

(When they render documentation on winapi)

I've pushed the latest update. I'm going to try to update htmldocck.py to allow for checking whether a directory exists. That will have to be a separate PR, though.

QuietMisdreavus · 2017-09-28T13:56:46Z

ping @ollie27 and @GuillaumeGomez, just wanted to make sure this doesn't fall off everyone's radar.

GuillaumeGomez · 2017-09-28T15:44:03Z

Do you have some numbers to allow us to compare? A before/after would be very appreciated. :)

QuietMisdreavus · 2017-09-28T16:02:02Z

This is from when @retep998 compared this branch to the latest nightly at the time:

PM 094225 <WindowsBunnyDoesSupportIndexing> misdreavus: 114 seconds in nightly to 107 seconds using semi recent version of your PR

That was when rendering winapi on windows, which was the worst-case inspiration for this PR. (The last time i tried comparing on my own system things kept going wrong, but i can give it another shot tonight.)

retep998 · 2017-09-28T17:11:25Z

Keep in mind that out of that time, 17 seconds is spent on compiling winapi, and another 78 seconds is spent on unavoidable NtCreateFile/NtWriteFile/NtCloseFile calls (unless rustdoc decides to stop creating so many files). If you exclude those two things, the difference is much more significant: 19 seconds to 12 seconds.

let htmldocck.py check for directories Since i messed this up during #44613, i wanted to codify this into the rustdoc tests to make sure that doesn't happen again.

shepmaster · 2017-10-06T18:37:17Z

Ping @rust-lang/docs. It's been over 6 days since we last heard from @GuillaumeGomez. It may be time to assign a new reviewer!

QuietMisdreavus · 2017-10-06T20:04:02Z

Might be better/faster to tag @rust-lang/dev-tools? This doesn't necessarily deal with the appearance or structure of docs so it more a dev-tools concern than a docs one.

GuillaumeGomez · 2017-10-07T17:04:29Z

Ah right, I thought I already said it was good for me but I didn't. My bad... I'd just prefer that the @rust-lang/dev-tools take a look at it first.

aidanhs · 2017-10-12T15:35:31Z

Hi @fitzgen, you're the lucky random person from the dev tools team I've decided to additionally assign this PR to during triage! Would you be able to take a look at this, or select a more appropriate member of your team?

fitzgen · 2017-10-12T19:59:39Z

@michaelwoerister agreed to take a look on irc.

michaelwoerister · 2017-10-12T20:02:24Z

r? @michaelwoerister (so I don't forget)

michaelwoerister

This is a review of just the first commit. My general impression is that introducing the complications of globally shared mutable state outweigh the potential performance gains here. I would suggest just making sure that file access is wrapped in BufWriters with enough capacity everywhere and testing whether that doesn't solve the problem too.

If it is indeed memory allocation that is a bottleneck here, I would suggest implementing a pool of Vec<u8> instead of sharing a single vector in a RefCell. Rust's move semantics make these perfectly safe and the pool's get method can take care of clearing the buffer before returning it.

michaelwoerister · 2017-10-13T12:45:43Z

src/librustdoc/html/render.rs

@@ -125,6 +125,31 @@ pub struct SharedContext {
    /// Warnings for the user if rendering would differ using different markdown
    /// parsers.
    pub markdown_warnings: RefCell<Vec<(Span, String, Vec<html_diff::Difference>)>>,
+    /// Shared buffer used for rendering pages, before writing to the filesystem.
+    pub write_buf: RefCell<Vec<u8>>,


I don't think it's a good idea to keep this mutable state around. It seems very easy to make a mistake handling it. Memory allocation is not that expensive, I would guess.

That was the idea behind adding the render and redirect methods to SharedContext, to automatically handle the buffer for a given write.

michaelwoerister · 2017-10-13T12:48:23Z

src/librustdoc/html/render.rs

@@ -1105,7 +1131,7 @@ impl<'a> SourceCollector<'a> {
        cur.push(&fname);
        href.push_str(&fname.to_string_lossy());

-        let mut w = BufWriter::new(File::create(&cur)?);


Have you tried what happens if you just increase the capacity here? That should also decrease the number of syscalls.

michaelwoerister · 2017-10-13T12:53:44Z

src/librustdoc/html/render.rs

-                if !buf.is_empty() {
-                    let joint_dst = this.dst.join("index.html");
-                    let mut dst = try_err!(File::create(&joint_dst), &joint_dst);
-                    try_err!(dst.write_all(&buf), &joint_dst);


It looks to me like the new version of this code does not change the amount of syscalls, since both versions write into a memory buffer first.

This section (and the related one above it) managed its own buffer since render_item would bypass calling layout::render or layout::redirect if the item was stripped and didn't need a page. Since i made a global buffer i needed to add some extra handling around this section to make sure it didn't copy the buffer to another buffer before actually writing it. Mainly to save the allocation and memcpy rather than the filesystem interactions.

michaelwoerister · 2017-10-13T12:56:38Z

src/librustdoc/html/render.rs

@@ -1552,7 +1585,7 @@ impl Context {
                if let Ok(mut redirect_out) = OpenOptions::new().create_new(true)


You could wrap redirect_out in a BufWriter.

An initial version of this put the buffer inside layout::render and layout::redirect, before realizing that most of the existing calls used their own ad-hoc buffering anyway. So that initial commit was scrapped and turned into the global buffer that is up now. After having gone through all the layout:: calls in here, i don't think i'd mind just reverting the global buffer and adding BufWriters here, since that was the major culprit being addressed in this commit. This was one of the main focuses in this PR, since the raw write!() on a redirect will translate to about 9 file writes for a fairly small file, due to how they get broken up in the formatting.

michaelwoerister · 2017-10-13T12:57:15Z

src/librustdoc/html/render.rs

@@ -1562,7 +1595,7 @@ impl Context {
                    let redir_name = format!("{}.{}!.html", item_type, name);
                    let redir_dst = self.dst.join(redir_name);
                    let mut redirect_out = try_err!(File::create(&redir_dst), &redir_dst);


michaelwoerister

EDIT: This is the review of the second commit.

It seems to me that the logic in fn recurse could very easily go out of sync with what directories the subsequent code expects.

If the goal is to avoid redundant calls to fs::create_dir_all(), you could also just keep an in-memory cache of directories already created, in SharedContext for example. Then you could have a method like this:

impl Context {
    fn ensure_dir_exists(&self, dir: &Path) -> Result<(), Error> {
        if self.shared.dirs_created.borrow_mut().insert(dir.to_path_buf()) {
            try_err!(fs::create_dir_all(path), path);
        }
    }
}

That way you don't have to keep two complicated trees of decision logic in sync.

QuietMisdreavus · 2017-10-13T15:39:43Z

Re: ensure_dir_exists, that sounds more reasonable at this point. I think i was really anxious to avoid the allocations for the PathBufs and HashSet while writing it, but looking back, that's probably much less of a cost than the actual directory creation calls. I'll go ahead and do that.

QuietMisdreavus · 2017-10-13T20:38:38Z

I've force-pushed an update that massively strips down this PR:

The global write buffer is gone. The main culprit that that commit was intended to address was the multiple write calls when writing a redirect page, so now that commit only wraps those files in BufWriters. Everything else went through a buffer in one way or another, so i left it alone.
The advance directory creation logic is gone. In its place is ensure_dir, written nearly exactly like @michaelwoerister suggested. The create_dir_all calls that were taken out of the first commit have been replaced with calls to ensure_dir instead. (Except for one of them - that directory is always going to exist by the time that line comes up, so i just left that one out.)

retep998 · 2017-10-13T21:03:41Z

Unfortunate to see that the shared buffer is gone, considering that a significant portion of the CPU time that wasn't spent on IO syscalls was spent on heap allocation.

michaelwoerister · 2017-10-15T16:47:21Z

src/librustdoc/html/render.rs

+impl SharedContext {
+    fn ensure_dir(&self, dst: &Path) -> io::Result<()> {
+        let mut dirs = self.created_dirs.borrow_mut();
+        if !dirs.contains(dst) {


This nicely avoids allocating the PathBuf if the directory is already present 👍

michaelwoerister · 2017-10-15T16:49:38Z

@QuietMisdreavus Thanks a lot for updating the PR. I think it's worth keeping things simple.

@bors r+

bors · 2017-10-15T16:49:39Z

📌 Commit 2c9d452 has been approved by michaelwoerister

michaelwoerister · 2017-10-15T16:50:59Z

Unfortunate to see that the shared buffer is gone, considering that a significant portion of the CPU time that wasn't spent on IO syscalls was spent on heap allocation.

There's still the option of implementing a pool of re-usable buffers which should have pretty much the same performance characteristics but without the architectural downsides.

bors · 2017-10-15T17:48:18Z

⌛ Testing commit 2c9d452 with merge c4f489a...

some low-hanging rustdoc optimizations There were a few discussions earlier today in #rust-internals about the syscall usage and overall performance of rustdoc. This PR is intended to pick some low-hanging fruit and try to rein in some of the performance issues of rustdoc.

bors · 2017-10-15T20:07:31Z

☀️ Test successful - status-appveyor, status-travis
Approved by: michaelwoerister
Pushing c4f489a to master...

rust-highfive assigned frewsxcv Sep 15, 2017

frewsxcv assigned GuillaumeGomez and unassigned frewsxcv Sep 15, 2017

QuietMisdreavus force-pushed the rustdoc-perf branch from eb5fb09 to e1c7485 Compare September 15, 2017 20:51

Mark-Simulacrum reviewed Sep 15, 2017

View reviewed changes

carols10cents added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Sep 18, 2017

QuietMisdreavus changed the title ~~[WIP] rustdoc optimizations~~ some low-hanging rustdoc optimizations Sep 23, 2017

QuietMisdreavus force-pushed the rustdoc-perf branch from e06383c to de24e86 Compare September 24, 2017 01:50

QuietMisdreavus mentioned this pull request Sep 24, 2017

rustdoc is not built with debuginfo, even if debuginfo=true #44810

Closed

ollie27 reviewed Sep 24, 2017

View reviewed changes

QuietMisdreavus force-pushed the rustdoc-perf branch 2 times, most recently from 83c7aec to 00326b8 Compare September 25, 2017 02:51

QuietMisdreavus mentioned this pull request Sep 30, 2017

let htmldocck.py check for directories #44949

Merged

carols10cents added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 2, 2017

shepmaster added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Oct 6, 2017

shepmaster added the T-dev-tools-rustdoc label Oct 6, 2017

aidanhs assigned GuillaumeGomez and unassigned GuillaumeGomez Oct 12, 2017

michaelwoerister self-assigned this Oct 12, 2017

michaelwoerister reviewed Oct 13, 2017

View reviewed changes

QuietMisdreavus added 2 commits October 13, 2017 15:26

rustdoc: don't create directories more than once

9e1c577

rustdoc: wrap redirect page writers in a BufWriter

2c9d452

QuietMisdreavus force-pushed the rustdoc-perf branch from 00326b8 to 2c9d452 Compare October 13, 2017 20:28

michaelwoerister reviewed Oct 15, 2017

View reviewed changes

bors merged commit 2c9d452 into rust-lang:master Oct 15, 2017

QuietMisdreavus deleted the rustdoc-perf branch February 26, 2018 23:23


		debug!("creating directory tree: recursing into {}", name);

		try_err!(fs::create_dir_all(&dst), &dst);

		@@ -1552,7 +1585,7 @@ impl Context {
		if let Ok(mut redirect_out) = OpenOptions::new().create_new(true)

some low-hanging rustdoc optimizations #44613

some low-hanging rustdoc optimizations #44613

Conversation

QuietMisdreavus commented Sep 15, 2017

rust-highfive commented Sep 15, 2017

QuietMisdreavus commented Sep 15, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

retep998 Sep 15, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcrichton commented Sep 21, 2017

QuietMisdreavus commented Sep 21, 2017

QuietMisdreavus commented Sep 23, 2017

QuietMisdreavus commented Sep 23, 2017 • edited Loading

QuietMisdreavus commented Sep 24, 2017

QuietMisdreavus commented Sep 24, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

QuietMisdreavus Sep 24, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

QuietMisdreavus commented Sep 28, 2017

GuillaumeGomez commented Sep 28, 2017

QuietMisdreavus commented Sep 28, 2017

retep998 commented Sep 28, 2017

shepmaster commented Oct 6, 2017

QuietMisdreavus commented Oct 6, 2017

GuillaumeGomez commented Oct 7, 2017

aidanhs commented Oct 12, 2017

fitzgen commented Oct 12, 2017

michaelwoerister commented Oct 12, 2017 • edited Loading

michaelwoerister left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

michaelwoerister left a comment • edited Loading

Choose a reason for hiding this comment

QuietMisdreavus commented Oct 13, 2017

QuietMisdreavus commented Oct 13, 2017

retep998 commented Oct 13, 2017

Choose a reason for hiding this comment

michaelwoerister commented Oct 15, 2017

bors commented Oct 15, 2017

michaelwoerister commented Oct 15, 2017

bors commented Oct 15, 2017

bors commented Oct 15, 2017

retep998 Sep 15, 2017 •

edited

Loading

QuietMisdreavus commented Sep 23, 2017 •

edited

Loading

QuietMisdreavus Sep 24, 2017 •

edited

Loading

michaelwoerister commented Oct 12, 2017 •

edited

Loading

michaelwoerister left a comment •

edited

Loading