-
Notifications
You must be signed in to change notification settings - Fork 13.3k
some low-hanging rustdoc optimizations #44613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
r? @frewsxcv (rust_highfive has picked a reviewer for you, use r? to override) |
eb5fb09
to
e1c7485
Compare
src/librustdoc/html/render.rs
Outdated
try_err!(layout::redirect(&mut redirect_out, file_name), &redir_dst); | ||
buf.clear(); | ||
try_err!(layout::redirect(&mut buf, file_name), &redir_dst); | ||
try_err!(redirect_out.write_all(&buf), &redir_dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a BufWriter around redirect_out is what's really wanted here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reuses the existing allocation of the buffer. An earlier (rebased-over) commit just wrapped the functions in html/layout.rs
in BufWriters before i noticed that it was being handed Vecs most of the time anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By reusing the same Vec
it minimizes the cost of allocation. Using a BufWriter
every time you write to a file means having to allocate and deallocate a BufWriter
each time which isn't super cheap. Considering the whole point of this is to shave off microseconds from an operation that can be repeated hundreds of thousands of times (in the literal sense, see winapi), reusing the Vec
provides a small but important performance boost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, didn't know we reused this thousands of times. Vec
seems fine then, though ideally we'd add a clear
to BufWriter
probably...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the Vec
is merely shared between the documentation on the item and the redirect to the documentation on the item, so the allocation isn't cached across items, but it's still better than a fresh allocation for writing the redirect. Also, there's no way as far as I can see to change what file a BufWriter
is wrapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically i think you could try writer.flush(); *writer.get_mut() = &mut new_file;
to just swap out the inner writer, though i'm not sure if just having that flush there will save you from other weirdness. Also it requires that all the BufWriters be used with the same type.
ping @QuietMisdreavus, just want to make sure this doesn't fall off your radar! |
I tried to find other things to work on by profiling rustdoc, but Visual Studio didn't think that symbols were made for rustdoc itself. Even without that, there's one more thing i'd like to try on this branch (keeping a single write buffer and handing that around when rendering all the pages). Otherwise i'd like to ask @retep998 to make sure this lowers the amount of WriteFile syscalls that are done for redirect pages. |
So here's where this PR stands: Right now there are two basic changes here:
I would like to look at more structural optimization opportunities, but whenever i try to attach Visual Studio to rustdoc it refuses to acknowledge any debuginfo that would let it use the source files in librustdoc. It sees the symbols for |
travis failure was some rustdoc output tests failing:
Gonna figure out what i broke. |
Looks like i misjudged the checks for making sure a rendered item is empty, and wound up emitting empty files when the file shouldn't exist in the first place. I'm gonna try running these tests locally - hopefully my laptop fares better than my server, which has never been able to locally run a test in my experience |
e06383c
to
de24e86
Compare
Turns out, i had the control flow wrong. I assumed the buffer checks i wrote in to places that checked for zero-sized writes were equivalent, but in some cases it doesn't even go through the write call, which is when that buffer would have been empty in the first place. I squashed the commits up with the fix. The tests that failed last time passed on my machine; let's see if travis agrees... |
src/librustdoc/html/render.rs
Outdated
@@ -854,6 +853,35 @@ fn write_shared(cx: &Context, | |||
} | |||
try_err!(writeln!(&mut w, "initSearch(searchIndex);"), &dst); | |||
|
|||
// Create the directory structure ahead of time | |||
let mut item = krate.module.clone().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is cloning the entire crate here really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was originally so that i could set the name on the root module, but looking at it again, I can just set the initial path to what i need and let the crate item keep its empty name. Let me try this locally.
src/librustdoc/html/render.rs
Outdated
|
||
debug!("creating directory tree: recursing into {}", name); | ||
|
||
try_err!(fs::create_dir_all(&dst), &dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up creating empty directories for stripped modules that don't contain any redirects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a simple check for !m.items.is_empty()
suffice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The modules aren't empty, they just don't contain anything that is reexported anywhere else. It's the reason create_dir_all
was after the buf.is_empty()
check to make sure directories are only created if there is actually a file to put in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a test for this just involve a #[doc(hidden)]
module that doesn't have anything re-exported from it? I think i can recreate the check, i just need to make sure i'm doing it right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple crate like the following is enough to demonstrate this issue:
#[doc(hidden)]
pub mod hidden_mod {
pub mod more_hidden {}
}
I don't recommend trying to recreate the logic though. If create_dir_all
is really that expensive then I suggest adding a bool
to Context
to keep track of whether the current directory has been created yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repeated creation of directories adds up for big crates like libc or winapi, especially if you make documentation for these on windows, where filesystem operations tend to be more expensive. The problem with just adding a bool
to Context
is that Context::krate
makes a clone of the Context for each item, so it only sees the state of the Context as it was when it had only passed the module itself. Putting it into SharedContext
is a problem because there's no way to reset the SharedContext when Context::krate
is done with it. To do it fully properly, you'd need to make it a HashSet<PathBuf>
in the SharedContext
, which trades your filesystem operations for a bunch of allocations. Is that a worthwhile tradeoff on platforms where checking the directory is relatively cheap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic it uses isn't that bad to recreate, unless you're worried about various HashMap lookups. I've force-pushed an update to this commit that should trim out these empty directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, adding a bool
to Context
doesn't work as well as I'd thought.
The new code misses at least one check:
rust/src/librustdoc/html/render.rs
Line 1533 in 1ed7d41
} else if item.name.is_some() { |
We don't generate pages for items with no name, specifically re-exports. The following creates an empty directory with the new version:
#[doc(hidden)]
pub mod hidden_mod {
pub use super::Foo;
}
pub struct Foo;
It would be good to see some benchmark numbers for this to see if the added complexity is worth it.
Also, why is this in the write_shared
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@retep998 ran procmon on an earlier iteration of this PR yesterday when rendering winapi documentation to verify the number of syscalls per-item went down. For that crate, anything that reduces the number of filesystem accesses per-item is worth it, simply because it has a ton of items being exported. I'll try to get some before/after numbers for this, at least on my own machine.
As for being in write_shared
... that's where i decided to put it? I think it felt weird to put it into the main run
function, but that does seem better, now that you point it out. I think i picked write_shared
just because it was the last thing before the final docs were rendered. I'll move it into run
, that's no problem for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PM 094225 <WindowsBunnyDoesSupportIndexing> misdreavus: 114 seconds in nightly to 107 seconds using semi recent version of your PR
(When they render documentation on winapi)
I've pushed the latest update. I'm going to try to update htmldocck.py
to allow for checking whether a directory exists. That will have to be a separate PR, though.
83c7aec
to
00326b8
Compare
ping @ollie27 and @GuillaumeGomez, just wanted to make sure this doesn't fall off everyone's radar. |
Do you have some numbers to allow us to compare? A before/after would be very appreciated. :) |
This is from when @retep998 compared this branch to the latest nightly at the time:
That was when rendering winapi on windows, which was the worst-case inspiration for this PR. (The last time i tried comparing on my own system things kept going wrong, but i can give it another shot tonight.) |
Keep in mind that out of that time, 17 seconds is spent on compiling winapi, and another 78 seconds is spent on unavoidable |
let htmldocck.py check for directories Since i messed this up during #44613, i wanted to codify this into the rustdoc tests to make sure that doesn't happen again.
Ping @rust-lang/docs. It's been over 6 days since we last heard from @GuillaumeGomez. It may be time to assign a new reviewer! |
Might be better/faster to tag @rust-lang/dev-tools? This doesn't necessarily deal with the appearance or structure of docs so it more a dev-tools concern than a docs one. |
Ah right, I thought I already said it was good for me but I didn't. My bad... I'd just prefer that the @rust-lang/dev-tools take a look at it first. |
Hi @fitzgen, you're the lucky random person from the dev tools team I've decided to additionally assign this PR to during triage! Would you be able to take a look at this, or select a more appropriate member of your team? |
@michaelwoerister agreed to take a look on irc. |
r? @michaelwoerister (so I don't forget) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a review of just the first commit. My general impression is that introducing the complications of globally shared mutable state outweigh the potential performance gains here. I would suggest just making sure that file access is wrapped in BufWriter
s with enough capacity everywhere and testing whether that doesn't solve the problem too.
If it is indeed memory allocation that is a bottleneck here, I would suggest implementing a pool of Vec<u8>
instead of sharing a single vector in a RefCell
. Rust's move semantics make these perfectly safe and the pool's get
method can take care of clearing the buffer before returning it.
src/librustdoc/html/render.rs
Outdated
@@ -125,6 +125,31 @@ pub struct SharedContext { | |||
/// Warnings for the user if rendering would differ using different markdown | |||
/// parsers. | |||
pub markdown_warnings: RefCell<Vec<(Span, String, Vec<html_diff::Difference>)>>, | |||
/// Shared buffer used for rendering pages, before writing to the filesystem. | |||
pub write_buf: RefCell<Vec<u8>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea to keep this mutable state around. It seems very easy to make a mistake handling it. Memory allocation is not that expensive, I would guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the idea behind adding the render
and redirect
methods to SharedContext
, to automatically handle the buffer for a given write.
src/librustdoc/html/render.rs
Outdated
@@ -1105,7 +1131,7 @@ impl<'a> SourceCollector<'a> { | |||
cur.push(&fname); | |||
href.push_str(&fname.to_string_lossy()); | |||
|
|||
let mut w = BufWriter::new(File::create(&cur)?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried what happens if you just increase the capacity here? That should also decrease the number of syscalls.
src/librustdoc/html/render.rs
Outdated
if !buf.is_empty() { | ||
let joint_dst = this.dst.join("index.html"); | ||
let mut dst = try_err!(File::create(&joint_dst), &joint_dst); | ||
try_err!(dst.write_all(&buf), &joint_dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me like the new version of this code does not change the amount of syscalls, since both versions write into a memory buffer first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section (and the related one above it) managed its own buffer since render_item
would bypass calling layout::render
or layout::redirect
if the item was stripped and didn't need a page. Since i made a global buffer i needed to add some extra handling around this section to make sure it didn't copy the buffer to another buffer before actually writing it. Mainly to save the allocation and memcpy rather than the filesystem interactions.
src/librustdoc/html/render.rs
Outdated
@@ -1552,7 +1585,7 @@ impl Context { | |||
if let Ok(mut redirect_out) = OpenOptions::new().create_new(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could wrap redirect_out
in a BufWriter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An initial version of this put the buffer inside layout::render
and layout::redirect
, before realizing that most of the existing calls used their own ad-hoc buffering anyway. So that initial commit was scrapped and turned into the global buffer that is up now. After having gone through all the layout::
calls in here, i don't think i'd mind just reverting the global buffer and adding BufWriter
s here, since that was the major culprit being addressed in this commit. This was one of the main focuses in this PR, since the raw write!()
on a redirect will translate to about 9 file writes for a fairly small file, due to how they get broken up in the formatting.
src/librustdoc/html/render.rs
Outdated
@@ -1562,7 +1595,7 @@ impl Context { | |||
let redir_name = format!("{}.{}!.html", item_type, name); | |||
let redir_dst = self.dst.join(redir_name); | |||
let mut redirect_out = try_err!(File::create(&redir_dst), &redir_dst); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: This is the review of the second commit.
It seems to me that the logic in fn recurse
could very easily go out of sync with what directories the subsequent code expects.
If the goal is to avoid redundant calls to fs::create_dir_all()
, you could also just keep an in-memory cache of directories already created, in SharedContext
for example. Then you could have a method like this:
impl Context {
fn ensure_dir_exists(&self, dir: &Path) -> Result<(), Error> {
if self.shared.dirs_created.borrow_mut().insert(dir.to_path_buf()) {
try_err!(fs::create_dir_all(path), path);
}
}
}
That way you don't have to keep two complicated trees of decision logic in sync.
Re: |
00326b8
to
2c9d452
Compare
I've force-pushed an update that massively strips down this PR:
|
Unfortunate to see that the shared buffer is gone, considering that a significant portion of the CPU time that wasn't spent on IO syscalls was spent on heap allocation. |
impl SharedContext { | ||
fn ensure_dir(&self, dst: &Path) -> io::Result<()> { | ||
let mut dirs = self.created_dirs.borrow_mut(); | ||
if !dirs.contains(dst) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This nicely avoids allocating the PathBuf
if the directory is already present 👍
@QuietMisdreavus Thanks a lot for updating the PR. I think it's worth keeping things simple. @bors r+ |
📌 Commit 2c9d452 has been approved by |
There's still the option of implementing a pool of re-usable buffers which should have pretty much the same performance characteristics but without the architectural downsides. |
some low-hanging rustdoc optimizations There were a few discussions earlier today in #rust-internals about the syscall usage and overall performance of rustdoc. This PR is intended to pick some low-hanging fruit and try to rein in some of the performance issues of rustdoc.
☀️ Test successful - status-appveyor, status-travis |
There were a few discussions earlier today in #rust-internals about the syscall usage and overall performance of rustdoc. This PR is intended to pick some low-hanging fruit and try to rein in some of the performance issues of rustdoc.