Skip to content

Call File::sync_all when finalizing encoding #123880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

saethlin
Copy link
Member

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 13, 2024
@saethlin
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2024
@bors
Copy link
Collaborator

bors commented Apr 13, 2024

⌛ Trying commit ecd7070 with merge cbad6f1...

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 13, 2024
Call File::sync_all when finalizing encoding

rust-lang#123352 (comment)

r? `@ghost`
@bors
Copy link
Collaborator

bors commented Apr 13, 2024

☀️ Try build successful - checks-actions
Build commit: cbad6f1 (cbad6f1932658dc842d4e58954219cc93c42a2ef)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (cbad6f1): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
6.7% [1.4%, 11.3%] 7
Regressions ❌
(secondary)
8.9% [4.3%, 11.8%] 8
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 6.7% [1.4%, 11.3%] 7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.0% [2.0%, 2.0%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 677.972s -> 678.686s (0.11%)
Artifact size: 316.06 MiB -> 316.05 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2024
@the8472
Copy link
Member

the8472 commented Apr 13, 2024

I'm curious about the motivation. sync_all is about guaranteeing that the kernel side dirty pages and metadata have made it to persistent storage. It's not about ensuring that userspace changes make it to the kernel/become visible to other processes.

@goffrie
Copy link
Contributor

goffrie commented Apr 13, 2024

I'm curious about the motivation.

In the linked issue, a user repeatedly rebooted a VM, presumably causing the filesystem to lose non-fsynced (i.e. sync_all) data and thereby causing rustc to crash. Another user reported the same problem after a power failure.

In general the safe-save pattern (i.e. write a new file, then move it over an existing file) is not safe in the presence of unexpected shutdown without an fsync after the new file's contents are written. So adding the sync_all call is legitimate in that sense. This LWN article (from 2009) is relevant: https://lwn.net/Articles/351422/

However, fsync is a very expensive call and this can be seen in the wall-time perf results (somewhere between 5 and 20 milliseconds). Other hardware and OSes most likely perform even worse. (Not to mention that calling fsync increases write amplification for flash storage which reduces the longevity of the device.)
(I think this PR also performs more fsyncs than are strictly needed as FileEncoder is also used in other use cases that may not need the written files to be durably persisted.)

IMO, trying to keep build products consistent in the face of unclean shutdowns is a losing game. Lots of other files get generated during a build, by many different tools (linkers? build scripts?), and those generally don't call fsync either. So a system crash could still end up with corrupt build products. They may fail with other cryptic errors rather than a rustc ICE, but the end result for the user is still the same - the build will fail unless they run cargo clean.

@the8472
Copy link
Member

the8472 commented Apr 13, 2024

In general the safe-save pattern (i.e. write a new file, then move it over an existing file) is not safe in the presence of unexpected shutdown without an fsync after the new file's contents are written.

In fact my aim with #82047 was to ensure that metadata writes are as non-durable as they can be. This is very useful for repeated rebuilds or short-lived things like UI tests and doctests where the intermediate files are thrown away or replaced soon after they have been created. By avoiding all syncing they only stay in the page-cache and no IO-waits are incurred.

IMO, trying to keep build products consistent in the face of unclean shutdowns is a losing game

Agreed. Plus there are other sources of corruption, especially on non-checksumming filesystems.
A better approach is to detect the issue and clean the build dir.

@saethlin
Copy link
Member Author

A better approach is to detect the issue and clean the build dir.

I agree that a better approach is to detect the problem. The trouble is actually detecting the problem in a way that doesn't incorrectly detect a compiler issue. Until quite recently, reproducible compiler bugs produced a lot of corrupt output files, and nobody else debugged that. I don't think that accidentally papering over a family of compiler bugs in an attempt to be resilient to misbehaving filesystems or operating systems is a good outcome.

So I'm going to close this and see what I can come up with. Ideally we'd have some sort of telemetry, but maybe a future diagnostic can ask people to comment on an issue.

@saethlin saethlin closed this Apr 19, 2024
@saethlin saethlin deleted the sync-on-finish branch April 19, 2024 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants