Call File::sync_all when finalizing encoding #123880

saethlin · 2024-04-13T01:16:40Z

#123352 (comment)

r? @ghost

saethlin · 2024-04-13T01:16:48Z

@bors try @rust-timer queue

bors · 2024-04-13T01:19:10Z

⌛ Trying commit ecd7070 with merge cbad6f1...

Call File::sync_all when finalizing encoding rust-lang#123352 (comment) r? `@ghost`

bors · 2024-04-13T02:52:07Z

☀️ Try build successful - checks-actions
Build commit: cbad6f1 (cbad6f1932658dc842d4e58954219cc93c42a2ef)

rust-timer · 2024-04-13T08:06:22Z

Finished benchmarking commit (cbad6f1): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	6.7%	[1.4%, 11.3%]	7
Regressions ❌ (secondary)	8.9%	[4.3%, 11.8%]	8
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	6.7%	[1.4%, 11.3%]	7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.0%	[2.0%, 2.0%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 677.972s -> 678.686s (0.11%)
Artifact size: 316.06 MiB -> 316.05 MiB (-0.00%)

the8472 · 2024-04-13T10:51:24Z

I'm curious about the motivation. sync_all is about guaranteeing that the kernel side dirty pages and metadata have made it to persistent storage. It's not about ensuring that userspace changes make it to the kernel/become visible to other processes.

goffrie · 2024-04-13T18:09:25Z

I'm curious about the motivation.

In the linked issue, a user repeatedly rebooted a VM, presumably causing the filesystem to lose non-fsynced (i.e. sync_all) data and thereby causing rustc to crash. Another user reported the same problem after a power failure.

In general the safe-save pattern (i.e. write a new file, then move it over an existing file) is not safe in the presence of unexpected shutdown without an fsync after the new file's contents are written. So adding the sync_all call is legitimate in that sense. This LWN article (from 2009) is relevant: https://lwn.net/Articles/351422/

However, fsync is a very expensive call and this can be seen in the wall-time perf results (somewhere between 5 and 20 milliseconds). Other hardware and OSes most likely perform even worse. (Not to mention that calling fsync increases write amplification for flash storage which reduces the longevity of the device.)
(I think this PR also performs more fsyncs than are strictly needed as FileEncoder is also used in other use cases that may not need the written files to be durably persisted.)

IMO, trying to keep build products consistent in the face of unclean shutdowns is a losing game. Lots of other files get generated during a build, by many different tools (linkers? build scripts?), and those generally don't call fsync either. So a system crash could still end up with corrupt build products. They may fail with other cryptic errors rather than a rustc ICE, but the end result for the user is still the same - the build will fail unless they run cargo clean.

the8472 · 2024-04-13T18:46:01Z

In general the safe-save pattern (i.e. write a new file, then move it over an existing file) is not safe in the presence of unexpected shutdown without an fsync after the new file's contents are written.

In fact my aim with #82047 was to ensure that metadata writes are as non-durable as they can be. This is very useful for repeated rebuilds or short-lived things like UI tests and doctests where the intermediate files are thrown away or replaced soon after they have been created. By avoiding all syncing they only stay in the page-cache and no IO-waits are incurred.

IMO, trying to keep build products consistent in the face of unclean shutdowns is a losing game

Agreed. Plus there are other sources of corruption, especially on non-checksumming filesystems.
A better approach is to detect the issue and clean the build dir.

saethlin · 2024-04-19T18:41:47Z

A better approach is to detect the issue and clean the build dir.

I agree that a better approach is to detect the problem. The trouble is actually detecting the problem in a way that doesn't incorrectly detect a compiler issue. Until quite recently, reproducible compiler bugs produced a lot of corrupt output files, and nobody else debugged that. I don't think that accidentally papering over a family of compiler bugs in an attempt to be resilient to misbehaving filesystems or operating systems is a good outcome.

So I'm going to close this and see what I can come up with. Ideally we'd have some sort of telemetry, but maybe a future diagnostic can ask people to comment on an issue.

Call File::sync_all when finalizing encoding

ecd7070

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 13, 2024

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2024

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 13, 2024

Auto merge of rust-lang#123880 - saethlin:sync-on-finish, r=<try>

cbad6f1

Call File::sync_all when finalizing encoding rust-lang#123352 (comment) r? `@ghost`

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 13, 2024

saethlin closed this Apr 19, 2024

saethlin deleted the sync-on-finish branch April 19, 2024 18:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Call File::sync_all when finalizing encoding #123880

Call File::sync_all when finalizing encoding #123880

saethlin commented Apr 13, 2024

saethlin commented Apr 13, 2024

This comment has been minimized.

bors commented Apr 13, 2024

bors commented Apr 13, 2024

This comment has been minimized.

rust-timer commented Apr 13, 2024

the8472 commented Apr 13, 2024

goffrie commented Apr 13, 2024

the8472 commented Apr 13, 2024 •

edited

Loading

saethlin commented Apr 19, 2024

Call File::sync_all when finalizing encoding #123880

Call File::sync_all when finalizing encoding #123880

Conversation

saethlin commented Apr 13, 2024

saethlin commented Apr 13, 2024

This comment has been minimized.

bors commented Apr 13, 2024

bors commented Apr 13, 2024

This comment has been minimized.

rust-timer commented Apr 13, 2024

Overall result: no relevant changes - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Binary size

the8472 commented Apr 13, 2024

goffrie commented Apr 13, 2024

the8472 commented Apr 13, 2024 • edited Loading

saethlin commented Apr 19, 2024

the8472 commented Apr 13, 2024 •

edited

Loading