unnecessary fsync in perf tracing code #13812
Comments
When profiling a single app in development it is not necessary. When collecting a system wide profile of long-running processes it is more important, because the flush is not guaranteed to happen until the process exits. This means that when profiling a long-running process, I cannot completely analyze the system until I terminate the .NET Core apps.
👍 Also the switch to I suspect a dedicated jitdump file write thread might be the best answer for minimal perf impact. The iovec structs could easily be passed through a concurrent queue to the writing thread. The locking mechanism could be the atomic write to the queue. This would probably reduce the impact to the JIT threads. The perf thread id could be used to ignore the jitdump file I/O, fsync...
👍
We need the retry loop for signal handlers here....
This really should only impact app startup. To get a good profile of long term performance you may need a significantly longer profile. #13539 would also help eliminate startup anomalies.
Looking at the POSIX standard the I was wondering if using
The spec is not immediately clear to me that the I will remove the
I think there is no reason for the kernel to hold the buffers once they cross the kernel boundary. The data may not reach the disk immediately, as the kernel tries to optimize writes. AFAIK, data will be visible to other processes as soon as it hits the disk cache, even if it is not yet on the physical surface.
Removing the fsync and other changes made perf record much faster. While user and system time increased slightly, the elapsed time decreased dramatically. In the single benchmark the elapsed time dropped from about 2 minutes to about 2 seconds....
I was trying to profile some networking code on Linux using perfcollect. I noticed a large chunk of time spent in fsync and related code. In my test code of 200 SslHandshake I saw 7000+ fsync calls.

After some digging this points to runtime/src/coreclr/src/pal/src/misc/perfjitdump.cpp, lines 225 to 281 in 7b408c1.

The fsync seems unnecessary as we don't expect correct behavior on abnormal exits (https://github.com/dotnet/diagnostics/issues/570). Once write() is done, the data will be written even if the .NET process crashes, as the OS will close all handles and flush all buffers.

To make this worse, the fsync is currently done under a lock, which can significantly lengthen the critical section.

There seem to be other issues with this code. We only check for a negative value returned from write. However, write can write less than asked for and return the number of bytes written. We would miss such a case and could write corrupted entries unnoticed.

Since we write a fixed number of records, we should be able to have the iovec on the stack and use a single writev(). That would limit context switches to the kernel and make this less impactful to traced code.

cc @sdmaclea @janvorli @noahfalk @jkotas