-
Notifications
You must be signed in to change notification settings - Fork 660
Fix locktuil on windows #2729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix locktuil on windows #2729
Conversation
Verify that the lock works for threads in the same process. The test fails on windows: === RUN TestWithDirLock lockutil_test.go:54: Expected one writer, got [writer 19 writer 0] (Keeping as separate commit to make it easy to reproduce) Signed-off-by: Nir Soffer <[email protected]>
We use LOCKFILE_FAIL_IMMEDIATELY, creating a shared lock that fail immediately if the lock cannot be acquired. This does not match the behavior of the unix version and does not make sense. Using now LOCKFILE_EXCLUSIVE_LOCK to acquire exclusive lock blocking until the lock can be acquired. Add constants for the flags and add comments for the argument names to prevent future errors. Signed-off-by: Nir Soffer <[email protected]>
const ( | ||
// see https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx | ||
LOCKFILE_EXCLUSIVE_LOCK = 0x00000002 | ||
LOCKFILE_FAIL_IMMEDIATELY = 0x00000001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the same constant names used in windows docs to make it easy to find the flag in the docs or using search.
I see that the original code form bolt use nicer go style:
https://github.com/boltdb/bolt/blob/2f1ce7a837dcb8da3ec595b1dac9d0632f0f99e8/bolt_windows.go#L18
We can use this style.
Ad sub tests for parallel downloads with "with cache" and "cahing-only mode". The test run 10 concurrent downloads with the current test http server and ensure that downloads succeeded. To make it possible to test using the same process, we include now a counter in the perProcessTempfile. The file name is now: {filename}.pid.{count} The test revealed an issue with clonefile, not seen when testing 3 parallel `limactl create`. When the data file is replaced during a clonefile syscall, it may fail with: clonefile failed: no such file or directory This smells like a darwin bug, but we can avoid this issue by using the first download result. When a download finishes, we check if data file exists, and return success if it does. We take a lock for the very short time needed to check and rename the temporary data file to the target file. The parallel tests failed on windows - it seems that os.Rename() does not work on windows if the target exist. Replacing atomicWrite with a simpler version that takes a lock and write the file fixes the issue. Tested using: % go test ./pkg/downloader -run 'TestDownloadRemote/with_cache/parallel' -count=1000 ok github.com/lima-vm/lima/pkg/downloader 116.025s % go test ./pkg/downloader -run 'TestDownloadRemote/caching-only_mode/parallel' -count=1000 ok github.com/lima-vm/lima/pkg/downloader 98.535s Notes: - Depends on lima-vm#2729, fixing the lockutil windows implementation. Signed-off-by: Nir Soffer <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Ad sub tests for parallel downloads with "with cache" and "cahing-only mode". The test run 10 concurrent downloads with the current test http server and ensure that downloads succeeded. To make it possible to test using the same process, we include now a counter in the perProcessTempfile. The file name is now: {filename}.pid.{count} The test revealed an issue with clonefile, not seen when testing 3 parallel `limactl create`. When the data file is replaced during a clonefile syscall, it may fail with: clonefile failed: no such file or directory This smells like a darwin bug, but we can avoid this issue by using the first download result. When a download finishes, we check if data file exists, and return success if it does. We take a lock for the very short time needed to check and rename the temporary data file to the target file. The parallel tests failed on windows - it seems that os.Rename() does not work on windows if the target exist. Replacing atomicWrite with a simpler version that takes a lock and write the file fixes the issue. Tested using: % go test ./pkg/downloader -run 'TestDownloadRemote/with_cache/parallel' -count=1000 ok github.com/lima-vm/lima/pkg/downloader 116.025s % go test ./pkg/downloader -run 'TestDownloadRemote/caching-only_mode/parallel' -count=1000 ok github.com/lima-vm/lima/pkg/downloader 98.535s Notes: - Depends on lima-vm#2729, fixing the lockutil windows implementation. Signed-off-by: Nir Soffer <[email protected]>
Add the missing test, revealing that the windows implementation does not match the unix one. It allows multiple threads to acquire the lock at the same time.
Fix the lock flags to create a blocking exclusive lock, and improve the code to prevent future errors.
Tested using the unit tests.
@AkihiroSuda Looks like this code was copied from nerdctl and it needs the same test and fix:
https://github.com/containerd/nerdctl/tree/v0.13.0/pkg/lockutil