Add lock for NFS write #1475
bugbot run
I assume this PR is intended to prevent the avalanche of requests when we start multiple orchestrators?
Was the problem with the NFS then throughput or IOPS?
Consideration: with the way the cleanup script picks files, can it delete the lock files?
Alternative: can we make the writes/reads to NFS time out if they are taking too long, and just fetch from storage instead?
Alternative: if we limit the concurrency of writing things to cache per orchestrator, can the usage reasonably flatten (until we have too many orchestrators)?
Yes.
Yes, but that's a different problem; you want to prevent the requests in the first place.
This is caused by the concurrency; from the point of view of a single node, the traffic is completely reasonable (and legitimate).
Check that the lock works as expected (logs were removed afterwards)

The left side is with locking (the NFS cache was cleaned and the orchestrator redeployed, so both caches were clean).
Without the lock, the same load timed out.
Note
Introduce a file-based lock with TTL and integrate it into cache writes to prevent concurrent NFS writes; add comprehensive tests.
- packages/shared/pkg/storage/lock: Implement file-based lock with TTL cleanup (TryAcquireLock, ReleaseLock, ErrLockAlreadyHeld).
- CachedObjectProvider: Wrap full-file cache writes with lock acquisition; skip on ErrLockAlreadyHeld; add warnings.
- CachedSeekableObjectProvider: writeChunkToCache, writeLocalSize.

Written by Cursor Bugbot for commit cc765fc.