
Conversation

@jakubno (Member) commented Nov 11, 2025

Checked that the lock works as expected (the logs were removed afterwards).
[screenshot]

Left is with locking (the cache NFS was cleaned and the orchestrator redeployed, so both caches started clean).

[screenshots]

Without the lock, the same load timed out.


Note

Introduce a file-based lock with TTL and integrate it into cache writes to prevent concurrent NFS writes; add comprehensive tests.

  • Storage / Locking:
    • New packages/shared/pkg/storage/lock: Implement file-based lock with TTL cleanup (TryAcquireLock, ReleaseLock, ErrLockAlreadyHeld); an illustrative sketch follows below.
    • Tests: Add concurrency, staleness, and path consistency tests for the lock.
  • Cache Providers:
    • CachedObjectProvider: Wrap full-file cache writes with lock acquisition; skip on ErrLockAlreadyHeld; add warnings.
    • CachedSeekableObjectProvider:
      • Lock around chunk writes in writeChunkToCache.
      • Lock around size writes in writeLocalSize.
      • Skip writes if lock already held; emit warnings on lock errors (see the second sketch below).

Written by Cursor Bugbot for commit cc765fc. This will update automatically on new commits.
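
The summary names TryAcquireLock, ReleaseLock, and ErrLockAlreadyHeld but not their exact shapes, so the following is only a minimal sketch of how a file-based lock with TTL cleanup can work: the lock is a sentinel file created atomically with O_EXCL, and a holder whose file is older than the TTL is treated as stale. The signatures and the mtime-based staleness check are assumptions for illustration, not the actual code in packages/shared/pkg/storage/lock.

```go
// Minimal sketch of a file-based lock with TTL cleanup (illustrative only).
package lock

import (
	"errors"
	"fmt"
	"os"
	"time"
)

var ErrLockAlreadyHeld = errors.New("lock already held")

// TryAcquireLock attempts to create the lock file atomically. If the file
// already exists but is older than ttl, it is treated as stale, removed, and
// the acquisition is retried once.
func TryAcquireLock(path string, ttl time.Duration) error {
	for attempt := 0; attempt < 2; attempt++ {
		f, err := os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
		if err == nil {
			return f.Close()
		}
		if !os.IsExist(err) {
			return fmt.Errorf("acquiring lock %q: %w", path, err)
		}

		info, statErr := os.Stat(path)
		if statErr != nil {
			// The holder may have released the lock in the meantime; retry.
			continue
		}
		if time.Since(info.ModTime()) < ttl {
			return ErrLockAlreadyHeld
		}

		// Stale lock left behind by a crashed writer: best-effort cleanup,
		// then retry the atomic create once more.
		_ = os.Remove(path)
	}

	return ErrLockAlreadyHeld
}

// ReleaseLock removes the lock file; a missing file is not an error.
func ReleaseLock(path string) error {
	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("releasing lock %q: %w", path, err)
	}
	return nil
}
```

Removing a stale lock and retrying once keeps a crashed writer from blocking the cache forever, at the cost of a small race between the staleness check and the removal; that is tolerable here because the competing writers would be writing the same bytes anyway.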

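The second sketch shows how a cache write might be wrapped with that lock, mirroring the behavior described for writeChunkToCache: skip the write when the lock is already held, warn on other lock errors. The standalone function name, the ".lock" suffix, the TTL value, and placing it in the same package as the sketch above are simplifications; the real integration lives in the cache providers.

```go
// Sketch of the cache-write integration (same package as the lock sketch
// above purely for brevity).
package lock

import (
	"errors"
	"log/slog"
	"os"
	"time"
)

// cacheLockTTL is a hypothetical TTL; the real value may differ.
const cacheLockTTL = 30 * time.Second

// writeChunkToCacheLocked writes one chunk to the NFS cache under the lock.
// Losing the race to another orchestrator is not an error: the write is
// simply skipped, since the other writer is producing the same data.
func writeChunkToCacheLocked(chunkPath string, data []byte) {
	lockPath := chunkPath + ".lock" // hypothetical lock-file naming

	if err := TryAcquireLock(lockPath, cacheLockTTL); err != nil {
		if errors.Is(err, ErrLockAlreadyHeld) {
			return // another orchestrator is already writing this chunk
		}
		slog.Warn("acquiring cache lock failed", "path", lockPath, "error", err)
		return
	}
	defer func() {
		if err := ReleaseLock(lockPath); err != nil {
			slog.Warn("releasing cache lock failed", "path", lockPath, "error", err)
		}
	}()

	if err := os.WriteFile(chunkPath, data, 0o644); err != nil {
		slog.Warn("writing chunk to cache failed", "path", chunkPath, "error", err)
	}
}
```

Skipping instead of waiting is the point of the change: at most one writer per cached file touches the NFS at a time, which prevents the concurrent-write pile-up that caused the timeouts shown above.
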
linear bot commented Nov 11, 2025

@jakubno added the improvement (Improvement for current functionality) label Nov 11, 2025
@jakubno marked this pull request as draft November 11, 2025 16:08
chatgpt-codex-connector[bot] left a comment that was later marked as resolved.

@jakubno (Member, Author) commented Nov 11, 2025

bugbot run

@ValentaTomas (Member)

I assume this PR is intended to prevent the avalanche of requests when we start multiple orchestrators?

@ValentaTomas (Member)

Was the problem with the NFS throughput or IOPS, then?

@jakubno marked this pull request as ready for review November 11, 2025 18:52
@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@ValentaTomas (Member)

Consideration—with the way the cleanup script picks files, can it delete the lock files?

@ValentaTomas (Member)

Alternative—can we make the writes/reads to NFS timeout if they are taking too long, and just fetch from storage instead?

@ValentaTomas (Member)

Alternative—if we limit the concurrency of writing things to cache per orchestrator, can the usage reasonably flatten (until we have too many orchestrators)?

@jakubno (Member, Author) commented Nov 12, 2025

Consideration—with the way the cleanup script picks files, can it delete the lock files?

yes

@jakubno (Member, Author) commented Nov 12, 2025

Alternative—can we make the writes/reads to NFS timeout if they are taking too long, and just fetch from storage instead?

Yes, but that's a different problem; you want to prevent the requests in the first place.

@jakubno (Member, Author) commented Nov 12, 2025

Alternative—if we limit the concurrency of writing things to cache per orchestrator, can the usage reasonably flatten (until we have too many orchestrators)?

This is caused by the concurrency; from the POV of any single node the traffic is completely reasonable (and legitimate).

@jakubno merged commit 553a842 into main Nov 12, 2025
28 checks passed
@jakubno deleted the implement-locking-mechanism-for-nfs-cache-to-prevent-usage-eng-3293 branch November 12, 2025 20:51