remote: preserve lost action inputs #29548

Open
sluongng wants to merge 1 commit into bazelbuild:master from sluongng:sluongng/action-rewind-fix

@sluongng sluongng commented May 15, 2026

A Bazel 9.x remote spawn can fail with "Missing digest" during Merkle
tree input upload and report a generic remote cache failure instead of
a lost input. This happens because upload failures are de-duplicated by
digest through casUploadCache, even though each caller can identify its
current action's input for the missing digest. Mutating the shared
failure would let concurrent same-digest waiters inherit another
action's input.

Keep shared upload failures digest/path-oriented and annotate
CacheNotFoundExceptions while converting each UploadTask's completion
into a TransferResult, using that subscriber's own retained exec path.
BulkTransferException validates annotated paths against the current
action resolver before reporting lost artifacts and continues skipping
misses that action rewind cannot fix.

This lets valid lost inputs drive action rewinding while avoiding
misattribution when two actions upload the same digest concurrently,
without retaining full ActionInput objects in upload tasks.
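
A minimal, self-contained sketch of the idea described above, not the code
in this change: the shared failure stays digest-only, and each subscriber
attaches its own exec path when it converts the shared completion into its
own result. `LostInputSketch`, `MissingDigest`, `Result`, `upload`, and
`reportable` are hypothetical names chosen for illustration; they only
stand in for `CacheNotFoundException`, `TransferResult`, `UploadTask`, and
the resolver check in `BulkTransferException`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

final class LostInputSketch {
  /** Shared, digest-only failure (hypothetical stand-in for an unannotated cache miss). */
  static final class MissingDigest extends Exception {
    final String digest;

    MissingDigest(String digest) {
      super("Missing digest: " + digest);
      this.digest = digest;
    }
  }

  /** Per-subscriber outcome (hypothetical stand-in for an annotated transfer result). */
  record Result(String digest, Optional<String> lostExecPath) {}

  // One in-flight upload per digest, shared by every action that needs it.
  static final Map<String, CompletableFuture<Void>> uploads = new ConcurrentHashMap<>();

  /** Subscribes to the shared upload; the exec path lives only on this subscriber's stage. */
  static CompletableFuture<Result> upload(String digest, String execPath) {
    CompletableFuture<Void> shared =
        uploads.computeIfAbsent(digest, d -> new CompletableFuture<>());
    return shared.handle(
        (ok, err) -> {
          if (err == null) {
            return new Result(digest, Optional.empty());
          }
          Throwable cause = err instanceof CompletionException ? err.getCause() : err;
          if (cause instanceof MissingDigest) {
            // Annotate a per-subscriber copy; the shared exception is never mutated.
            return new Result(digest, Optional.of(execPath));
          }
          throw new CompletionException(cause);
        });
  }

  /** Reports a lost input only if the path belongs to the current action's inputs. */
  static Optional<String> reportable(Result result, Set<String> currentActionInputs) {
    return result.lostExecPath().filter(currentActionInputs::contains);
  }

  public static void main(String[] args) {
    // Two actions upload the same digest under different exec paths.
    CompletableFuture<Result> a = upload("d1", "bazel-out/a/gen.h");
    CompletableFuture<Result> b = upload("d1", "bazel-out/b/gen.h");
    uploads.get("d1").completeExceptionally(new MissingDigest("d1"));

    System.out.println(a.join().lostExecPath()); // Optional[bazel-out/a/gen.h]
    System.out.println(b.join().lostExecPath()); // Optional[bazel-out/b/gen.h]
    // A path the current action does not own is skipped rather than reported.
    System.out.println(reportable(b.join(), Set.of("bazel-out/a/gen.h"))); // Optional.empty
  }
}
```

In this sketch the two waiters see different paths because the annotation
happens on each derived stage, which is the property the description relies
on to avoid misattribution.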

@sluongng sluongng force-pushed the sluongng/action-rewind-fix branch 2 times, most recently from 76fb7a3 to a111d06 May 15, 2026 14:37
@sluongng sluongng force-pushed the sluongng/action-rewind-fix branch from a111d06 to 070357c May 15, 2026 14:55
@sluongng sluongng marked this pull request as ready for review May 15, 2026 15:02
@sluongng sluongng requested a review from a team as a code owner May 15, 2026 15:02
@github-actions github-actions Bot added the team-Remote-Exec (Issues and PRs for the Execution (Remote) team) and awaiting-review (PR is awaiting review from an assigned reviewer) labels May 15, 2026