Skip to content

feat(sync): remote heads tracking for confirmed cell mutations#713

Merged
rgbkrk merged 3 commits intomainfrom
feat/remote-heads-tracking
Mar 11, 2026
Merged

feat(sync): remote heads tracking for confirmed cell mutations#713
rgbkrk merged 3 commits intomainfrom
feat/remote-heads-tracking

Conversation

@rgbkrk
Copy link
Member

@rgbkrk rgbkrk commented Mar 11, 2026

Adds sync_to_daemon_confirmed that checks peer_state.shared_heads includes our local heads after a sync round-trip. Cell mutations stay fire-and-forget with sync_to_daemon — confirmation is opt-in at the execution boundary.

Python bindings call confirm_sync() before sending ExecuteCell, fixing the create_cellexecute_cell race. WASM edits are unaffected.

Phase A of the sync improvements plan.

PR submitted by @rgbkrk's agent Quill, via Zed

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens notebook sync correctness by making cell mutation APIs wait for daemon-side confirmation that local Automerge heads have been merged, addressing a create_cellexecute_cell race in the document-first architecture.

Changes:

  • Switched key cell mutation methods (add_cell, add_cell_with_source, delete_cell, move_cell, update_source, append_source) to use a new confirmed sync helper.
  • Added sync_to_daemon_confirmed(), which performs additional sync rounds until peer_state.shared_heads includes the local heads (bounded retries), then logs if confirmation isn’t reached.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +2136 to +2145
// Best-effort: even if not fully confirmed after retries, the
// changes are very likely applied. Log and continue rather than
// failing the mutation — a hard error here would be worse than
// the original race.
log::debug!(
"[notebook-sync-client] sync_to_daemon_confirmed: heads not fully confirmed after retries (our_heads={}, shared_heads={})",
our_heads.len(),
self.peer_state.shared_heads.len()
);
Ok(())
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sync_to_daemon_confirmed returns Ok(()) even when the daemon never confirms our heads (after retries it only logs). That undermines the method’s stated goal and can still allow the create_cellexecute_cell race when confirmation doesn’t arrive in time. Consider returning an error (e.g., NotebookSyncError::Timeout/SyncError) or changing the API to return a boolean so callers can decide whether to proceed with execute requests when confirmation fails.

Suggested change
// Best-effort: even if not fully confirmed after retries, the
// changes are very likely applied. Log and continue rather than
// failing the mutation — a hard error here would be worse than
// the original race.
log::debug!(
"[notebook-sync-client] sync_to_daemon_confirmed: heads not fully confirmed after retries (our_heads={}, shared_heads={})",
our_heads.len(),
self.peer_state.shared_heads.len()
);
Ok(())
// We failed to observe the daemon confirming all of our heads within
// the bounded number of sync rounds. This means we cannot uphold the
// contract that this method only returns `Ok(())` once our changes are
// confirmed, so surface this as an error to the caller.
log::debug!(
"[notebook-sync-client] sync_to_daemon_confirmed: heads not fully confirmed after retries (our_heads={}, shared_heads={})",
our_heads.len(),
self.peer_state.shared_heads.len()
);
Err(NotebookSyncError::SyncError(
"sync_to_daemon_confirmed: heads not fully confirmed after retries".into(),
))

Copilot uses AI. Check for mistakes.
// Not yet confirmed — do another sync round
self.sync_to_daemon().await?;
}

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry loop checks shared_heads before doing the next sync round. If confirmation only becomes true on the last sync_to_daemon() call inside the loop, it won’t be re-checked and the function will fall through to the "not fully confirmed" path. Consider restructuring so each retry performs a sync and then re-checks (or add a final check after the loop).

Suggested change
// Final check after the last sync attempt in case confirmation
// only became true on the final round.
let shared = &self.peer_state.shared_heads;
if our_heads.iter().all(|h| shared.contains(h)) {
return Ok(()); // Daemon has confirmed all our changes
}

Copilot uses AI. Check for mistakes.
Comment on lines +2116 to +2123
async fn sync_to_daemon_confirmed(&mut self) -> Result<(), NotebookSyncError> {
// Do the initial sync round
self.sync_to_daemon().await?;

let our_heads = self.doc.get_heads();
if our_heads.is_empty() {
return Ok(()); // Empty doc, nothing to confirm
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new confirmation behavior is core to correctness (prevents create_cellexecute_cell races), but it isn’t covered by tests in this file. Adding a unit/integration test that simulates multiple sync rounds and asserts the method blocks until the expected heads are present in peer_state.shared_heads would help prevent regressions.

Copilot uses AI. Check for mistakes.
Comment on lines +2125 to +2128
// The sync protocol may need multiple rounds. Bound the retries
// so we don't loop forever if something is wrong.
for _ in 0..5 {
let shared = &self.peer_state.shared_heads;
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confirmation retry count (0..5) is a magic number here. Consider extracting it (and possibly a total confirmation timeout) into a named constant/config so it’s easier to reason about and tune alongside sync_to_daemon’s internal 500ms deadline.

Copilot uses AI. Check for mistakes.
@rgbkrk rgbkrk force-pushed the feat/remote-heads-tracking branch from d9db41f to 1d19e7f Compare March 11, 2026 21:09
Add sync_to_daemon_confirmed that checks peer_state.shared_heads
includes our local heads after a sync round-trip, confirming the
daemon has merged our changes. Cell mutations stay fire-and-forget
with sync_to_daemon — confirmation is opt-in at the execution boundary.

Python bindings call confirm_sync() before sending ExecuteCell, fixing
the create_cell → execute_cell race where the daemon hasn't merged the
cell into its doc yet. WASM edits are unaffected — local-first editing
doesn't need confirmation for every keystroke.

Phase A of the sync improvements plan.
@rgbkrk rgbkrk force-pushed the feat/remote-heads-tracking branch from 1d19e7f to 42afac7 Compare March 11, 2026 21:15
rgbkrk added 2 commits March 11, 2026 14:23
stream_execute, queue_cell, and async run() all send ExecuteCell to
the daemon but were missing the confirm_sync barrier. This left the
create-to-execute race open for these APIs.
- Intended for full-peer programmatic clients, not the Tauri pipe path
- Attempts confirmation for a bounded number of rounds
- Degrades gracefully if confirmation doesn't arrive
@rgbkrk rgbkrk marked this pull request as ready for review March 11, 2026 21:36
@rgbkrk rgbkrk merged commit 9418e4a into main Mar 11, 2026
14 checks passed
@rgbkrk rgbkrk deleted the feat/remote-heads-tracking branch March 11, 2026 22:00
@rgbkrk rgbkrk mentioned this pull request Mar 13, 2026
48 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants