feat(sync): remote heads tracking for confirmed cell mutations#713
feat(sync): remote heads tracking for confirmed cell mutations#713
Conversation
There was a problem hiding this comment.
Pull request overview
This PR strengthens notebook sync correctness by making cell mutation APIs wait for daemon-side confirmation that local Automerge heads have been merged, addressing a create_cell → execute_cell race in the document-first architecture.
Changes:
- Switched key cell mutation methods (
add_cell,add_cell_with_source,delete_cell,move_cell,update_source,append_source) to use a new confirmed sync helper. - Added
sync_to_daemon_confirmed(), which performs additional sync rounds untilpeer_state.shared_headsincludes the local heads (bounded retries), then logs if confirmation isn’t reached.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| // Best-effort: even if not fully confirmed after retries, the | ||
| // changes are very likely applied. Log and continue rather than | ||
| // failing the mutation — a hard error here would be worse than | ||
| // the original race. | ||
| log::debug!( | ||
| "[notebook-sync-client] sync_to_daemon_confirmed: heads not fully confirmed after retries (our_heads={}, shared_heads={})", | ||
| our_heads.len(), | ||
| self.peer_state.shared_heads.len() | ||
| ); | ||
| Ok(()) |
There was a problem hiding this comment.
sync_to_daemon_confirmed returns Ok(()) even when the daemon never confirms our heads (after retries it only logs). That undermines the method’s stated goal and can still allow the create_cell → execute_cell race when confirmation doesn’t arrive in time. Consider returning an error (e.g., NotebookSyncError::Timeout/SyncError) or changing the API to return a boolean so callers can decide whether to proceed with execute requests when confirmation fails.
| // Best-effort: even if not fully confirmed after retries, the | |
| // changes are very likely applied. Log and continue rather than | |
| // failing the mutation — a hard error here would be worse than | |
| // the original race. | |
| log::debug!( | |
| "[notebook-sync-client] sync_to_daemon_confirmed: heads not fully confirmed after retries (our_heads={}, shared_heads={})", | |
| our_heads.len(), | |
| self.peer_state.shared_heads.len() | |
| ); | |
| Ok(()) | |
| // We failed to observe the daemon confirming all of our heads within | |
| // the bounded number of sync rounds. This means we cannot uphold the | |
| // contract that this method only returns `Ok(())` once our changes are | |
| // confirmed, so surface this as an error to the caller. | |
| log::debug!( | |
| "[notebook-sync-client] sync_to_daemon_confirmed: heads not fully confirmed after retries (our_heads={}, shared_heads={})", | |
| our_heads.len(), | |
| self.peer_state.shared_heads.len() | |
| ); | |
| Err(NotebookSyncError::SyncError( | |
| "sync_to_daemon_confirmed: heads not fully confirmed after retries".into(), | |
| )) |
| // Not yet confirmed — do another sync round | ||
| self.sync_to_daemon().await?; | ||
| } | ||
|
|
There was a problem hiding this comment.
The retry loop checks shared_heads before doing the next sync round. If confirmation only becomes true on the last sync_to_daemon() call inside the loop, it won’t be re-checked and the function will fall through to the "not fully confirmed" path. Consider restructuring so each retry performs a sync and then re-checks (or add a final check after the loop).
| // Final check after the last sync attempt in case confirmation | |
| // only became true on the final round. | |
| let shared = &self.peer_state.shared_heads; | |
| if our_heads.iter().all(|h| shared.contains(h)) { | |
| return Ok(()); // Daemon has confirmed all our changes | |
| } |
| async fn sync_to_daemon_confirmed(&mut self) -> Result<(), NotebookSyncError> { | ||
| // Do the initial sync round | ||
| self.sync_to_daemon().await?; | ||
|
|
||
| let our_heads = self.doc.get_heads(); | ||
| if our_heads.is_empty() { | ||
| return Ok(()); // Empty doc, nothing to confirm | ||
| } |
There was a problem hiding this comment.
This new confirmation behavior is core to correctness (prevents create_cell → execute_cell races), but it isn’t covered by tests in this file. Adding a unit/integration test that simulates multiple sync rounds and asserts the method blocks until the expected heads are present in peer_state.shared_heads would help prevent regressions.
| // The sync protocol may need multiple rounds. Bound the retries | ||
| // so we don't loop forever if something is wrong. | ||
| for _ in 0..5 { | ||
| let shared = &self.peer_state.shared_heads; |
There was a problem hiding this comment.
The confirmation retry count (0..5) is a magic number here. Consider extracting it (and possibly a total confirmation timeout) into a named constant/config so it’s easier to reason about and tune alongside sync_to_daemon’s internal 500ms deadline.
d9db41f to
1d19e7f
Compare
Add sync_to_daemon_confirmed that checks peer_state.shared_heads includes our local heads after a sync round-trip, confirming the daemon has merged our changes. Cell mutations stay fire-and-forget with sync_to_daemon — confirmation is opt-in at the execution boundary. Python bindings call confirm_sync() before sending ExecuteCell, fixing the create_cell → execute_cell race where the daemon hasn't merged the cell into its doc yet. WASM edits are unaffected — local-first editing doesn't need confirmation for every keystroke. Phase A of the sync improvements plan.
1d19e7f to
42afac7
Compare
stream_execute, queue_cell, and async run() all send ExecuteCell to the daemon but were missing the confirm_sync barrier. This left the create-to-execute race open for these APIs.
- Intended for full-peer programmatic clients, not the Tauri pipe path - Attempts confirmation for a bounded number of rounds - Degrades gracefully if confirmation doesn't arrive
Adds
sync_to_daemon_confirmedthat checkspeer_state.shared_headsincludes our local heads after a sync round-trip. Cell mutations stay fire-and-forget withsync_to_daemon— confirmation is opt-in at the execution boundary.Python bindings call
confirm_sync()before sendingExecuteCell, fixing thecreate_cell→execute_cellrace. WASM edits are unaffected.Phase A of the sync improvements plan.
PR submitted by @rgbkrk's agent Quill, via Zed