feat(notebook-doc): native Automerge metadata (no more JSON string blobs)#785
feat(notebook-doc): native Automerge metadata (no more JSON string blobs)#785
Conversation
Adds put_json_value/get_json_value for converting between serde_json::Value and native Automerge types: - JSON objects → Automerge Maps - JSON arrays → Automerge Lists - Scalars → native Automerge scalars (string, bool, i64, f64) - null → delete key (maps) or empty string placeholder (lists) Also adds set_metadata_value/get_metadata_value convenience methods for notebook-level metadata using native types. Foundation for #765 — migrating metadata from JSON string blobs to native Automerge types for proper CRDT merging.
set_metadata_snapshot now writes kernelspec, language_info, and runt as separate native Automerge maps/lists instead of one JSON blob. get_metadata_snapshot reads native keys first, falls back to legacy "notebook_metadata" string for backward compat. Dual-write: both native keys AND legacy string are written, so old daemons can still read new docs. The legacy key will be removed once all consumers are updated. This enables per-field CRDT merging — two clients adding different UV dependencies concurrently will both land (Automerge list merge) instead of one clobbering the other (LWW on JSON string).
There was a problem hiding this comment.
Pull request overview
Introduces foundational primitives in notebook-doc to convert between serde_json::Value and native Automerge types (Maps/Lists/scalars), enabling CRDT-friendly notebook metadata storage as a prerequisite to migrating away from JSON-string “blob” metadata.
Changes:
- Add recursive
put_json_value/get_json_valuehelpers to write/read JSON structures as native Automerge objects. - Add
set_metadata_value/get_metadata_valueto store top-level notebook metadata keys as native Automerge subtrees. - Add
scalar_to_jsonhelper for reconstructing JSON values from Automerge scalars.
Comments suppressed due to low confidence (2)
crates/notebook-doc/src/lib.rs:1347
- The docstring is out of sync with the implementation: booleans are being stored as native Automerge booleans (not string "true"/"false"), and numbers are stored as i64/f64 (not string representations). Please update the documentation to match the actual on-disk representation to avoid confusing callers and future migrations.
if !resolved_assets.contains_key(key) {
self.doc.delete(&resolved_assets_id, key)?;
}
}
crates/notebook-doc/src/lib.rs:1369
Value::Numberhandling currently only writes i64 or f64. This can lose precision for large unsigned integers (e.g. u64 > i64::MAX becomes f64) or even silently write nothing if neither branch matches, leaving the previous value intact. Consider handlingas_u64()explicitly (and/or returning an error if the number cannot be represented) so numeric metadata round-trips correctly.
read_str(&self.doc, meta_id, key)
}
/// Set a metadata value.
pub fn set_metadata(&mut self, key: &str, value: &str) -> Result<(), AutomergeError> {
let meta_id = match self.metadata_map_id() {
Some(id) => id,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| /// | ||
| /// If the key already exists, it is replaced (the old object is overwritten). | ||
| pub fn put_json_value( | ||
| &mut self, |
| ) -> Result<(), AutomergeError> { | ||
| match value { | ||
| serde_json::Value::Null => { | ||
| self.doc.delete(parent, key)?; | ||
| } | ||
| serde_json::Value::Bool(b) => { | ||
| self.doc.put(parent, key, *b)?; |
| .put_object(&cell_obj, "resolved_assets", ObjType::Map)?, | ||
| }; | ||
|
|
||
| for key in existing.keys() { | ||
| if !resolved_assets.contains_key(key) { | ||
| self.doc.delete(&resolved_assets_id, key)?; | ||
| } | ||
| } | ||
|
|
||
| for (asset_ref, blob_hash) in resolved_assets { | ||
| if existing.get(asset_ref) != Some(blob_hash) { | ||
| self.doc.put(&resolved_assets_id, asset_ref, blob_hash)?; | ||
| } | ||
| } | ||
|
|
||
| Ok(true) | ||
| } | ||
|
|
||
| // ── Notebook metadata ────────────────────────────────────────── | ||
|
|
||
| /// Read a metadata value. | ||
| pub fn get_metadata(&self, key: &str) -> Option<String> { | ||
| let meta_id = self.metadata_map_id()?; | ||
| read_str(&self.doc, meta_id, key) | ||
| } | ||
|
|
||
| /// Set a metadata value. | ||
| pub fn set_metadata(&mut self, key: &str, value: &str) -> Result<(), AutomergeError> { | ||
| let meta_id = match self.metadata_map_id() { | ||
| Some(id) => id, | ||
| None => { | ||
| // Create metadata map if missing | ||
| let id = self | ||
| .doc | ||
| .put_object(automerge::ROOT, "metadata", ObjType::Map)?; | ||
| self.doc.put(&id, key, value)?; | ||
| return Ok(()); | ||
| } | ||
| }; | ||
| self.doc.put(&meta_id, key, value)?; | ||
| Ok(()) | ||
| } | ||
|
|
||
| // ── Native JSON ↔ Automerge ───────────────────────────────────── | ||
| // | ||
| // These primitives convert between serde_json::Value and native | ||
| // Automerge objects. JSON objects become Maps, arrays become Lists, | ||
| // and scalars become scalars. No JSON-encoded string blobs. | ||
|
|
||
| /// Write a `serde_json::Value` into the Automerge doc as native types. | ||
| /// | ||
| /// - `Value::Object` → `ObjType::Map` with recursive children | ||
| /// - `Value::Array` → `ObjType::List` with recursive elements | ||
| /// - `Value::String` → string scalar | ||
| /// - `Value::Bool` → bool scalar (stored as string "true"/"false" for now) | ||
| /// - `Value::Number` → string representation (Automerge scalars) | ||
| /// - `Value::Null` → deletes the key | ||
| /// | ||
| /// If the key already exists, it is replaced (the old object is overwritten). | ||
| pub fn put_json_value( | ||
| &mut self, | ||
| parent: &ObjId, | ||
| key: &str, | ||
| value: &serde_json::Value, | ||
| ) -> Result<(), AutomergeError> { | ||
| match value { | ||
| serde_json::Value::Null => { | ||
| self.doc.delete(parent, key)?; | ||
| } | ||
| serde_json::Value::Bool(b) => { | ||
| self.doc.put(parent, key, *b)?; | ||
| } | ||
| serde_json::Value::Number(n) => { | ||
| if let Some(i) = n.as_i64() { | ||
| self.doc.put(parent, key, i)?; | ||
| } else if let Some(f) = n.as_f64() { | ||
| self.doc.put(parent, key, f)?; | ||
| } | ||
| } | ||
| serde_json::Value::String(s) => { | ||
| self.doc.put(parent, key, s.as_str())?; | ||
| } | ||
| serde_json::Value::Array(arr) => { | ||
| let list_id = self.doc.put_object(parent, key, ObjType::List)?; | ||
| for (i, elem) in arr.iter().enumerate() { | ||
| self.insert_json_value_at(&list_id, i, elem)?; | ||
| } | ||
| } | ||
| serde_json::Value::Object(map) => { | ||
| let map_id = self.doc.put_object(parent, key, ObjType::Map)?; | ||
| for (k, v) in map { | ||
| self.put_json_value(&map_id, k, v)?; | ||
| } | ||
| } | ||
| } | ||
| Ok(()) | ||
| } | ||
|
|
||
| /// Insert a `serde_json::Value` into an Automerge list at the given index. | ||
| fn insert_json_value_at( | ||
| &mut self, | ||
| list_id: &ObjId, | ||
| index: usize, | ||
| value: &serde_json::Value, | ||
| ) -> Result<(), AutomergeError> { | ||
| match value { | ||
| serde_json::Value::Null => { | ||
| // Insert empty string as placeholder for null in lists | ||
| self.doc.insert(list_id, index, "")?; | ||
| } | ||
| serde_json::Value::Bool(b) => { | ||
| self.doc.insert(list_id, index, *b)?; | ||
| } | ||
| serde_json::Value::Number(n) => { | ||
| if let Some(i) = n.as_i64() { | ||
| self.doc.insert(list_id, index, i)?; | ||
| } else if let Some(f) = n.as_f64() { | ||
| self.doc.insert(list_id, index, f)?; | ||
| } | ||
| } | ||
| serde_json::Value::String(s) => { | ||
| self.doc.insert(list_id, index, s.as_str())?; | ||
| } | ||
| serde_json::Value::Array(arr) => { | ||
| let nested_list = self.doc.insert_object(list_id, index, ObjType::List)?; | ||
| for (i, elem) in arr.iter().enumerate() { | ||
| self.insert_json_value_at(&nested_list, i, elem)?; | ||
| } | ||
| } | ||
| serde_json::Value::Object(map) => { | ||
| let nested_map = self.doc.insert_object(list_id, index, ObjType::Map)?; | ||
| for (k, v) in map { | ||
| self.put_json_value(&nested_map, k, v)?; | ||
| } | ||
| } | ||
| } | ||
| Ok(()) | ||
| } | ||
|
|
||
| /// Read an Automerge subtree as a `serde_json::Value`. | ||
| /// | ||
| /// Maps become `Value::Object`, Lists become `Value::Array`, | ||
| /// and scalars become the appropriate JSON type. | ||
| /// | ||
| /// Returns `None` if the key doesn't exist. | ||
| pub fn get_json_value(&self, parent: &ObjId, key: &str) -> Option<serde_json::Value> { | ||
| use automerge::Value; | ||
| match self.doc.get(parent, key).ok()?? { | ||
| (Value::Object(ObjType::Map), map_id) => Some(self.read_map_as_json(&map_id)), | ||
| (Value::Object(ObjType::List), list_id) => Some(self.read_list_as_json(&list_id)), | ||
| (Value::Scalar(s), _) => Some(scalar_to_json(&s)), | ||
| _ => None, | ||
| } | ||
| } | ||
|
|
||
| /// Read an Automerge map as a JSON object. | ||
| fn read_map_as_json(&self, map_id: &ObjId) -> serde_json::Value { | ||
| let mut obj = serde_json::Map::new(); | ||
| for key in self.doc.keys(map_id) { | ||
| if let Some(value) = self.get_json_value(map_id, &key) { | ||
| obj.insert(key, value); | ||
| } | ||
| } |
Cell metadata is now stored as native Automerge Maps instead of JSON-encoded strings. This enables per-field CRDT merging: - Two clients setting source_hidden and outputs_hidden concurrently both land (separate Map keys, no conflict) - Tags stored as native Automerge Lists (concurrent tag adds merge) - Extension metadata gets the same native treatment automatically All read paths handle both native Maps (new) and legacy JSON strings (old) for backward compatibility. Also migrates update_cell_metadata_at to navigate and create Automerge Maps directly instead of read-modify-write on a JSON blob. 155 notebook-doc tests pass.
…accessors Replace all raw NOTEBOOK_METADATA_KEY reads with typed accessors: - check_and_broadcast_sync_state: doc.get_metadata_snapshot() - resolve_metadata_snapshot: doc.get_metadata_snapshot() - detect_room_runtime: doc.get_metadata_snapshot() - save_notebook_to_disk: doc.get_metadata_snapshot() - clone save: doc.get_metadata_snapshot() - handshake seeding: doc.set_metadata_snapshot() NOTEBOOK_METADATA_KEY import removed from this file.
…shots The sync task now compares NotebookMetadataSnapshot structs instead of raw JSON strings for change detection. This correctly detects changes to native Automerge keys (runt, kernelspec, language_info) that wouldn't be visible in the legacy 'notebook_metadata' string. Remaining NOTEBOOK_METADATA_KEY usage in sync client: - get_metadata fast-path (still valid for backward compat) - initial_metadata in into_split_with_pipe (handshake compat)
- session_core: document that set_notebook_metadata uses legacy key (dual-write happens at doc layer via set_metadata_snapshot) - sync client: metadata change tracking uses typed snapshots - daemon: all NOTEBOOK_METADATA_KEY reads replaced with typed accessors - notebook-doc: cell metadata stored as native Automerge Maps - 155 notebook-doc tests pass
There was a problem hiding this comment.
Pull request overview
Migrates notebook-level and cell-level metadata storage from JSON string blobs to native Automerge maps/lists/scalars to enable CRDT-friendly concurrent merging, while keeping backward compatibility via legacy "notebook_metadata" dual-write/fallback behavior.
Changes:
- Added typed
get_metadata_snapshot/set_metadata_snapshotthat read native keys first and dual-write legacy"notebook_metadata"JSON string. - Switched cell
metadatastorage from JSON string to native AutomergeMap, and rewired update paths to mutate Automerge structures directly. - Updated daemon + sync client paths to use typed snapshot reads and typed-diffing for update detection.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| crates/runtimed/src/notebook_sync_server.rs | Uses typed metadata snapshot reads; seeds metadata via set_metadata_snapshot; saves/clones via typed snapshot merge. |
| crates/runtimed/src/notebook_sync_client.rs | Tracks metadata changes via typed NotebookMetadataSnapshot comparison and serializes to legacy JSON string when emitting updates. |
| crates/runtimed-py/src/session_core.rs | Clarifies that Python session API still writes legacy "notebook_metadata" via handle during migration. |
| crates/notebook-doc/src/lib.rs | Implements native JSON↔Automerge conversion helpers; adds typed metadata snapshot read/write; stores cell metadata as native Automerge maps; updates cell metadata mutation APIs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| current_metadata | ||
| .as_ref() | ||
| .and_then(|m| serde_json::to_string(m).ok()) |
| match value { | ||
| serde_json::Value::Null => { | ||
| // Insert empty string as placeholder for null in lists | ||
| self.doc.insert(list_id, index, "")?; | ||
| } |
| serde_json::Value::Number(n) => { | ||
| if let Some(i) = n.as_i64() { | ||
| self.doc.put(parent, key, i)?; | ||
| } else if let Some(f) = n.as_f64() { | ||
| self.doc.put(parent, key, f)?; | ||
| } | ||
| } |
crates/notebook-doc/src/lib.rs
Outdated
| /// - `Value::Bool` → bool scalar (stored as string "true"/"false" for now) | ||
| /// - `Value::Number` → string representation (Automerge scalars) | ||
| /// - `Value::Null` → deletes the key | ||
| /// |
| // Write as native Automerge Map | ||
| let meta_id = self.doc.put_object(&cell_obj, "metadata", ObjType::Map)?; | ||
| if let serde_json::Value::Object(map) = metadata { | ||
| for (k, v) in map { | ||
| self.put_json_value(&meta_id, k, v)?; | ||
| } | ||
| } |
| match value { | ||
| serde_json::Value::Null => { | ||
| self.doc.delete(parent, key)?; | ||
| } |
| self.doc.put_object(&cell_map, "source", ObjType::Text)?; | ||
| self.doc.put(&cell_map, "execution_count", "null")?; | ||
| self.doc.put_object(&cell_map, "outputs", ObjType::List)?; | ||
| self.doc.put(&cell_map, "metadata", "{}")?; | ||
| self.doc.put_object(&cell_map, "metadata", ObjType::Map)?; | ||
| self.doc | ||
| .put_object(&cell_map, "resolved_assets", ObjType::Map)?; | ||
|
|
| self.set_metadata_value("kernelspec", &value)?; | ||
| } | ||
| if let Some(ref li) = snapshot.language_info { | ||
| let value = serde_json::to_value(li).map_err(|e| { | ||
| AutomergeError::InvalidObjId(format!("serialize language_info: {}", e)) | ||
| })?; | ||
| self.set_metadata_value("language_info", &value)?; |
crates/notebook-doc/src/lib.rs
Outdated
| match self.get_json_value(&cell_obj, "metadata") { | ||
| Some(serde_json::Value::Object(_)) => self.get_json_value(&cell_obj, "metadata"), |
| if let Ok(snapshot) = serde_json::from_str::<NotebookMetadataSnapshot>(metadata_json) { | ||
| match doc.set_metadata_snapshot(&snapshot) { | ||
| Ok(()) => { | ||
| info!( | ||
| "[notebook-sync] Seeded initial metadata from handshake for {}", | ||
| notebook_id | ||
| ); | ||
| } | ||
| Err(e) => { | ||
| warn!("[notebook-sync] Failed to seed initial metadata: {}", e); | ||
| } | ||
| } |
The free-function variant used by the sync client was still reading cell metadata as a JSON string. Now reads native Map first with legacy string fallback, matching NotebookDoc::read_cell behavior. This fixes the cell metadata test failures in CI where set_cell_metadata writes a native Map but the sync client couldn't read it back.
- Null handling: use automerge::ScalarValue::Null instead of delete/empty string - u64 numbers: handle as_u64() explicitly before falling back to f64 - Fix docstring accuracy for put_json_value (bools/numbers are native scalars) - Fix double read in get_cell_metadata (capture on first match) - set_metadata_snapshot: delete stale kernelspec/language_info when None - set_cell_metadata: validate input is a JSON object - Handshake: log warning on metadata parse failure, fall back to legacy key - Sync task: log serialization failures instead of silently dropping metadata - Add 7 round-trip tests for JSON ↔ Automerge conversion (162 tests total)
Migrates notebook metadata and cell metadata from JSON string blobs to native Automerge types. All JSON values become native Automerge objects — Maps, Lists, and scalars — enabling proper CRDT merging.
What changed
notebook-doc (foundation):
put_json_value/get_json_value— generic JSON ↔ Automerge conversionset_metadata_snapshot— dual-writes native keys (kernelspec,language_info,runt) + legacy"notebook_metadata"stringget_metadata_snapshot— reads native keys first, falls back to legacy stringupdate_cell_metadata_atnavigates/creates Automerge Maps directly (no read-modify-write of JSON blob)daemon (notebook_sync_server):
NOTEBOOK_METADATA_KEYreads replaced with typeddoc.get_metadata_snapshot()set_metadata_snapshot(dual-write)sync client:
NotebookMetadataSnapshotcomparison instead of raw string diffWhat this enables
Two clients adding different UV dependencies concurrently: both land (Automerge list merge). Two clients setting
source_hiddenon different cells: both land (separate Map keys). Extension metadata from unknown tools gets the same CRDT treatment automatically.Remaining work
notebook/src/lib.rs(Tauri commands) — still uses legacy key viaGetRawMetadata/SetRawMetadataprotocolapps/notebook/src/lib/notebook-metadata.ts(frontend) — still writes legacy key stringSee #765 for the full plan.
PR submitted by @rgbkrk's agent Quill, via Zed