Skip to content

feat(notebook-doc): native Automerge metadata (no more JSON string blobs)#785

Closed
rgbkrk wants to merge 10 commits intomainfrom
cleanup/metadata-api
Closed

feat(notebook-doc): native Automerge metadata (no more JSON string blobs)#785
rgbkrk wants to merge 10 commits intomainfrom
cleanup/metadata-api

Conversation

@rgbkrk
Copy link
Member

@rgbkrk rgbkrk commented Mar 13, 2026

Migrates notebook metadata and cell metadata from JSON string blobs to native Automerge types. All JSON values become native Automerge objects — Maps, Lists, and scalars — enabling proper CRDT merging.

What changed

notebook-doc (foundation):

  • put_json_value / get_json_value — generic JSON ↔ Automerge conversion
  • set_metadata_snapshot — dual-writes native keys (kernelspec, language_info, runt) + legacy "notebook_metadata" string
  • get_metadata_snapshot — reads native keys first, falls back to legacy string
  • Cell metadata stored as native Automerge Maps (was JSON string)
  • update_cell_metadata_at navigates/creates Automerge Maps directly (no read-modify-write of JSON blob)
  • 155 notebook-doc tests pass

daemon (notebook_sync_server):

  • All NOTEBOOK_METADATA_KEY reads replaced with typed doc.get_metadata_snapshot()
  • Save/load uses typed snapshot directly
  • Handshake seeding writes via set_metadata_snapshot (dual-write)

sync client:

  • Metadata change tracking uses typed NotebookMetadataSnapshot comparison instead of raw string diff

What this enables

Two clients adding different UV dependencies concurrently: both land (Automerge list merge). Two clients setting source_hidden on different cells: both land (separate Map keys). Extension metadata from unknown tools gets the same CRDT treatment automatically.

Remaining work

  • notebook/src/lib.rs (Tauri commands) — still uses legacy key via GetRawMetadata/SetRawMetadata protocol
  • apps/notebook/src/lib/notebook-metadata.ts (frontend) — still writes legacy key string
  • These can be migrated in follow-up PRs since the dual-write ensures backward compat

See #765 for the full plan.

PR submitted by @rgbkrk's agent Quill, via Zed

Adds put_json_value/get_json_value for converting between
serde_json::Value and native Automerge types:

- JSON objects → Automerge Maps
- JSON arrays → Automerge Lists
- Scalars → native Automerge scalars (string, bool, i64, f64)
- null → delete key (maps) or empty string placeholder (lists)

Also adds set_metadata_value/get_metadata_value convenience methods
for notebook-level metadata using native types.

Foundation for #765 — migrating metadata from JSON string blobs
to native Automerge types for proper CRDT merging.
set_metadata_snapshot now writes kernelspec, language_info, and runt
as separate native Automerge maps/lists instead of one JSON blob.
get_metadata_snapshot reads native keys first, falls back to legacy
"notebook_metadata" string for backward compat.

Dual-write: both native keys AND legacy string are written, so old
daemons can still read new docs. The legacy key will be removed once
all consumers are updated.

This enables per-field CRDT merging — two clients adding different
UV dependencies concurrently will both land (Automerge list merge)
instead of one clobbering the other (LWW on JSON string).
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces foundational primitives in notebook-doc to convert between serde_json::Value and native Automerge types (Maps/Lists/scalars), enabling CRDT-friendly notebook metadata storage as a prerequisite to migrating away from JSON-string “blob” metadata.

Changes:

  • Add recursive put_json_value / get_json_value helpers to write/read JSON structures as native Automerge objects.
  • Add set_metadata_value / get_metadata_value to store top-level notebook metadata keys as native Automerge subtrees.
  • Add scalar_to_json helper for reconstructing JSON values from Automerge scalars.
Comments suppressed due to low confidence (2)

crates/notebook-doc/src/lib.rs:1347

  • The docstring is out of sync with the implementation: booleans are being stored as native Automerge booleans (not string "true"/"false"), and numbers are stored as i64/f64 (not string representations). Please update the documentation to match the actual on-disk representation to avoid confusing callers and future migrations.
            if !resolved_assets.contains_key(key) {
                self.doc.delete(&resolved_assets_id, key)?;
            }
        }

crates/notebook-doc/src/lib.rs:1369

  • Value::Number handling currently only writes i64 or f64. This can lose precision for large unsigned integers (e.g. u64 > i64::MAX becomes f64) or even silently write nothing if neither branch matches, leaving the previous value intact. Consider handling as_u64() explicitly (and/or returning an error if the number cannot be represented) so numeric metadata round-trips correctly.
        read_str(&self.doc, meta_id, key)
    }

    /// Set a metadata value.
    pub fn set_metadata(&mut self, key: &str, value: &str) -> Result<(), AutomergeError> {
        let meta_id = match self.metadata_map_id() {
            Some(id) => id,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +1397 to +1400
///
/// If the key already exists, it is replaced (the old object is overwritten).
pub fn put_json_value(
&mut self,
Comment on lines +1404 to +1410
) -> Result<(), AutomergeError> {
match value {
serde_json::Value::Null => {
self.doc.delete(parent, key)?;
}
serde_json::Value::Bool(b) => {
self.doc.put(parent, key, *b)?;
Comment on lines 1340 to +1502
.put_object(&cell_obj, "resolved_assets", ObjType::Map)?,
};

for key in existing.keys() {
if !resolved_assets.contains_key(key) {
self.doc.delete(&resolved_assets_id, key)?;
}
}

for (asset_ref, blob_hash) in resolved_assets {
if existing.get(asset_ref) != Some(blob_hash) {
self.doc.put(&resolved_assets_id, asset_ref, blob_hash)?;
}
}

Ok(true)
}

// ── Notebook metadata ──────────────────────────────────────────

/// Read a metadata value.
pub fn get_metadata(&self, key: &str) -> Option<String> {
let meta_id = self.metadata_map_id()?;
read_str(&self.doc, meta_id, key)
}

/// Set a metadata value.
pub fn set_metadata(&mut self, key: &str, value: &str) -> Result<(), AutomergeError> {
let meta_id = match self.metadata_map_id() {
Some(id) => id,
None => {
// Create metadata map if missing
let id = self
.doc
.put_object(automerge::ROOT, "metadata", ObjType::Map)?;
self.doc.put(&id, key, value)?;
return Ok(());
}
};
self.doc.put(&meta_id, key, value)?;
Ok(())
}

// ── Native JSON ↔ Automerge ─────────────────────────────────────
//
// These primitives convert between serde_json::Value and native
// Automerge objects. JSON objects become Maps, arrays become Lists,
// and scalars become scalars. No JSON-encoded string blobs.

/// Write a `serde_json::Value` into the Automerge doc as native types.
///
/// - `Value::Object` → `ObjType::Map` with recursive children
/// - `Value::Array` → `ObjType::List` with recursive elements
/// - `Value::String` → string scalar
/// - `Value::Bool` → bool scalar (stored as string "true"/"false" for now)
/// - `Value::Number` → string representation (Automerge scalars)
/// - `Value::Null` → deletes the key
///
/// If the key already exists, it is replaced (the old object is overwritten).
pub fn put_json_value(
&mut self,
parent: &ObjId,
key: &str,
value: &serde_json::Value,
) -> Result<(), AutomergeError> {
match value {
serde_json::Value::Null => {
self.doc.delete(parent, key)?;
}
serde_json::Value::Bool(b) => {
self.doc.put(parent, key, *b)?;
}
serde_json::Value::Number(n) => {
if let Some(i) = n.as_i64() {
self.doc.put(parent, key, i)?;
} else if let Some(f) = n.as_f64() {
self.doc.put(parent, key, f)?;
}
}
serde_json::Value::String(s) => {
self.doc.put(parent, key, s.as_str())?;
}
serde_json::Value::Array(arr) => {
let list_id = self.doc.put_object(parent, key, ObjType::List)?;
for (i, elem) in arr.iter().enumerate() {
self.insert_json_value_at(&list_id, i, elem)?;
}
}
serde_json::Value::Object(map) => {
let map_id = self.doc.put_object(parent, key, ObjType::Map)?;
for (k, v) in map {
self.put_json_value(&map_id, k, v)?;
}
}
}
Ok(())
}

/// Insert a `serde_json::Value` into an Automerge list at the given index.
fn insert_json_value_at(
&mut self,
list_id: &ObjId,
index: usize,
value: &serde_json::Value,
) -> Result<(), AutomergeError> {
match value {
serde_json::Value::Null => {
// Insert empty string as placeholder for null in lists
self.doc.insert(list_id, index, "")?;
}
serde_json::Value::Bool(b) => {
self.doc.insert(list_id, index, *b)?;
}
serde_json::Value::Number(n) => {
if let Some(i) = n.as_i64() {
self.doc.insert(list_id, index, i)?;
} else if let Some(f) = n.as_f64() {
self.doc.insert(list_id, index, f)?;
}
}
serde_json::Value::String(s) => {
self.doc.insert(list_id, index, s.as_str())?;
}
serde_json::Value::Array(arr) => {
let nested_list = self.doc.insert_object(list_id, index, ObjType::List)?;
for (i, elem) in arr.iter().enumerate() {
self.insert_json_value_at(&nested_list, i, elem)?;
}
}
serde_json::Value::Object(map) => {
let nested_map = self.doc.insert_object(list_id, index, ObjType::Map)?;
for (k, v) in map {
self.put_json_value(&nested_map, k, v)?;
}
}
}
Ok(())
}

/// Read an Automerge subtree as a `serde_json::Value`.
///
/// Maps become `Value::Object`, Lists become `Value::Array`,
/// and scalars become the appropriate JSON type.
///
/// Returns `None` if the key doesn't exist.
pub fn get_json_value(&self, parent: &ObjId, key: &str) -> Option<serde_json::Value> {
use automerge::Value;
match self.doc.get(parent, key).ok()?? {
(Value::Object(ObjType::Map), map_id) => Some(self.read_map_as_json(&map_id)),
(Value::Object(ObjType::List), list_id) => Some(self.read_list_as_json(&list_id)),
(Value::Scalar(s), _) => Some(scalar_to_json(&s)),
_ => None,
}
}

/// Read an Automerge map as a JSON object.
fn read_map_as_json(&self, map_id: &ObjId) -> serde_json::Value {
let mut obj = serde_json::Map::new();
for key in self.doc.keys(map_id) {
if let Some(value) = self.get_json_value(map_id, &key) {
obj.insert(key, value);
}
}
rgbkrk added 5 commits March 13, 2026 16:04
Cell metadata is now stored as native Automerge Maps instead of
JSON-encoded strings. This enables per-field CRDT merging:

- Two clients setting source_hidden and outputs_hidden concurrently
  both land (separate Map keys, no conflict)
- Tags stored as native Automerge Lists (concurrent tag adds merge)
- Extension metadata gets the same native treatment automatically

All read paths handle both native Maps (new) and legacy JSON strings
(old) for backward compatibility.

Also migrates update_cell_metadata_at to navigate and create
Automerge Maps directly instead of read-modify-write on a JSON blob.

155 notebook-doc tests pass.
…accessors

Replace all raw NOTEBOOK_METADATA_KEY reads with typed accessors:
- check_and_broadcast_sync_state: doc.get_metadata_snapshot()
- resolve_metadata_snapshot: doc.get_metadata_snapshot()
- detect_room_runtime: doc.get_metadata_snapshot()
- save_notebook_to_disk: doc.get_metadata_snapshot()
- clone save: doc.get_metadata_snapshot()
- handshake seeding: doc.set_metadata_snapshot()

NOTEBOOK_METADATA_KEY import removed from this file.
…shots

The sync task now compares NotebookMetadataSnapshot structs instead of
raw JSON strings for change detection. This correctly detects changes
to native Automerge keys (runt, kernelspec, language_info) that wouldn't
be visible in the legacy 'notebook_metadata' string.

Remaining NOTEBOOK_METADATA_KEY usage in sync client:
- get_metadata fast-path (still valid for backward compat)
- initial_metadata in into_split_with_pipe (handshake compat)
- session_core: document that set_notebook_metadata uses legacy key
  (dual-write happens at doc layer via set_metadata_snapshot)
- sync client: metadata change tracking uses typed snapshots
- daemon: all NOTEBOOK_METADATA_KEY reads replaced with typed accessors
- notebook-doc: cell metadata stored as native Automerge Maps
- 155 notebook-doc tests pass
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Migrates notebook-level and cell-level metadata storage from JSON string blobs to native Automerge maps/lists/scalars to enable CRDT-friendly concurrent merging, while keeping backward compatibility via legacy "notebook_metadata" dual-write/fallback behavior.

Changes:

  • Added typed get_metadata_snapshot / set_metadata_snapshot that read native keys first and dual-write legacy "notebook_metadata" JSON string.
  • Switched cell metadata storage from JSON string to native Automerge Map, and rewired update paths to mutate Automerge structures directly.
  • Updated daemon + sync client paths to use typed snapshot reads and typed-diffing for update detection.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

File Description
crates/runtimed/src/notebook_sync_server.rs Uses typed metadata snapshot reads; seeds metadata via set_metadata_snapshot; saves/clones via typed snapshot merge.
crates/runtimed/src/notebook_sync_client.rs Tracks metadata changes via typed NotebookMetadataSnapshot comparison and serializes to legacy JSON string when emitting updates.
crates/runtimed-py/src/session_core.rs Clarifies that Python session API still writes legacy "notebook_metadata" via handle during migration.
crates/notebook-doc/src/lib.rs Implements native JSON↔Automerge conversion helpers; adds typed metadata snapshot read/write; stores cell metadata as native Automerge maps; updates cell metadata mutation APIs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +2927 to +2929
current_metadata
.as_ref()
.and_then(|m| serde_json::to_string(m).ok())
Comment on lines +1463 to +1467
match value {
serde_json::Value::Null => {
// Insert empty string as placeholder for null in lists
self.doc.insert(list_id, index, "")?;
}
Comment on lines +1430 to +1436
serde_json::Value::Number(n) => {
if let Some(i) = n.as_i64() {
self.doc.put(parent, key, i)?;
} else if let Some(f) = n.as_f64() {
self.doc.put(parent, key, f)?;
}
}
Comment on lines +1412 to +1415
/// - `Value::Bool` → bool scalar (stored as string "true"/"false" for now)
/// - `Value::Number` → string representation (Automerge scalars)
/// - `Value::Null` → deletes the key
///
Comment on lines +1215 to +1221
// Write as native Automerge Map
let meta_id = self.doc.put_object(&cell_obj, "metadata", ObjType::Map)?;
if let serde_json::Value::Object(map) = metadata {
for (k, v) in map {
self.put_json_value(&meta_id, k, v)?;
}
}
Comment on lines +1423 to +1426
match value {
serde_json::Value::Null => {
self.doc.delete(parent, key)?;
}
Comment on lines 689 to 695
self.doc.put_object(&cell_map, "source", ObjType::Text)?;
self.doc.put(&cell_map, "execution_count", "null")?;
self.doc.put_object(&cell_map, "outputs", ObjType::List)?;
self.doc.put(&cell_map, "metadata", "{}")?;
self.doc.put_object(&cell_map, "metadata", ObjType::Map)?;
self.doc
.put_object(&cell_map, "resolved_assets", ObjType::Map)?;

Comment on lines +219 to +225
self.set_metadata_value("kernelspec", &value)?;
}
if let Some(ref li) = snapshot.language_info {
let value = serde_json::to_value(li).map_err(|e| {
AutomergeError::InvalidObjId(format!("serialize language_info: {}", e))
})?;
self.set_metadata_value("language_info", &value)?;
Comment on lines +1184 to +1185
match self.get_json_value(&cell_obj, "metadata") {
Some(serde_json::Value::Object(_)) => self.get_json_value(&cell_obj, "metadata"),
Comment on lines 771 to 782
if let Ok(snapshot) = serde_json::from_str::<NotebookMetadataSnapshot>(metadata_json) {
match doc.set_metadata_snapshot(&snapshot) {
Ok(()) => {
info!(
"[notebook-sync] Seeded initial metadata from handshake for {}",
notebook_id
);
}
Err(e) => {
warn!("[notebook-sync] Failed to seed initial metadata: {}", e);
}
}
rgbkrk added 3 commits March 13, 2026 16:32
The free-function variant used by the sync client was still reading
cell metadata as a JSON string. Now reads native Map first with
legacy string fallback, matching NotebookDoc::read_cell behavior.

This fixes the cell metadata test failures in CI where set_cell_metadata
writes a native Map but the sync client couldn't read it back.
- Null handling: use automerge::ScalarValue::Null instead of delete/empty string
- u64 numbers: handle as_u64() explicitly before falling back to f64
- Fix docstring accuracy for put_json_value (bools/numbers are native scalars)
- Fix double read in get_cell_metadata (capture on first match)
- set_metadata_snapshot: delete stale kernelspec/language_info when None
- set_cell_metadata: validate input is a JSON object
- Handshake: log warning on metadata parse failure, fall back to legacy key
- Sync task: log serialization failures instead of silently dropping metadata
- Add 7 round-trip tests for JSON ↔ Automerge conversion (162 tests total)
@rgbkrk rgbkrk closed this Mar 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants