Skip to content

Add graph-storage crate#4198

Draft
TrueDoctor wants to merge 2 commits into
masterfrom
upstream-graph-storage
Draft

Add graph-storage crate#4198
TrueDoctor wants to merge 2 commits into
masterfrom
upstream-graph-storage

Conversation

@TrueDoctor
Copy link
Copy Markdown
Member

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the graph-storage crate, which provides a delta-based graph representation for the Graphite file format, including CRDT/delta computation, conversion utilities, and round-trip tests. It also cleans up legacy migrations in graph-craft and updates MemoHash to compare values instead of hashes. The review feedback highlights a potential DoS/OOM vulnerability when resizing the exports vector with unvalidated slot indices, suggests optimizing the O(N^2) pairwise comparison in order_consistent to O(N log N) by sorting, and recommends using into_iter().next() instead of drain(..).next() for idiomatic element extraction.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread document/graph-storage/src/lib.rs Outdated
Comment thread document/graph-storage/src/lib.rs Outdated
Comment thread node-graph/graph-craft/src/document/value.rs Outdated
@TrueDoctor
Copy link
Copy Markdown
Member Author

@cubic-ai-dev

@cubic-dev-ai
Copy link
Copy Markdown
Contributor

cubic-dev-ai Bot commented Jun 7, 2026

@cubic-ai-dev

@TrueDoctor I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9 issues found across 26 files

Confidence score: 2/5

  • There are multiple high-confidence, user-impacting regressions (sev 7–8) across hashing, graph diffing, and serialization, so merge risk is currently high rather than routine.
  • Most severe: node-graph/libraries/core-types/src/memo.rs makes MemoHash equality ignore hash while Hash still includes it, which can violate hash collection invariants and cause incorrect map/set behavior.
  • Several storage/runtime paths risk dropped or corrupted data: missed network/export diffs in document/graph-storage/src/delta.rs, omitted export-only resources in document/graph-storage/src/from_runtime.rs, and silent truncation in document/graph-storage/src/to_runtime.rs when inputs lengths differ.
  • Pay close attention to node-graph/libraries/core-types/src/memo.rs, document/graph-storage/src/delta.rs, document/graph-storage/src/from_runtime.rs, document/graph-storage/src/to_runtime.rs, and node-graph/libraries/resources/src/lib.rs - they contain the highest-likelihood correctness and round-trip integrity risks.

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread node-graph/libraries/resources/src/lib.rs
Comment thread node-graph/libraries/vector-types/src/gradient.rs Outdated
Comment thread node-graph/libraries/core-types/src/memo.rs
Comment thread document/graph-storage/src/from_runtime.rs
Comment thread document/graph-storage/src/to_runtime.rs
Comment thread document/graph-storage/src/delta.rs
Comment thread document/graph-storage/src/delta.rs Outdated
Comment thread node-graph/graph-craft/src/document/value.rs Outdated
Comment thread document/graph-storage/src/from_runtime.rs Outdated
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

11 issues found across 19 files

Confidence score: 2/5

  • Multiple high-severity, high-confidence data-integrity risks make this PR risky to merge as-is, especially in core graph storage/CRDT paths (document/graph-storage/src/attributes.rs, document/graph-storage/src/to_runtime.rs, document/graph-storage/src/crdt.rs, document/graph-storage/src/ids.rs).
  • The most severe issue is inconsistent Priority equality/ordering/hashing in document/graph-storage/src/attributes.rs, which can break HashMap/BTree* behavior and lead to hard-to-debug corruption-like behavior.
  • There are additional concrete regression risks affecting user-visible state correctness: duplicate node IDs can silently collapse nodes during runtime reconstruction, deletes can be resurrected without tombstones, and non-canonical Rev hashing can make identical deltas produce different IDs.
  • Pay close attention to document/graph-storage/src/attributes.rs, document/graph-storage/src/to_runtime.rs, document/graph-storage/src/crdt.rs, document/graph-storage/src/ids.rs, document/graph-storage/src/session.rs, document/graph-storage/src/delta.rs, document/graph-storage/src/model.rs - core invariants and history/diff behavior can diverge from expected graph state.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="document/graph-storage/src/session.rs">

<violation number="1" location="document/graph-storage/src/session.rs:162">
P2: Redo history is cleared even when no commit is produced. A no-op retire can silently disable redo after undo.</violation>
</file>

<file name="document/graph-storage/src/attributes.rs">

<violation number="1" location="document/graph-storage/src/attributes.rs:126">
P1: `Priority` has inconsistent `PartialEq`/`Eq`/`Ord`/`Hash` semantics. This can corrupt behavior in `HashMap`/`BTree*` when equal values hash or order differently.</violation>
</file>

<file name="document/graph-storage/src/to_runtime.rs">

<violation number="1" location="document/graph-storage/src/to_runtime.rs:118">
P1: Duplicate `ORIGINAL_NODE_ID` values in one network are not detected, causing silent node loss when collecting into `FxHashMap`. This can corrupt reconstructed graphs by collapsing distinct storage nodes onto one runtime node id.</violation>
</file>

<file name="document/graph-storage/src/delta.rs">

<violation number="1" location="document/graph-storage/src/delta.rs:88">
P2: Export-list truncation is encoded as `SetExport(None)` but that op never shrinks `net.exports`, so diff-apply cannot fully reach target network shape when `to.exports.len() < from.exports.len()`. This leaves persistent trailing slots and causes value-level drift.</violation>
</file>

<file name="document/graph-storage/src/model.rs">

<violation number="1" location="document/graph-storage/src/model.rs:9">
P2: Parallel `inputs`/`inputs_attributes` vectors make length mismatches representable. That invalid state is already handled as an error during runtime conversion, so enforcing the invariant in the model would prevent late failures.</violation>

<violation number="2" location="document/graph-storage/src/model.rs:75">
P2: `Reflection` stores required metadata outside the enum, so required state is not atomic. A `ChangeNodeInput` to `Reflection` can leave the node in a conversion-failing state until separate attribute writes happen.</violation>
</file>

<file name="document/graph-storage/src/crdt.rs">

<violation number="1" location="document/graph-storage/src/crdt.rs:186">
P1: Attribute deletes are not tombstoned, so older adds can resurrect deleted keys. This breaks LWW behavior when ops arrive out of order.</violation>
</file>

<file name="document/graph-storage/src/ids.rs">

<violation number="1" location="document/graph-storage/src/ids.rs:86">
P1: `compute_rev` hashes non-canonical serialization of `RegistryDelta` that includes `HashMap` fields. Identical deltas can receive different `Rev` IDs, breaking stable content-addressed identity assumptions.</violation>
</file>

<file name="document/graph-storage/Cargo.toml">

<violation number="1" location="document/graph-storage/Cargo.toml:15">
P3: Use `{ workspace = true }` instead of an inline version — blake3 is already managed in `[workspace.dependencies]` at line 178 of the root `Cargo.toml`. An inline version bypasses centralised version management.</violation>

<violation number="2" location="document/graph-storage/Cargo.toml:17">
P3: Use `{ workspace = true }` instead of an inline version — rustc-hash is already managed in `[workspace.dependencies]` at line 102 of the root `Cargo.toml`. An inline version bypasses centralised version management.</violation>

<violation number="3" location="document/graph-storage/Cargo.toml:18">
P3: Use `{ workspace = true }` instead of an inline version — thiserror is already managed in `[workspace.dependencies]` at line 120 of the root `Cargo.toml`. An inline version bypasses centralised version management.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

/// a value strictly between two neighbors, so concurrent insertions elsewhere never collide; an
/// exact tie between two peers inserting at the same gap is broken by `PeerId` in [`SourceKey`].
/// `f64` precision is ample for the short fallback chains resources carry in practice.
#[derive(Copy, Clone, Debug, PartialEq, Serialize, Deserialize)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Priority has inconsistent PartialEq/Eq/Ord/Hash semantics. This can corrupt behavior in HashMap/BTree* when equal values hash or order differently.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/attributes.rs, line 126:

<comment>`Priority` has inconsistent `PartialEq`/`Eq`/`Ord`/`Hash` semantics. This can corrupt behavior in `HashMap`/`BTree*` when equal values hash or order differently.</comment>

<file context>
@@ -0,0 +1,150 @@
+/// a value strictly between two neighbors, so concurrent insertions elsewhere never collide; an
+/// exact tie between two peers inserting at the same gap is broken by `PeerId` in [`SourceKey`].
+/// `f64` precision is ample for the short fallback chains resources carry in practice.
+#[derive(Copy, Clone, Debug, PartialEq, Serialize, Deserialize)]
+pub struct Priority(pub f64);
+
</file context>

collector.push(entry);
}

convert_node(registry, declarations, node, metadata_path, runtime_id, node_collector, network_collector).map(|doc_node| (runtime_id, doc_node))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Duplicate ORIGINAL_NODE_ID values in one network are not detected, causing silent node loss when collecting into FxHashMap. This can corrupt reconstructed graphs by collapsing distinct storage nodes onto one runtime node id.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/to_runtime.rs, line 118:

<comment>Duplicate `ORIGINAL_NODE_ID` values in one network are not detected, causing silent node loss when collecting into `FxHashMap`. This can corrupt reconstructed graphs by collapsing distinct storage nodes onto one runtime node id.</comment>

<file context>
@@ -0,0 +1,270 @@
+				collector.push(entry);
+			}
+
+			convert_node(registry, declarations, node, metadata_path, runtime_id, node_collector, network_collector).map(|doc_node| (runtime_id, doc_node))
+		})
+		.collect::<Result<FxHashMap<_, _>, _>>()?;
</file context>

Some(value) => match attributes.entry(key) {
std::collections::hash_map::Entry::Occupied(mut entry) => {
if force || timestamp > entry.get().timestamp {
entry.insert(Value { value, timestamp });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Attribute deletes are not tombstoned, so older adds can resurrect deleted keys. This breaks LWW behavior when ops arrive out of order.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/crdt.rs, line 186:

<comment>Attribute deletes are not tombstoned, so older adds can resurrect deleted keys. This breaks LWW behavior when ops arrive out of order.</comment>

<file context>
@@ -0,0 +1,200 @@
+		Some(value) => match attributes.entry(key) {
+			std::collections::hash_map::Entry::Occupied(mut entry) => {
+				if force || timestamp > entry.get().timestamp {
+					entry.insert(Value { value, timestamp });
+				}
+			}
</file context>

/// Hash the identity-bearing fields of a `Delta` with blake3 and truncate to 128 bits.
pub(crate) fn compute_rev(parents: &[Rev], author: PeerId, timestamp: TimeStamp, delta_type: &RegistryDelta) -> Rev {
let mut hasher = blake3::Hasher::new();
let bytes = rmp_serde::to_vec(&(parents, author, timestamp, delta_type)).expect("Delta identity fields must serialize");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: compute_rev hashes non-canonical serialization of RegistryDelta that includes HashMap fields. Identical deltas can receive different Rev IDs, breaking stable content-addressed identity assumptions.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/ids.rs, line 86:

<comment>`compute_rev` hashes non-canonical serialization of `RegistryDelta` that includes `HashMap` fields. Identical deltas can receive different `Rev` IDs, breaking stable content-addressed identity assumptions.</comment>

<file context>
@@ -0,0 +1,92 @@
+/// Hash the identity-bearing fields of a `Delta` with blake3 and truncate to 128 bits.
+pub(crate) fn compute_rev(parents: &[Rev], author: PeerId, timestamp: TimeStamp, delta_type: &RegistryDelta) -> Rev {
+	let mut hasher = blake3::Hasher::new();
+	let bytes = rmp_serde::to_vec(&(parents, author, timestamp, delta_type)).expect("Delta identity fields must serialize");
+	hasher.update(&bytes);
+	let digest = hasher.finalize();
</file context>


// A new edit abandons any undone-forward branch: those revs stay in the DAG but are no longer
// reachable via redo. (Mirrors the legacy editor clearing its redo history on commit.)
self.document.redo_stack.clear();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Redo history is cleared even when no commit is produced. A no-op retire can silently disable redo after undo.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/session.rs, line 162:

<comment>Redo history is cleared even when no commit is produced. A no-op retire can silently disable redo after undo.</comment>

<file context>
@@ -0,0 +1,512 @@
+
+		// A new edit abandons any undone-forward branch: those revs stay in the DAG but are no longer
+		// reachable via redo. (Mirrors the legacy editor clearing its redo history on commit.)
+		self.document.redo_stack.clear();
+		for op in ops {
+			let reverse = self.document.compute_reverse_delta(target, &op)?;
</file context>

pub struct Node {
pub(crate) implementation: Implementation,
pub(crate) inputs: Vec<InputSlot>,
pub(crate) inputs_attributes: Vec<Attributes>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Parallel inputs/inputs_attributes vectors make length mismatches representable. That invalid state is already handled as an error during runtime conversion, so enforcing the invariant in the model would prevent late failures.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/model.rs, line 9:

<comment>Parallel `inputs`/`inputs_attributes` vectors make length mismatches representable. That invalid state is already handled as an error during runtime conversion, so enforcing the invariant in the model would prevent late failures.</comment>

<file context>
@@ -0,0 +1,124 @@
+pub struct Node {
+	pub(crate) implementation: Implementation,
+	pub(crate) inputs: Vec<InputSlot>,
+	pub(crate) inputs_attributes: Vec<Attributes>,
+	pub(crate) attributes: Attributes,
+	pub(crate) network: NetworkId,
</file context>

import_idx: usize,
},
/// Marker; the `DocumentNodeMetadata` lives in `inputs_attributes`.
Reflection,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Reflection stores required metadata outside the enum, so required state is not atomic. A ChangeNodeInput to Reflection can leave the node in a conversion-failing state until separate attribute writes happen.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/src/model.rs, line 75:

<comment>`Reflection` stores required metadata outside the enum, so required state is not atomic. A `ChangeNodeInput` to `Reflection` can leave the node in a conversion-failing state until separate attribute writes happen.</comment>

<file context>
@@ -0,0 +1,124 @@
+		import_idx: usize,
+	},
+	/// Marker; the `DocumentNodeMetadata` lives in `inputs_attributes`.
+	Reflection,
+}
+
</file context>

graph-craft = { workspace = true }
graphene-resource = { workspace = true }
core-types = { workspace = true }
blake3 = "1.5"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Use { workspace = true } instead of an inline version — blake3 is already managed in [workspace.dependencies] at line 178 of the root Cargo.toml. An inline version bypasses centralised version management.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/Cargo.toml, line 15:

<comment>Use `{ workspace = true }` instead of an inline version — blake3 is already managed in `[workspace.dependencies]` at line 178 of the root `Cargo.toml`. An inline version bypasses centralised version management.</comment>

<file context>
@@ -0,0 +1,21 @@
+graph-craft = { workspace = true }
+graphene-resource = { workspace = true }
+core-types = { workspace = true }
+blake3 = "1.5"
+rmp-serde = "1.3"
+rustc-hash = "2.0"
</file context>
Suggested change
blake3 = "1.5"
blake3 = { workspace = true }

core-types = { workspace = true }
blake3 = "1.5"
rmp-serde = "1.3"
rustc-hash = "2.0"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Use { workspace = true } instead of an inline version — rustc-hash is already managed in [workspace.dependencies] at line 102 of the root Cargo.toml. An inline version bypasses centralised version management.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/Cargo.toml, line 17:

<comment>Use `{ workspace = true }` instead of an inline version — rustc-hash is already managed in `[workspace.dependencies]` at line 102 of the root `Cargo.toml`. An inline version bypasses centralised version management.</comment>

<file context>
@@ -0,0 +1,21 @@
+core-types = { workspace = true }
+blake3 = "1.5"
+rmp-serde = "1.3"
+rustc-hash = "2.0"
+thiserror = "2.0"
+
</file context>
Suggested change
rustc-hash = "2.0"
rustc-hash = { workspace = true }

blake3 = "1.5"
rmp-serde = "1.3"
rustc-hash = "2.0"
thiserror = "2.0"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: Use { workspace = true } instead of an inline version — thiserror is already managed in [workspace.dependencies] at line 120 of the root Cargo.toml. An inline version bypasses centralised version management.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At document/graph-storage/Cargo.toml, line 18:

<comment>Use `{ workspace = true }` instead of an inline version — thiserror is already managed in `[workspace.dependencies]` at line 120 of the root `Cargo.toml`. An inline version bypasses centralised version management.</comment>

<file context>
@@ -0,0 +1,21 @@
+blake3 = "1.5"
+rmp-serde = "1.3"
+rustc-hash = "2.0"
+thiserror = "2.0"
+
+[dev-dependencies]
</file context>
Suggested change
thiserror = "2.0"
thiserror = { workspace = true }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant