fix: search both edge directions during deduplication #1303

Open
prasmussen15 wants to merge 3 commits into main from fix/bidirectional-edge-dedup

@prasmussen15
Collaborator

Summary

  • Edge deduplication now searches both forward (source→target) and inverse (target→source) directions when finding candidate duplicates via get_between_nodes
  • Previously, edges with the same endpoints but reversed direction were missed entirely, leading to duplicate edges in the graph
  • Bumps graphiti-core version to 0.29.0

Test plan

  • Verify existing edge dedup unit tests pass
  • Test with edges that have reversed source/target but represent the same relationship
  • Confirm no regression in edge resolution performance

🤖 Generated with Claude Code

The edge deduplication flow only searched for existing edges in the
forward direction (source→target). Edges with the same endpoints but
reversed direction were missed, leading to duplicate edges. Now searches
both directions and merges results before sending candidates to the LLM.

Bumps version to 0.29.0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +276 to +287
forward_edges_list: list[list[EntityEdge]] = await semaphore_gather(
    *[
        EntityEdge.get_between_nodes(driver, edge.source_node_uuid, edge.target_node_uuid)
        for edge in extracted_edges
    ]
)
inverse_edges_list: list[list[EntityEdge]] = await semaphore_gather(
    *[
        EntityEdge.get_between_nodes(driver, edge.target_node_uuid, edge.source_node_uuid)
        for edge in extracted_edges
    ]
)
Contributor

The two semaphore_gather calls are sequential, but forward_edges_list and inverse_edges_list don't depend on each other. Consider combining them into a single parallel call to reduce latency:

Suggested change
- forward_edges_list: list[list[EntityEdge]] = await semaphore_gather(
-     *[
-         EntityEdge.get_between_nodes(driver, edge.source_node_uuid, edge.target_node_uuid)
-         for edge in extracted_edges
-     ]
- )
- inverse_edges_list: list[list[EntityEdge]] = await semaphore_gather(
-     *[
-         EntityEdge.get_between_nodes(driver, edge.target_node_uuid, edge.source_node_uuid)
-         for edge in extracted_edges
-     ]
- )
+ all_edge_queries = [
+     EntityEdge.get_between_nodes(driver, edge.source_node_uuid, edge.target_node_uuid)
+     for edge in extracted_edges
+ ] + [
+     EntityEdge.get_between_nodes(driver, edge.target_node_uuid, edge.source_node_uuid)
+     for edge in extracted_edges
+ ]
+ all_results: list[list[EntityEdge]] = await semaphore_gather(*all_edge_queries)
+ n = len(extracted_edges)
+ forward_edges_list = all_results[:n]
+ inverse_edges_list = all_results[n:]
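
The slicing in this suggestion relies on the gather call returning results in the same order the queries were passed in. The following is a minimal, self-contained sketch of that pattern, not the project's code: `bounded_gather` is a hypothetical stand-in for `semaphore_gather` built on `asyncio.gather`, and `fake_get_between_nodes` is a made-up lookup that returns a label instead of real edges.

```python
import asyncio


async def bounded_gather(*coroutines, max_concurrent: int = 10):
    """Hypothetical stand-in for semaphore_gather: run coroutines under a concurrency cap."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run(coro):
        async with semaphore:
            return await coro

    # asyncio.gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*(run(c) for c in coroutines))


async def fake_get_between_nodes(source_uuid: str, target_uuid: str) -> list[str]:
    """Made-up lookup: yields control once, then returns a label for the queried direction."""
    await asyncio.sleep(0)
    return [f"{source_uuid}->{target_uuid}"]


async def lookup_both_directions(pairs):
    # Forward queries first, then inverse queries, all issued in one gather call.
    queries = [fake_get_between_nodes(s, t) for s, t in pairs] + [
        fake_get_between_nodes(t, s) for s, t in pairs
    ]
    results = await bounded_gather(*queries)
    n = len(pairs)
    # The first n results are the forward lookups, the last n the inverse lookups.
    return results[:n], results[n:]


forward, inverse = asyncio.run(lookup_both_directions([("a", "b"), ("c", "d")]))
```

Because the split depends only on position, the forward and inverse halves stay aligned with `extracted_edges` as long as both comprehensions iterate it in the same order.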

…el call

Addresses PR review feedback to reduce latency by running both direction
queries in a single semaphore_gather call instead of two sequential ones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +276 to +296
all_edge_queries = [
    EntityEdge.get_between_nodes(driver, edge.source_node_uuid, edge.target_node_uuid)
    for edge in extracted_edges
] + [
    EntityEdge.get_between_nodes(driver, edge.target_node_uuid, edge.source_node_uuid)
    for edge in extracted_edges
]
all_results: list[list[EntityEdge]] = await semaphore_gather(*all_edge_queries)
n = len(extracted_edges)
forward_edges_list = all_results[:n]
inverse_edges_list = all_results[n:]

valid_edges_list: list[list[EntityEdge]] = []
for forward_edges, inverse_edges in zip(forward_edges_list, inverse_edges_list, strict=True):
    seen_uuids: set[str] = set()
    combined: list[EntityEdge] = []
    for edge in [*forward_edges, *inverse_edges]:
        if edge.uuid not in seen_uuids:
            seen_uuids.add(edge.uuid)
            combined.append(edge)
    valid_edges_list.append(combined)
Contributor

Performance concern: This change doubles the number of database queries from n to 2n for edge deduplication. Consider modifying get_between_nodes to use a bidirectional match in the Cypher query instead:

MATCH (n:Entity {uuid: $uuid1})-[e:RELATES_TO]-(m:Entity {uuid: $uuid2})

Using - instead of -> would match edges in both directions with a single query, avoiding the performance regression.

If modifying get_between_nodes is not desired (to preserve its current directional semantics), consider adding a new get_between_nodes_bidirectional method instead.
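
To illustrate the difference between the directed and undirected patterns without a database, here is a small in-memory sketch. It only models the matching semantics described in this comment: `Edge`, `match_directed`, and `match_undirected` are hypothetical names for illustration and are not part of graphiti-core or any Cypher driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical directed edge with just the fields needed for matching."""
    uuid: str
    source: str
    target: str


def match_directed(edges, uuid1, uuid2):
    """Analogue of (n {uuid: uuid1})-[e]->(m {uuid: uuid2}): direction must agree."""
    return [e for e in edges if e.source == uuid1 and e.target == uuid2]


def match_undirected(edges, uuid1, uuid2):
    """Analogue of (n {uuid: uuid1})-[e]-(m {uuid: uuid2}): either direction matches."""
    return [e for e in edges if {e.source, e.target} == {uuid1, uuid2}]


edges = [
    Edge("e1", "a", "b"),
    Edge("e2", "b", "a"),  # same endpoints as e1, reversed direction
    Edge("e3", "a", "c"),  # unrelated pair, never matched below
]

directed = [e.uuid for e in match_directed(edges, "a", "b")]
undirected = [e.uuid for e in match_undirected(edges, "a", "b")]
```

In this sketch the directed lookup for (a, b) finds only `e1`, while the undirected lookup finds both `e1` and `e2` in a single pass, which is the behavior the single-query approach is meant to provide.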


valid_edges_list: list[list[EntityEdge]] = []
for forward_edges, inverse_edges in zip(forward_edges_list, inverse_edges_list, strict=True):
    seen_uuids: set[str] = set()
Contributor

The variable seen_uuids is easy to confuse with the seen dict used earlier in the function (line 256). The two names are distinct, so nothing is actually shadowed and this is not a bug, but consider a more descriptive name like combined_edge_uuids to avoid confusion during code review.

@claude
Contributor

claude bot commented Mar 6, 2026

Review Feedback

Missing test coverage: This PR adds important bidirectional edge lookup behavior but doesn't include unit tests to validate it. Consider adding tests that verify:

  1. Edges with reversed source/target are correctly found as duplicate candidates
  2. The deduplication logic correctly combines forward and inverse results without duplicates
  3. No regression in behavior when edges only exist in one direction

This would help ensure the fix works as intended and prevent future regressions.
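
The three checks listed above could be sketched as plain assertion-based tests. This is a minimal sketch, not the project's test suite: `FakeEdge` is a hypothetical stand-in for `EntityEdge`, and `merge_candidates` is an illustrative helper that copies the merge loop from the PR diff so it can be exercised without a database or LLM.

```python
from dataclasses import dataclass


@dataclass
class FakeEdge:
    """Hypothetical stand-in for EntityEdge with only the fields used here."""
    uuid: str
    source_node_uuid: str
    target_node_uuid: str


def merge_candidates(forward_edges, inverse_edges):
    """Combine both lookups, keeping the first occurrence of each uuid (mirrors the PR loop)."""
    seen_uuids = set()
    combined = []
    for edge in [*forward_edges, *inverse_edges]:
        if edge.uuid not in seen_uuids:
            seen_uuids.add(edge.uuid)
            combined.append(edge)
    return combined


def test_reversed_edge_is_a_candidate():
    reversed_edge = FakeEdge("e1", "bob", "alice")
    # The forward lookup (alice -> bob) finds nothing; the inverse lookup finds e1.
    assert merge_candidates([], [reversed_edge]) == [reversed_edge]


def test_overlapping_results_are_not_duplicated():
    e1 = FakeEdge("e1", "alice", "bob")
    e2 = FakeEdge("e2", "bob", "alice")
    # e1 is returned by both lookups but must appear only once, in first-seen order.
    assert merge_candidates([e1], [e1, e2]) == [e1, e2]


def test_single_direction_is_unchanged():
    e1 = FakeEdge("e1", "alice", "bob")
    # With no inverse edges, the result equals the forward lookup.
    assert merge_candidates([e1], []) == [e1]
```

A test against the real resolution flow would additionally need a mocked driver and LLM client, which this sketch leaves out.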

Adds a new EntityEdge.get_between_nodes_bidirectional method that uses
undirected Cypher matching (-) instead of directed (->), finding edges
in both directions with a single query. This preserves the existing
directional semantics of get_between_nodes for other callers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +438 to +441
@classmethod
async def get_between_nodes_bidirectional(
    cls, driver: GraphDriver, node_uuid_a: str, node_uuid_b: str
):
Contributor

Missing graph_operations_interface check: Unlike the existing get_between_nodes method (lines 405-411), this new method doesn't check for a custom graph_operations_interface implementation before executing the default query. This inconsistency means:

  1. Users with custom graph_operations_interface implementations will have their custom logic bypassed during edge deduplication
  2. There's no edge_get_between_nodes_bidirectional method defined in GraphOperationsInterface to call even if a check were added

Consider either:

  • Adding the interface check with a new edge_get_between_nodes_bidirectional method to GraphOperationsInterface, or
  • Documenting that this method intentionally bypasses the interface (if that's the intended behavior)
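
The first option could look roughly like the sketch below. Everything in it is an assumption for illustration: `DriverStub`, `CustomOperations`, `UnimplementedOperations`, and `default_query` are hypothetical, the `edge_get_between_nodes_bidirectional` method does not exist in `GraphOperationsInterface` today (as noted above), and falling back on `NotImplementedError` is one possible convention rather than the library's confirmed behavior.

```python
import asyncio


class CustomOperations:
    """Hypothetical custom interface that implements the proposed method."""

    async def edge_get_between_nodes_bidirectional(self, driver, node_uuid_a, node_uuid_b):
        return [f"custom:{node_uuid_a}<->{node_uuid_b}"]


class UnimplementedOperations:
    """Hypothetical custom interface that opts out of the proposed method."""

    async def edge_get_between_nodes_bidirectional(self, driver, node_uuid_a, node_uuid_b):
        raise NotImplementedError


class DriverStub:
    """Hypothetical driver exposing an optional graph_operations_interface."""

    def __init__(self, graph_operations_interface=None):
        self.graph_operations_interface = graph_operations_interface


async def default_query(driver, node_uuid_a, node_uuid_b):
    """Placeholder for the built-in undirected query."""
    return [f"default:{node_uuid_a}<->{node_uuid_b}"]


async def get_between_nodes_bidirectional(driver, node_uuid_a, node_uuid_b):
    interface = driver.graph_operations_interface
    if interface is not None:
        try:
            # Prefer the custom implementation when one is provided.
            return await interface.edge_get_between_nodes_bidirectional(
                driver, node_uuid_a, node_uuid_b
            )
        except NotImplementedError:
            # Assumed convention: fall through to the default query.
            pass
    return await default_query(driver, node_uuid_a, node_uuid_b)


with_custom = asyncio.run(
    get_between_nodes_bidirectional(DriverStub(CustomOperations()), "a", "b")
)
with_fallback = asyncio.run(
    get_between_nodes_bidirectional(DriverStub(UnimplementedOperations()), "a", "b")
)
without_interface = asyncio.run(get_between_nodes_bidirectional(DriverStub(), "a", "b"))
```

With this shape, custom implementations are consulted first, and callers without one (or with one that opts out) still get the default query.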
