
feat: add neighborhood-based graph traversal for retrievers #2328

Open
Vasilije1990 wants to merge 1 commit into dev from
feature/neighborhood-graph-traversal

Conversation

@Vasilije1990
Contributor

@Vasilije1990 Vasilije1990 commented Mar 8, 2026

Summary

  • Add get_neighborhood(node_ids, depth, edge_types) to GraphDBInterface and implement in Kuzu, Neo4j, and Neptune adapters using variable-length Cypher path patterns ([*1..N])
  • Add project_neighborhood_from_db() to CogneeGraph with extracted _process_nodes_and_edges() helper to eliminate duplication
  • Add neighborhood_depth hyperparameter to brute_force_triplet_search, GraphCompletionRetriever, and GraphCompletionContextExtensionRetriever
  • Wire neighborhood_depth end-to-end through the search API (search() → authorized_search() → search_in_datasets_context() → retriever factory)

When neighborhood_depth is set, the retriever extracts a k-hop subgraph around the top vector-search seed nodes instead of projecting the full graph. This gives more focused, structurally relevant context for graph-based completions.

Usage:

await cognee.search(
    "What is X?",
    query_type=SearchType.GRAPH_COMPLETION,
    neighborhood_depth=2,  # 2-hop neighborhood around seed nodes
)
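
The k-hop extraction described above can be illustrated without any database. The following is a minimal sketch, assuming an in-memory edge list and undirected traversal; `k_hop_neighborhood` and the `Edge` tuple shape are hypothetical names for illustration, not part of cognee, and the adapters in this PR use Cypher path patterns rather than an explicit breadth-first search.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical edge shape for this sketch: (source_id, relationship_name, target_id).
Edge = Tuple[str, str, str]


def k_hop_neighborhood(
    edges: Iterable[Edge], seed_ids: Iterable[str], depth: int
) -> Tuple[Set[str], List[Edge]]:
    """Return nodes within `depth` undirected hops of any seed, plus the induced edges."""
    if depth < 1:
        raise ValueError("depth must be >= 1")

    edge_list = list(edges)
    adjacency: Dict[str, Set[str]] = {}
    for source, _, target in edge_list:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    # Multi-source breadth-first search: every seed starts at distance 0.
    visited: Set[str] = set(seed_ids)
    frontier = deque((seed, 0) for seed in visited)
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            continue  # do not expand past the requested depth
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))

    # Keep only edges whose endpoints are both inside the neighborhood.
    induced = [edge for edge in edge_list if edge[0] in visited and edge[2] in visited]
    return visited, induced
```

On a chain a → b → c → d with seed `a` and depth 2, this sketch keeps nodes a, b, c and the two edges between them, leaving d out.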

Test plan

  • Verify existing search behavior is unchanged when neighborhood_depth is not set (default None)
  • Test with neighborhood_depth=1 and neighborhood_depth=2 on a populated knowledge graph
  • Verify Kuzu adapter get_neighborhood() returns correct nodes/edges format
  • Verify Neo4j adapter get_neighborhood() returns correct nodes/edges format

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Neighborhood-based search: retrieve subgraphs around specified nodes with configurable depth control
    • Extended search API with neighborhood_depth parameter to enhance contextual retrieval capabilities
    • Optional edge-type filtering for precise neighborhood queries across all supported databases
  • Refactor

    • Consolidated internal graph processing logic for improved code maintainability

Add configurable k-hop neighborhood extraction to graph retrievers.
When neighborhood_depth is set, the retriever extracts a subgraph
around vector-search seed nodes instead of projecting the full graph.

Changes:
- Add get_neighborhood() abstract method to GraphDBInterface
- Implement get_neighborhood() in Kuzu, Neo4j, and Neptune adapters
- Add project_neighborhood_from_db() to CogneeGraph with shared
  _process_nodes_and_edges() helper to avoid code duplication
- Wire neighborhood_depth parameter through brute_force_triplet_search,
  GraphCompletionRetriever, GraphCompletionContextExtensionRetriever,
  search factory, and search API layers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: vasilije <vas.markovic@gmail.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 8, 2026

Walkthrough

This PR adds neighborhood-based graph querying across the stack. It introduces get_neighborhood to the graph database interface with implementations for Kuzu, Neo4j, and Neptune adapters to retrieve k-hop subgraphs. The search API is extended with an optional neighborhood_depth parameter that threads through the retrieval pipeline to enable localized graph context queries. Core graph projection logic in CogneeGraph is refactored for reusability.
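
As an illustration of the variable-length path patterns mentioned in the PR summary, here is a minimal sketch of building the relationship part of such a pattern. `build_path_pattern` is a hypothetical helper, not a function from this PR; the identifier check is an assumption added here because relationship types and depth are interpolated into the query text rather than passed as parameters, and the `|`-joined syntax follows the Neptune snippet quoted later in this review (other backends may need a different form).

```python
import re
from typing import Optional, Sequence

# Assumption for this sketch: edge type names are plain identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_path_pattern(depth: int, edge_types: Optional[Sequence[str]] = None) -> str:
    """Build a variable-length relationship pattern such as [:A|B*1..2]."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(depth, bool) or not isinstance(depth, int) or depth < 1:
        raise ValueError("depth must be a positive integer")
    if not edge_types:
        return f"[*1..{depth}]"
    for edge_type in edge_types:
        if not _IDENTIFIER.match(edge_type):
            raise ValueError(f"unsupported edge type name: {edge_type!r}")
    allowed = "|".join(edge_types)
    return f"[:{allowed}*1..{depth}]"
```

The returned fragment would sit between the seed and neighbor node patterns, for example `(seed)-[*1..2]-(neighbor)`.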

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| API Layer: cognee/api/v1/search/search.py | Added optional neighborhood_depth parameter to search endpoint signature; parameter propagated through internal search_function call. |
| Database Interface & Adapters: cognee/infrastructure/databases/graph/graph_db_interface.py, cognee/infrastructure/databases/graph/kuzu/adapter.py, cognee/infrastructure/databases/graph/neo4j_driver/adapter.py, cognee/infrastructure/databases/graph/neptune_driver/adapter.py | Added abstract get_neighborhood method to GraphDBInterface; implemented k-hop neighborhood retrieval in three adapters using database-specific queries, including node/edge property expansion, optional edge-type filtering, and consistent return format. |
| Graph Projection Refactor: cognee/modules/graph/cognee_graph/CogneeGraph.py | Extracted node/edge processing logic into private _process_nodes_and_edges helper method; added new project_neighborhood_from_db public method for subgraph projection; updated project_graph_from_db signature with node/edge projection parameters and error handling. |
| Retrieval Layer: cognee/modules/retrieval/graph_completion_retriever.py, cognee/modules/retrieval/graph_completion_context_extension_retriever.py, cognee/modules/retrieval/utils/brute_force_triplet_search.py | Added neighborhood_depth and neighborhood_seed_top_k parameters to retriever initialization; updated brute_force_triplet_search and helper methods to conditionally use neighborhood projection when depth is set; parameters threaded through triplet search pipeline. |
| Search Flow: cognee/modules/search/methods/search.py, cognee/modules/search/methods/get_search_type_retriever_instance.py | Extended search signatures to accept and propagate neighborhood_depth through authorized_search, dataset context search, and retriever instantiation; neighborhood_depth extracted from kwargs and passed to graph retriever initialization. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

  • PR #2217: Modifies Kuzu adapter neighbor-retrieval logic; overlaps with get_neighborhood implementation in this PR.
  • PR #1926: Updates CogneeGraph projection and triplet_distance_penalty handling; shares refactoring of projection methods with this PR.
  • PR #1991: Modifies brute_force_triplet_search and helper call paths; overlaps with neighborhood parameter threading in this PR.

Suggested labels

run-checks, core-team

Suggested reviewers

  • lxobr
  • hajdul88
  • dexters1
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description is largely complete with a clear summary, usage example, and test plan. However, the provided description deviates from the template structure; it lacks explicit sections for Acceptance Criteria, Type of Change checkbox selection, Screenshots, Pre-submission Checklist completion, and DCO Affirmation. | Fill in all template sections including: Type of Change (mark 'New feature'), Acceptance Criteria, Screenshots of tests passing, Pre-submission Checklist items, and DCO Affirmation confirmation. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the main feature addition: neighborhood-based graph traversal for retrievers, which aligns with the core changes across multiple files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.32% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)

257-259: Preserve the traceback in these projection logs.

Both except Exception blocks currently reduce the failure to str(e), which drops the stack trace right where adapter/query diagnostics matter most.

🪵 Suggested fix
-        except Exception as e:
-            logger.error(f"Error during graph projection: {str(e)}")
+        except Exception:
+            logger.error("Error during graph projection", exc_info=True)
             raise
-        except Exception as e:
-            logger.error(f"Error during neighborhood projection: {str(e)}")
+        except Exception:
+            logger.error("Error during neighborhood projection", exc_info=True)
             raise
As per coding guidelines, "Prefer explicit, structured error handling in Python code".

Also applies to: 304-306
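
To make the difference concrete, here is a small sketch using only the standard-library `logging` module (cognee's own logger setup may differ, so treat this as an assumption). `capture_log` is a hypothetical helper that logs a caught exception both ways and returns what the handler wrote.

```python
import io
import logging


def capture_log(use_exc_info: bool) -> str:
    """Log a caught exception and return the text the handler emitted."""
    stream = io.StringIO()
    logger = logging.getLogger(f"projection_demo_{use_exc_info}")
    logger.setLevel(logging.ERROR)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        try:
            raise RuntimeError("adapter query failed")
        except Exception as error:
            if use_exc_info:
                # Message plus the full traceback of the active exception.
                logger.error("Error during graph projection", exc_info=True)
            else:
                # Message only; the stack trace is lost.
                logger.error(f"Error during graph projection: {str(error)}")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

Calling `capture_log(True)` yields output containing a `Traceback (most recent call last):` section, while `capture_log(False)` yields only the single message line.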

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/modules/graph/cognee_graph/CogneeGraph.py` around lines 257 - 259, The
except blocks in CogneeGraph.py are logging only str(e), which omits the
traceback; replace those logger.error(...) calls inside the graph projection
error handlers with logger.exception("Error during graph projection") or
logger.error("Error during graph projection", exc_info=True) so the stack trace
is preserved in logs, and apply the same change to the other similar except
block (around lines 304-306) that currently logs str(e); keep the existing bare
"raise" to re-raise the original exception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cognee/api/v1/search/search.py`:
- Around line 44-45: The new public parameter neighborhood_depth is forwarded
unchanged to the adapter get_neighborhood(), allowing 0, negative or non-int
values to create invalid path patterns; validate neighborhood_depth early in the
containing function (the public search handler that returns List[SearchResult])
by checking it is an integer > 0 (and within any configured max if applicable),
and if not raise/return a clear API error (e.g., BadRequest/ValueError) before
calling get_neighborhood(); update callers that pass neighborhood_depth through
(the code around where neighborhood_depth is forwarded) to rely on this
validated value.

In `@cognee/infrastructure/databases/graph/neptune_driver/adapter.py`:
- Around line 690-725: get_neighborhood() mixes external node IDs (~id) and
internal Neptune ids (id(n)), causing mismatches; ensure the same ID domain is
used throughout by returning and filtering on the external id property (`~id`).
Update the path_query to RETURN neighbor.`~id` (collect into neighbor_ids),
build all_ids as union of node_ids and those neighbor `~id`s, change nodes_query
to WHERE n.`~id` IN $ids and RETURN n.`~id` AS node_id, and change edges_query
to WHERE source.`~id` IN $ids AND target.`~id` IN $ids and RETURN source.`~id`
AS source_id, target.`~id` AS target_id (keep function name get_neighborhood and
variables path_query, nodes_query, edges_query, all_ids, neighbor_ids, node_ids
to locate changes).

In `@cognee/modules/graph/cognee_graph/CogneeGraph.py`:
- Around line 280-291: project_neighborhood_from_db currently forwards invalid
inputs (depth <= 0 or empty seed_node_ids) to the adapter and treats any empty
edges_data as an error even when nodes_data contains only the requested seeds;
validate inputs early and relax the empty-edge check: in
project_neighborhood_from_db, before calling adapter.get_neighborhood validate
and raise a clear input error if depth < 1 or seed_node_ids is empty (use
InvalidDimensionsError or a new InvalidInputError), then call
adapter.get_neighborhood; after the call, only raise EntityNotFoundError if
nodes_data is empty (no nodes returned); allow edges_data to be empty when
nodes_data contains the requested seed_node_ids (i.e., accept seed-only
neighborhoods) and only treat missing edges as an error when your logic expects
at least one edge type to be present.

In `@cognee/modules/retrieval/utils/brute_force_triplet_search.py`:
- Around line 55-56: The neighborhood_depth flag is being ignored when
relevant_ids_to_filter is falsy because the code calls project_graph_from_db()
inside the neighborhood branch; change the logic in brute_force_triplet_search
(around the neighborhood_depth check) so that if neighborhood_depth is set and
relevant_ids_to_filter is empty you either (A) fail fast by raising a ValueError
indicating seed IDs are required for neighborhood mode, or (B) compute/derive
seed IDs before entering neighborhood mode (e.g., call the existing
seed-derivation helper or add a new get_seed_ids function) and then proceed to
call project_graph_from_db() only with those seed IDs; ensure references to
neighborhood_seed_top_k and relevant_ids_to_filter are used to derive seeds if
you choose option B and do not fall back to full-graph projection silently.

---

Nitpick comments:
In `@cognee/modules/graph/cognee_graph/CogneeGraph.py`:
- Around line 257-259: The except blocks in CogneeGraph.py are logging only
str(e), which omits the traceback; replace those logger.error(...) calls inside
the graph projection error handlers with logger.exception("Error during graph
projection") or logger.error("Error during graph projection", exc_info=True) so
the stack trace is preserved in logs, and apply the same change to the other
similar except block (around lines 304-306) that currently logs str(e); keep the
existing bare "raise" to re-raise the original exception.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b77f4569-41b9-4b38-a164-9607d8b0a297

📥 Commits

Reviewing files that changed from the base of the PR and between f7ba5db and 6cfe0e8.

📒 Files selected for processing (11)
  • cognee/api/v1/search/search.py
  • cognee/infrastructure/databases/graph/graph_db_interface.py
  • cognee/infrastructure/databases/graph/kuzu/adapter.py
  • cognee/infrastructure/databases/graph/neo4j_driver/adapter.py
  • cognee/infrastructure/databases/graph/neptune_driver/adapter.py
  • cognee/modules/graph/cognee_graph/CogneeGraph.py
  • cognee/modules/retrieval/graph_completion_context_extension_retriever.py
  • cognee/modules/retrieval/graph_completion_retriever.py
  • cognee/modules/retrieval/utils/brute_force_triplet_search.py
  • cognee/modules/search/methods/get_search_type_retriever_instance.py
  • cognee/modules/search/methods/search.py

Comment on lines +44 to 45
neighborhood_depth: Optional[int] = None,
) -> List[SearchResult]:

⚠️ Potential issue | 🟠 Major

Validate neighborhood_depth before forwarding it.

Line 233 passes the new public parameter through unchanged. 0, negative values, or non-ints will currently reach the adapter get_neighborhood() queries and build invalid [*1..N] path patterns instead of returning a clear API error.

🛡️ Suggested guard
 async def search(
     query_text: str,
@@
     retriever_specific_config: Optional[dict] = None,
     neighborhood_depth: Optional[int] = None,
 ) -> List[SearchResult]:
+    if neighborhood_depth is not None and (
+        not isinstance(neighborhood_depth, int) or neighborhood_depth < 1
+    ):
+        raise CogneeValidationError(
+            message="neighborhood_depth must be a positive integer.",
+            name="InvalidNeighborhoodDepth",
+        )
+
     """

Also applies to: 217-233
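
A standalone version of such a guard might look like the sketch below. `validate_neighborhood_depth` and its `max_depth` cap are hypothetical, and `ValueError` stands in for whichever validation error the API layer prefers. One detail worth noting: `bool` is a subclass of `int` in Python, so a plain `isinstance(value, int)` check would accept `True` as depth 1.

```python
from typing import Optional


def validate_neighborhood_depth(
    neighborhood_depth: Optional[int], max_depth: int = 5
) -> Optional[int]:
    """Return the depth unchanged if valid; None means neighborhood mode is off."""
    if neighborhood_depth is None:
        return None
    # Reject bool explicitly: isinstance(True, int) is True in Python.
    if isinstance(neighborhood_depth, bool) or not isinstance(neighborhood_depth, int):
        raise ValueError("neighborhood_depth must be an integer")
    if not 1 <= neighborhood_depth <= max_depth:
        raise ValueError(f"neighborhood_depth must be between 1 and {max_depth}")
    return neighborhood_depth
```

Running the check once at the public entry point would let the downstream layers rely on an already validated value.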

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/api/v1/search/search.py` around lines 44 - 45, The new public
parameter neighborhood_depth is forwarded unchanged to the adapter
get_neighborhood(), allowing 0, negative or non-int values to create invalid
path patterns; validate neighborhood_depth early in the containing function (the
public search handler that returns List[SearchResult]) by checking it is an
integer > 0 (and within any configured max if applicable), and if not
raise/return a clear API error (e.g., BadRequest/ValueError) before calling
get_neighborhood(); update callers that pass neighborhood_depth through (the
code around where neighborhood_depth is forwarded) to rely on this validated
value.

Comment on lines +690 to +725
if edge_types:
    allowed = "|".join(edge_types)
    path_query = f"""
    MATCH (seed:{self._GRAPH_NODE_LABEL})-[:{allowed}*1..{depth}]-(neighbor:{self._GRAPH_NODE_LABEL})
    WHERE seed.`~id` IN $node_ids
    RETURN DISTINCT id(neighbor) AS nid
    """
else:
    path_query = f"""
    MATCH (seed:{self._GRAPH_NODE_LABEL})-[*1..{depth}]-(neighbor:{self._GRAPH_NODE_LABEL})
    WHERE seed.`~id` IN $node_ids
    RETURN DISTINCT id(neighbor) AS nid
    """

result = await self.query(path_query, {"node_ids": node_ids})
neighbor_ids = [record["nid"] for record in result if record.get("nid")]

all_ids = list(set(node_ids) | set(neighbor_ids))

# Step 2: Fetch all nodes
nodes_query = f"""
MATCH (n:{self._GRAPH_NODE_LABEL})
WHERE id(n) IN $ids
RETURN id(n) AS node_id, properties(n) AS properties
"""
nodes_result = await self.query(nodes_query, {"ids": all_ids})
nodes = [(r["node_id"], r["properties"]) for r in nodes_result]

# Step 3: Fetch all edges between collected nodes
edges_query = f"""
MATCH (source:{self._GRAPH_NODE_LABEL})-[r]->(target:{self._GRAPH_NODE_LABEL})
WHERE id(source) IN $ids AND id(target) IN $ids
RETURN id(source) AS source_id, id(target) AS target_id,
       type(r) AS relationship_name, properties(r) AS properties
"""
edges_result = await self.query(edges_query, {"ids": all_ids})

⚠️ Potential issue | 🔴 Critical

Keep get_neighborhood() on a single ID domain.

Line 694 matches seed nodes by ~id, but Lines 712 and 721 switch to id(n) / id(source). That makes all_ids a mix of external IDs and Neptune internal IDs, so seed nodes and their incident edges can disappear from the returned neighborhood.

🔧 One consistent way to fix it
             if edge_types:
                 allowed = "|".join(edge_types)
                 path_query = f"""
                 MATCH (seed:{self._GRAPH_NODE_LABEL})-[:{allowed}*1..{depth}]-(neighbor:{self._GRAPH_NODE_LABEL})
                 WHERE seed.`~id` IN $node_ids
-                RETURN DISTINCT id(neighbor) AS nid
+                RETURN DISTINCT neighbor.`~id` AS nid
                 """
             else:
                 path_query = f"""
                 MATCH (seed:{self._GRAPH_NODE_LABEL})-[*1..{depth}]-(neighbor:{self._GRAPH_NODE_LABEL})
                 WHERE seed.`~id` IN $node_ids
-                RETURN DISTINCT id(neighbor) AS nid
+                RETURN DISTINCT neighbor.`~id` AS nid
                 """
@@
             nodes_query = f"""
             MATCH (n:{self._GRAPH_NODE_LABEL})
-            WHERE id(n) IN $ids
-            RETURN id(n) AS node_id, properties(n) AS properties
+            WHERE n.`~id` IN $ids
+            RETURN n.`~id` AS node_id, properties(n) AS properties
             """
@@
             edges_query = f"""
             MATCH (source:{self._GRAPH_NODE_LABEL})-[r]->(target:{self._GRAPH_NODE_LABEL})
-            WHERE id(source) IN $ids AND id(target) IN $ids
-            RETURN id(source) AS source_id, id(target) AS target_id,
+            WHERE source.`~id` IN $ids AND target.`~id` IN $ids
+            RETURN source.`~id` AS source_id, target.`~id` AS target_id,
                    type(r) AS relationship_name, properties(r) AS properties
             """
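
The failure mode can be reproduced with a toy model. The sketch below assumes, as this comment does, that the external-ID lookup and the `id()` lookup return different values; the `INTERNAL_ID` mapping, `NEIGHBORS` table, and both functions are hypothetical stand-ins and do not query Neptune.

```python
from typing import Dict, List, Set

# Hypothetical store: external ID -> internal ID, plus one seed with two neighbors.
INTERNAL_ID: Dict[str, int] = {"seed": 101, "n1": 102, "n2": 103}
NEIGHBORS: Dict[str, List[str]] = {"seed": ["n1", "n2"]}


def collect_ids(seed_ids: List[str], return_external: bool) -> Set[object]:
    """Union seed IDs with neighbor IDs, mirroring how all_ids is built."""
    neighbor_ids: List[object] = []
    for seed in seed_ids:
        for neighbor in NEIGHBORS.get(seed, []):
            neighbor_ids.append(neighbor if return_external else INTERNAL_ID[neighbor])
    return set(seed_ids) | set(neighbor_ids)


def fetch_nodes(ids: Set[object], by_external: bool) -> List[str]:
    """Return external IDs of nodes whose chosen ID kind appears in `ids`."""
    if by_external:
        return sorted(ext for ext in INTERNAL_ID if ext in ids)
    return sorted(ext for ext, internal in INTERNAL_ID.items() if internal in ids)


mixed = collect_ids(["seed"], return_external=False)      # external seed + internal neighbors
consistent = collect_ids(["seed"], return_external=True)  # external IDs only
```

In this model, `fetch_nodes(mixed, by_external=False)` loses the seed because its external ID never matches an internal one, while `fetch_nodes(consistent, by_external=True)` returns all three nodes.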
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if edge_types:
    allowed = "|".join(edge_types)
    path_query = f"""
    MATCH (seed:{self._GRAPH_NODE_LABEL})-[:{allowed}*1..{depth}]-(neighbor:{self._GRAPH_NODE_LABEL})
    WHERE seed.`~id` IN $node_ids
    RETURN DISTINCT neighbor.`~id` AS nid
    """
else:
    path_query = f"""
    MATCH (seed:{self._GRAPH_NODE_LABEL})-[*1..{depth}]-(neighbor:{self._GRAPH_NODE_LABEL})
    WHERE seed.`~id` IN $node_ids
    RETURN DISTINCT neighbor.`~id` AS nid
    """
result = await self.query(path_query, {"node_ids": node_ids})
neighbor_ids = [record["nid"] for record in result if record.get("nid")]
all_ids = list(set(node_ids) | set(neighbor_ids))
# Step 2: Fetch all nodes
nodes_query = f"""
MATCH (n:{self._GRAPH_NODE_LABEL})
WHERE n.`~id` IN $ids
RETURN n.`~id` AS node_id, properties(n) AS properties
"""
nodes_result = await self.query(nodes_query, {"ids": all_ids})
nodes = [(r["node_id"], r["properties"]) for r in nodes_result]
# Step 3: Fetch all edges between collected nodes
edges_query = f"""
MATCH (source:{self._GRAPH_NODE_LABEL})-[r]->(target:{self._GRAPH_NODE_LABEL})
WHERE source.`~id` IN $ids AND target.`~id` IN $ids
RETURN source.`~id` AS source_id, target.`~id` AS target_id,
       type(r) AS relationship_name, properties(r) AS properties
"""
edges_result = await self.query(edges_query, {"ids": all_ids})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/infrastructure/databases/graph/neptune_driver/adapter.py` around lines
690 - 725, get_neighborhood() mixes external node IDs (~id) and internal Neptune
ids (id(n)), causing mismatches; ensure the same ID domain is used throughout by
returning and filtering on the external id property (`~id`). Update the
path_query to RETURN neighbor.`~id` (collect into neighbor_ids), build all_ids
as union of node_ids and those neighbor `~id`s, change nodes_query to WHERE
n.`~id` IN $ids and RETURN n.`~id` AS node_id, and change edges_query to WHERE
source.`~id` IN $ids AND target.`~id` IN $ids and RETURN source.`~id` AS
source_id, target.`~id` AS target_id (keep function name get_neighborhood and
variables path_query, nodes_query, edges_query, all_ids, neighbor_ids, node_ids
to locate changes).

Comment on lines +280 to +291
if node_dimension < 1 or edge_dimension < 1:
    raise InvalidDimensionsError()
try:
    logger.info(f"Retrieving {depth}-hop neighborhood for {len(seed_node_ids)} seed nodes.")
    nodes_data, edges_data = await adapter.get_neighborhood(
        node_ids=seed_node_ids,
        depth=depth,
        edge_types=edge_types,
    )

    if not nodes_data or not edges_data:
        raise EntityNotFoundError(message="Empty neighborhood projected from the database.")

⚠️ Potential issue | 🟠 Major

Validate neighborhood inputs and allow seed-only results.

project_neighborhood_from_db() currently forwards depth <= 0 and empty seed_node_ids straight to the adapter, and Line 290 also raises when the neighborhood contains seed nodes but no edges. That makes malformed requests and sparse-but-valid neighborhoods fail deep in the backend instead of producing a clear boundary behavior.

💡 One way to harden this path
         if node_dimension < 1 or edge_dimension < 1:
             raise InvalidDimensionsError()
+        if depth < 1:
+            raise ValueError("depth must be >= 1")
+        if not seed_node_ids:
+            raise ValueError("seed_node_ids must not be empty")
         try:
             logger.info(f"Retrieving {depth}-hop neighborhood for {len(seed_node_ids)} seed nodes.")
             nodes_data, edges_data = await adapter.get_neighborhood(
                 node_ids=seed_node_ids,
                 depth=depth,
                 edge_types=edge_types,
             )

-            if not nodes_data or not edges_data:
+            if not nodes_data:
                 raise EntityNotFoundError(message="Empty neighborhood projected from the database.")
+            edges_data = edges_data or []
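
The boundary behavior proposed in the diff above can be exercised in isolation. The following is a minimal sketch; `check_neighborhood_request` and `normalize_neighborhood_result` are hypothetical helpers, the tuple shapes are assumed, and `LookupError` stands in for `EntityNotFoundError` so the example stays self-contained.

```python
from typing import List, Optional, Sequence, Tuple

Node = Tuple[str, dict]            # assumed shape: (node_id, properties)
Edge = Tuple[str, str, str, dict]  # assumed shape: (source_id, target_id, relationship_name, properties)


def check_neighborhood_request(seed_node_ids: Sequence[str], depth: int) -> None:
    """Reject malformed requests before any adapter call is made."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    if not seed_node_ids:
        raise ValueError("seed_node_ids must not be empty")


def normalize_neighborhood_result(
    nodes_data: Optional[Sequence[Node]], edges_data: Optional[Sequence[Edge]]
) -> Tuple[List[Node], List[Edge]]:
    """Fail only when no nodes came back; an edge-free neighborhood is valid."""
    if not nodes_data:
        # Stand-in for EntityNotFoundError in this sketch.
        raise LookupError("Empty neighborhood projected from the database.")
    return list(nodes_data), list(edges_data or [])
```

With these checks, an isolated seed node produces a one-node, zero-edge result instead of an error.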
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/modules/graph/cognee_graph/CogneeGraph.py` around lines 280 - 291,
project_neighborhood_from_db currently forwards invalid inputs (depth <= 0 or
empty seed_node_ids) to the adapter and treats any empty edges_data as an error
even when nodes_data contains only the requested seeds; validate inputs early
and relax the empty-edge check: in project_neighborhood_from_db, before calling
adapter.get_neighborhood validate and raise a clear input error if depth < 1 or
seed_node_ids is empty (use InvalidDimensionsError or a new InvalidInputError),
then call adapter.get_neighborhood; after the call, only raise
EntityNotFoundError if nodes_data is empty (no nodes returned); allow edges_data
to be empty when nodes_data contains the requested seed_node_ids (i.e., accept
seed-only neighborhoods) and only treat missing edges as an error when your
logic expects at least one edge type to be present.

Comment on lines +55 to +56
neighborhood_depth: Optional[int] = None,
neighborhood_seed_top_k: Optional[int] = 10,

⚠️ Potential issue | 🟠 Major

Don't silently fall back to full-graph projection in neighborhood mode.

With neighborhood_depth set, Lines 68-88 still call project_graph_from_db() whenever relevant_ids_to_filter is empty/falsy. That makes the new flag a silent no-op and can turn a bounded neighborhood request back into a full-graph projection. Please either fail fast here or derive seed IDs before entering neighborhood mode.

Also applies to: 68-88
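
Both options can be sketched in a few lines. The helpers below are hypothetical and not taken from brute_force_triplet_search: `select_seed_ids` assumes vector search yields a mapping of node ID to distance where smaller means closer, and `choose_projection` only returns a label for the branch that would run.

```python
from typing import Dict, List, Optional, Sequence


def select_seed_ids(distances: Dict[str, float], top_k: int = 10) -> List[str]:
    """Option B: derive seeds as the top_k closest node IDs (smaller distance is better)."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    return [node_id for node_id, _ in ranked[:top_k]]


def choose_projection(
    neighborhood_depth: Optional[int], seed_ids: Sequence[str]
) -> str:
    """Option A: fail fast instead of silently projecting the full graph."""
    if neighborhood_depth is None:
        return "full_graph"
    if not seed_ids:
        raise ValueError("neighborhood mode requires at least one seed node ID")
    return "neighborhood"
```

Either way, a request with neighborhood_depth set never ends up on the full-graph path without the caller knowing.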

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cognee/modules/retrieval/utils/brute_force_triplet_search.py` around lines 55
- 56, The neighborhood_depth flag is being ignored when relevant_ids_to_filter
is falsy because the code calls project_graph_from_db() inside the neighborhood
branch; change the logic in brute_force_triplet_search (around the
neighborhood_depth check) so that if neighborhood_depth is set and
relevant_ids_to_filter is empty you either (A) fail fast by raising a ValueError
indicating seed IDs are required for neighborhood mode, or (B) compute/derive
seed IDs before entering neighborhood mode (e.g., call the existing
seed-derivation helper or add a new get_seed_ids function) and then proceed to
call project_graph_from_db() only with those seed IDs; ensure references to
neighborhood_seed_top_k and relevant_ids_to_filter are used to derive seeds if
you choose option B and do not fall back to full-graph projection silently.
