feat: traversal-integrated groupBy for vector search (follow-up to #4067)

## Feature Request: Traversal-Integrated `groupBy` for Vector Search

Follow-up to [#4067](https://github.com/ArcadeData/arcadedb/issues/4067), which shipped the `groupBy` / `groupSize` options on `vector.neighbors` and `vector.sparseNeighbors` as an MVP using post-traversal filtering. This issue tracks the work to push the grouping into the index traversal proper.

### Background

The MVP delivered in #4067 implements `groupBy` as a post-traversal filter:

1. Over-fetch `k * groupSize * 5` candidates from JVector (dense) or the sparse-index top-K loop.
2. Iterate the candidates in rank order, keeping per-group counters.
3. Stop once `k` groups are filled at `groupSize` each.

This is correct and works well for typical RAG workloads (10s of groups, modest `k`). It has two limits:

- **Wasted work**: candidates that get filtered out were still scored and (for dense) included in HNSW graph traversal.
- **Loose recall guarantee**: when group cardinality is high or distribution is skewed, the 5x over-fetch multiplier may not be enough to fill `k` groups, and the user has to bump `efSearch` or accept fewer groups.

Qdrant's `query_points_groups` integrates grouping into the search loop; Milvus's `group_by_field` does the same. Both avoid the over-fetch heuristic.

### Design

#### Dense (`vector.neighbors`)

JVector exposes a per-point `Bits` filter and a top-K heap. The cleanest integration is a custom search-result collector that:

- Accepts the `groupBy` field reader (a `RID -> Object` lookup, typically `record.get("source_file")`).
- Maintains a per-group count and a per-group worst-score watermark.
- Admits a candidate to the heap only if `count[groupKey] < groupSize`, OR the candidate beats the group's worst member (in which case the worst member is evicted and replaced).
- Stops the HNSW traversal once every group in flight is full.

Two integration points to investigate in JVector:

1. **`SearchScoreProvider` + custom collector**: replace JVector's default heap with a group-aware one, called via the search loop's `collect` callback.
2. **`Bits` filter with adaptive bookkeeping**: the filter reads from a shared per-group counter table and rejects candidates whose group is already full and whose score wouldn't beat the group's worst member. Cheaper to implement; possibly less effective than option 1.

For the first iteration option 2 is probably enough.

#### Sparse (`vector.sparseNeighbors`)

`LSMSparseVectorIndex.topK` is our own code, so the integration is straightforward: the WAND DAAT loop already maintains a min-heap of size K. Replace it with a per-group min-heap and the same admission logic as above. Pivot upper bound becomes per-group: a candidate's RID can only enter a group's heap if its score can beat that group's worst member.

### Acceptance criteria

- [ ] `vector.neighbors` integrates grouping into the JVector search loop, not as a post-filter on a fixed over-fetch.
- [ ] `vector.sparseNeighbors` integrates grouping into the WAND DAAT loop in `LSMSparseVectorIndex.topK`.
- [ ] Identical results (RIDs and ordering) to the MVP post-filter implementation on existing tests in `SQLFunctionVectorGroupByTest`.
- [ ] Lower wall-clock latency than the MVP at `k=10, groupSize=1` on a 100k-doc / 1000-group benchmark.
- [ ] No regression in recall vs the MVP at fixed `efSearch`.
- [ ] Benchmarks vs the MVP on representative shapes (low / medium / high group cardinality).

### Out of scope

- Cross-collection grouping (Qdrant's `with_lookup`).
- Multi-key grouping (e.g. `groupBy: ['author', 'year']`).
- Group-level scoring (e.g. avg score per group as a tiebreaker).

### Related

- Parent: [#4067](https://github.com/ArcadeData/arcadedb/issues/4067) (groupBy MVP, shipped)
- Sibling follow-up: dotted-nested-field groupBy
- Discussion: [#4044](https://github.com/ArcadeData/arcadedb/discussions/4044)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: traversal-integrated groupBy for vector search (follow-up to #4067) #4071

Feature Request: Traversal-Integrated `groupBy` for Vector Search

Background

Design

Dense (`vector.neighbors`)

Sparse (`vector.sparseNeighbors`)

Acceptance criteria

Out of scope

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

feat: traversal-integrated groupBy for vector search (follow-up to #4067) #4071

Description

Feature Request: Traversal-Integrated groupBy for Vector Search

Background

Design

Dense (vector.neighbors)

Sparse (vector.sparseNeighbors)

Acceptance criteria

Out of scope

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Feature Request: Traversal-Integrated `groupBy` for Vector Search

Dense (`vector.neighbors`)

Sparse (`vector.sparseNeighbors`)