Collections of maps with different key sets are converted to indexed structs instead of arrays #527

@ogawa-takeshi

Description


Problem

When CompactValueConverter encounters a collection containing maps with different key sets, it converts the collection into an indexed struct ({e0: {...}, e1: {...}}) instead of preserving it as an array.

This affects any Cypher source query that uses collect() over maps where optional relationships or conditional logic produces elements with varying structures.

Reproduction

Case 1: Key absent in some elements

```kotlin
val coll = listOf(
  mapOf("name" to "john", "age" to 21),
  mapOf("name" to "jane", "age" to 25, "address" to mapOf("city" to "london", "zip" to 10001))
)
val schema = converter.schema(coll, false)
```

Expected: schema.type() is ARRAY with a merged element schema containing name, age, and optional address.

Actual: schema.type() is STRUCT with indexed fields e0, e1.

Case 2: Key present but value is null

```kotlin
val coll = listOf(
  mapOf("name" to "john", "age" to 21, "address" to null),
  mapOf("name" to "jane", "age" to 25, "address" to mapOf("city" to "london", "zip" to 10001))
)
val schema = converter.schema(coll, false)
```

Expected: Same as Case 1 — ARRAY with optional address.

Actual: Fails at value conversion with DataException: Invalid value: null used for required field: "address", schema type: STRUCT.

Case 2 is the more common scenario in practice, as it arises from Cypher patterns like:

```cypher
collect({ id: node.id, detail: CASE WHEN rel IS null THEN null ELSE {x: rel.x} END })
```

The map literal always includes the detail key, but with a null value when the optional relationship does not exist.

Root cause

In CompactValueConverter.schema(), two separate mechanisms cause schema divergence:

Case 1: Key absent — notNullOrEmpty() excludes null entries from key set

The Map branch uses notNullOrEmpty() to filter entries before schema inference. When a key is entirely absent from one element but present in another, the inferred schemas have different key sets. The Collection branch then compares schemas via toSet().size, finds multiple unique schemas, and falls back to the indexed struct {e0, e1, ...}.
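A minimal model of this divergence check, with field-name sets standing in for full Kafka Connect schemas (`inferKeySet` is invented for this sketch; it only approximates the `notNullOrEmpty()` filtering):

```kotlin
// Illustrative model only: each element's "schema" is reduced to its set of
// non-null field names, mimicking notNullOrEmpty() dropping null entries.
fun inferKeySet(element: Map<String, Any?>): Set<String> =
    element.filterValues { it != null }.keys

fun main() {
    val coll = listOf(
        mapOf("name" to "john", "age" to 21),
        mapOf("name" to "jane", "age" to 25, "address" to mapOf("city" to "london"))
    )
    val schemas = coll.map { inferKeySet(it) }
    // toSet().size > 1 -> elements disagree -> fall back to {e0, e1, ...}
    println(schemas.toSet().size) // 2
}
```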

Case 2: Key present with null value — schema(null) returns OPTIONAL_STRING

When a key IS present but its value is null, the Map branch's else case includes it in the schema by calling schema(null, ...), which returns SimpleTypes.NULL.schema(true) — an OPTIONAL_STRING_SCHEMA. If the same key in another element has a non-null structured value (producing a STRUCT schema), the two schemas have the same key set but incompatible types (STRING vs STRUCT) for that field.

This is worse than Case 1: the schemas cannot be treated as identical, yet merging cannot resolve the type conflict either. In older versions without merging, this falls back to {e0, e1}. With naive merging that does not account for null schemas, it either falls back or, if a different code path infers the schema from a single element, produces a schema with address as a required STRUCT, causing a DataException at value-conversion time when a null value is encountered.

Both cases

The force-maps-as-struct setting does not help in either case, as it only affects maps with homogeneous value types, not collections with heterogeneous element schemas.

Proposed fix

1. Schema merging for collections with differing key sets

When the Collection branch encounters multiple distinct STRUCT schemas, attempt to merge them before falling back to the indexed struct format:

  1. Check that all schemas are of type STRUCT
  2. Compute the union of all field names across schemas
  3. For each field, resolve the type from schemas that have it
  4. Fields present in only a subset of schemas are marked as optional
  5. If all field types are compatible (same type, or recursively mergeable structs), return a merged schema and use it as the array element schema
  6. If types conflict (e.g., same field is STRING in one and INT64 in another), fall back to the existing {e0, e1, ...} behavior
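The steps above could be sketched as follows. This is a hypothetical implementation against a simplified schema model: `SimpleSchema`, `Primitive`, `Struct`, and `Field` are invented stand-ins for Kafka Connect's `Schema`, not the connector's real types.

```kotlin
sealed interface SimpleSchema
data class Primitive(val type: String) : SimpleSchema            // e.g. "STRING", "INT64"
data class Struct(val fields: Map<String, Field>) : SimpleSchema // field name -> schema + optionality
data class Field(val schema: SimpleSchema, val optional: Boolean)

/** Merge STRUCT schemas; null means "incompatible, fall back to {e0, e1, ...}". */
fun merge(schemas: List<SimpleSchema>): Struct? {
    // 1. All schemas must be STRUCTs
    val structs = schemas.map { it as? Struct ?: return null }
    // 2. Union of all field names across schemas
    val names = structs.flatMap { it.fields.keys }.toSet()
    val merged = mutableMapOf<String, Field>()
    for (name in names) {
        val present = structs.mapNotNull { it.fields[name] }
        val types = present.map { it.schema }.toSet()
        // 3./5. Resolve the type: identical everywhere, or recursively mergeable structs
        val resolved: SimpleSchema = when {
            types.size == 1 -> types.first()
            types.all { it is Struct } -> merge(types.toList()) ?: return null
            else -> return null // 6. conflicting types -> indexed-struct fallback
        }
        // 4. Optional if missing from some elements, or already optional anywhere
        val optional = present.size < structs.size || present.any { it.optional }
        merged[name] = Field(resolved, optional)
    }
    return Struct(merged)
}
```

With the Case 1 repro, this yields an element schema where name and age stay required and address becomes an optional nested struct, so the collection can remain an ARRAY.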

2. Null schema compatibility

During schema merging, the null schema (SimpleTypes.NULL.schema(true) = OPTIONAL_STRING_SCHEMA) is treated as compatible with any other type. When a field has a null schema in some elements and a concrete type in others:

  • The concrete type is used as the resolved schema
  • The field is marked as optional (since null indicates the value may be absent)

If all elements have null schema for a field, the null schema itself is used (as optional).
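This resolution rule could be sketched per field as below; `NULL_SCHEMA` and `resolveField` are names invented for the sketch, with strings standing in for schema objects:

```kotlin
// Placeholder marker for SimpleTypes.NULL.schema(true) (OPTIONAL_STRING_SCHEMA);
// the name is invented for this sketch.
const val NULL_SCHEMA = "NULL"

/**
 * Resolve one field from its per-element schemas, treating the null schema as
 * compatible with anything. Returns (resolved schema, isOptional), or null on
 * a genuine type conflict.
 */
fun resolveField(perElement: List<String>): Pair<String, Boolean>? {
    val hasNull = perElement.any { it == NULL_SCHEMA }
    val concrete = perElement.filter { it != NULL_SCHEMA }.toSet()
    return when {
        concrete.isEmpty() -> NULL_SCHEMA to true         // all null: keep null schema, optional
        concrete.size == 1 -> concrete.first() to hasNull // concrete type wins; optional if any null
        else -> null                                      // conflicting concrete types: fall back
    }
}
```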

Backward compatibility

Both changes are backward-compatible:

  • Incompatible schemas (e.g., same field is INT64 in one element and STRUCT in another) still fall back to the existing {e0, e1, ...} behavior
  • No changes to value() are needed — the existing Map-to-Struct conversion iterates schema fields and looks up values by key. Missing keys and null values return null, which is valid for optional fields.
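To illustrate the last point, here is a toy stand-in for that Map-to-Struct path (`toStruct` is invented for this sketch and not the connector's actual code):

```kotlin
// Iterate the merged schema's field names and look each one up in the source
// map. A missing key resolves to null, exactly like an explicit null value,
// and null is a legal value for an optional field.
fun toStruct(fieldNames: List<String>, value: Map<String, Any?>): Map<String, Any?> =
    fieldNames.associateWith { value[it] }

fun main() {
    println(toStruct(listOf("name", "age", "address"), mapOf("name" to "john", "age" to 21)))
    // {name=john, age=21, address=null}
}
```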

Workaround

Without this fix, users must ensure all collection elements have identical key sets and types by using coalesce to provide typed default values:

```cypher
collect({
  id: node.id,
  detail: { x: coalesce(rel.x, 0), y: coalesce(rel.y, '') }
})
```

This preserves the array format but introduces empty/zero defaults for null values, which may cause issues in downstream systems (e.g., Elasticsearch date field parsing, enum validation).
