Description
Collections of maps with different key sets are converted to indexed structs instead of arrays
Problem
When CompactValueConverter encounters a collection containing maps with different key sets, it converts the collection into an indexed struct ({e0: {...}, e1: {...}}) instead of preserving it as an array.
This affects any Cypher source query that uses collect() over maps where optional relationships or conditional logic produces elements with varying structures.
Reproduction
Case 1: Key absent in some elements
```kotlin
val coll = listOf(
    mapOf("name" to "john", "age" to 21),
    mapOf("name" to "jane", "age" to 25, "address" to mapOf("city" to "london", "zip" to 10001))
)
val schema = converter.schema(coll, false)
```

Expected: `schema.type()` is `ARRAY` with a merged element schema containing `name`, `age`, and an optional `address`.

Actual: `schema.type()` is `STRUCT` with indexed fields `e0`, `e1`.
Case 2: Key present but value is null
```kotlin
val coll = listOf(
    mapOf("name" to "john", "age" to 21, "address" to null),
    mapOf("name" to "jane", "age" to 25, "address" to mapOf("city" to "london", "zip" to 10001))
)
val schema = converter.schema(coll, false)
```

Expected: same as Case 1, `ARRAY` with an optional `address`.

Actual: fails at value conversion with `DataException: Invalid value: null used for required field: "address", schema type: STRUCT`.
Case 2 is the more common scenario in practice, as it arises from Cypher patterns like:
```cypher
collect({ id: node.id, detail: CASE WHEN rel IS null THEN null ELSE {x: rel.x} END })
```
The map literal always includes the detail key, but with a null value when the optional relationship does not exist.
Root cause
In CompactValueConverter.schema(), two separate mechanisms cause schema divergence:
Case 1: Key absent — notNullOrEmpty() excludes null entries from key set
The Map branch uses notNullOrEmpty() to filter entries before schema inference. When a key is entirely absent from one element but present in another, the inferred schemas have different key sets. The Collection branch then compares schemas via toSet().size, finds multiple unique schemas, and falls back to the indexed struct {e0, e1, ...}.
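The fallback described above can be illustrated with a minimal Kotlin sketch. The names `inferKeySet` and `collectionSchemaKind` are hypothetical, and plain key sets stand in for the Kafka Connect `Schema` objects the real converter compares:

```kotlin
// Stand-in for per-element schema inference: notNullOrEmpty() drops
// null-valued entries before the key set is taken.
fun inferKeySet(element: Map<String, Any?>): Set<String> =
    element.filterValues { it != null }.keys

// Stand-in for the Collection branch: if the inferred element schemas
// are not all identical, fall back to the indexed struct {e0, e1, ...}.
fun collectionSchemaKind(coll: List<Map<String, Any?>>): String {
    val schemas = coll.map(::inferKeySet)
    return if (schemas.toSet().size <= 1) "ARRAY" else "INDEXED_STRUCT"
}
```

With the Case 1 collection, the two inferred key sets differ (`address` is absent from the first element), so `collectionSchemaKind` returns `INDEXED_STRUCT`.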
Case 2: Key present with null value — schema(null) returns OPTIONAL_STRING
When a key IS present but its value is null, the Map branch's else case includes it in the schema by calling schema(null, ...), which returns SimpleTypes.NULL.schema(true) — an OPTIONAL_STRING_SCHEMA. If the same key in another element has a non-null structured value (producing a STRUCT schema), the two schemas have the same key set but incompatible types (STRING vs STRUCT) for that field.
This is worse than Case 1: schema merging cannot resolve the type conflict, but the schemas also cannot be treated as identical. In older versions without merging, this falls back to {e0, e1}. With naive merging that does not account for null schemas, it either falls back or (if a different code path infers the schema from a single element) produces a schema with address as a required STRUCT, causing DataException at value conversion time when a null value is encountered.
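A minimal sketch of why the key sets now match but the types do not. The names here are hypothetical; in the real converter `schema(null, ...)` returns the Kafka Connect `OPTIONAL_STRING_SCHEMA`, represented below by a plain `"OPTIONAL_STRING"` tag:

```kotlin
// Stand-in for per-value schema inference. schema(null, ...) yields
// SimpleTypes.NULL.schema(true), i.e. an optional string schema.
fun inferFieldType(value: Any?): String = when (value) {
    null -> "OPTIONAL_STRING"
    is Map<*, *> -> "STRUCT"
    is Number -> "INT64"
    is String -> "STRING"
    else -> "OTHER"
}

// Collect the distinct inferred types for one key across all elements.
fun fieldTypesAcross(coll: List<Map<String, Any?>>, key: String): Set<String> =
    coll.map { inferFieldType(it[key]) }.toSet()
```

For the Case 2 collection, `fieldTypesAcross(coll, "address")` yields both `OPTIONAL_STRING` and `STRUCT`: the key exists in every element, but with two irreconcilable schema types.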
Both cases
The force-maps-as-struct setting does not help in either case, as it only affects maps with homogeneous value types, not collections with heterogeneous element schemas.
Proposed fix
1. Schema merging for collections with differing key sets
When the Collection branch encounters multiple distinct STRUCT schemas, attempt to merge them before falling back to the indexed struct format:
- Check that all schemas are of type `STRUCT`
- Compute the union of all field names across the schemas
- For each field, resolve the type from the schemas that have it
- Mark fields present in only a subset of schemas as optional
- If all field types are compatible (same type, or recursively mergeable structs), return a merged schema and use it as the array element schema
- If types conflict (e.g., the same field is `STRING` in one schema and `INT64` in another), fall back to the existing `{e0, e1, ...}` behavior
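The merge above can be sketched over simplified type tags. All names here are hypothetical, `FieldSchema` stands in for Kafka Connect's `Schema`/`SchemaBuilder`, and recursive struct merging and null-schema handling are omitted for brevity:

```kotlin
// Simplified field schema: a type tag plus an optional flag.
data class FieldSchema(val type: String, val optional: Boolean)

// Merge per-element struct schemas (field name -> FieldSchema).
// Returns null to signal the existing {e0, e1, ...} fallback.
fun mergeStructs(schemas: List<Map<String, FieldSchema>>): Map<String, FieldSchema>? {
    val allNames = schemas.flatMap { it.keys }.toSet() // union of field names
    val merged = mutableMapOf<String, FieldSchema>()
    for (name in allNames) {
        val present = schemas.mapNotNull { it[name] }
        if (present.map { it.type }.toSet().size > 1) return null // type conflict
        // Optional if the field is missing from some elements or already optional.
        val optional = present.size < schemas.size || present.any { it.optional }
        merged[name] = FieldSchema(present.first().type, optional)
    }
    return merged
}
```

With the Case 1 schemas, `address` appears in only one of the two elements, so the merged schema keeps it as an optional `STRUCT` while `name` and `age` stay required.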
2. Null schema compatibility
During schema merging, the null schema (SimpleTypes.NULL.schema(true) = OPTIONAL_STRING_SCHEMA) is treated as compatible with any other type. When a field has a null schema in some elements and a concrete type in others:
- The concrete type is used as the resolved schema
- The field is marked as optional (since null indicates the value may be absent)
If all elements have null schema for a field, the null schema itself is used (as optional).
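The per-field resolution can be sketched as follows. The names are hypothetical, and the `"NULL"` tag stands in for the null schema (`SimpleTypes.NULL.schema(true)`, i.e. `OPTIONAL_STRING_SCHEMA` in practice):

```kotlin
// Tag standing in for the null schema.
const val NULL_TYPE = "NULL"

// Resolve one field's type across all elements, treating the null schema
// as compatible with any concrete type. Returns the resolved type paired
// with its optional flag, or null to signal the {e0, e1, ...} fallback.
fun resolveField(types: List<String>): Pair<String, Boolean>? {
    val concrete = types.filter { it != NULL_TYPE }.toSet()
    return when {
        concrete.isEmpty() -> NULL_TYPE to true // all elements null: keep null schema, optional
        concrete.size == 1 -> concrete.single() to (NULL_TYPE in types) // concrete type wins
        else -> null // genuine conflict between concrete types
    }
}
```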
Backward compatibility
Both changes are backward-compatible:
- Incompatible schemas (e.g., the same field is `INT64` in one element and `STRUCT` in another) still fall back to the existing `{e0, e1, ...}` behavior
- No changes to `value()` are needed: the existing Map-to-Struct conversion iterates schema fields and looks up values by key. Missing keys and null values return `null`, which is valid for optional fields.
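Why the value-conversion side is unaffected can be shown with a minimal model of the Map-to-Struct step (hypothetical name; the real code builds a Kafka Connect `Struct` rather than a map):

```kotlin
// Iterate the (merged) schema's field names and look each one up in the
// source map. A missing key and an explicit null both produce null, which
// is a valid value for an optional field.
fun toStructValues(schemaFields: Set<String>, element: Map<String, Any?>): Map<String, Any?> =
    schemaFields.associateWith { element[it] }
```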
Workaround
Without this fix, users must ensure all collection elements have identical key sets and types by using coalesce to provide typed default values:
```cypher
collect({
  id: node.id,
  detail: { x: coalesce(rel.x, 0), y: coalesce(rel.y, '') }
})
```

This preserves the array format but introduces empty/zero defaults for null values, which may cause issues in downstream systems (e.g., Elasticsearch date field parsing, enum validation).