Collections of maps with different key sets are converted to indexed structs instead of arrays #527

@ogawa-takeshi

Description


Problem

When CompactValueConverter encounters a collection containing maps with different key sets, it converts the collection into an indexed struct ({e0: {...}, e1: {...}}) instead of preserving it as an array.

This affects any Cypher source query that uses collect() over maps where optional relationships or conditional logic produces elements with varying structures.

Reproduction

Case 1: Key absent in some elements

```kotlin
val coll = listOf(
  mapOf("name" to "john", "age" to 21),
  mapOf("name" to "jane", "age" to 25, "address" to mapOf("city" to "london", "zip" to 10001))
)
val schema = converter.schema(coll, false)
```

Expected: schema.type() is ARRAY with a merged element schema containing name, age, and optional address.

Actual: schema.type() is STRUCT with indexed fields e0, e1.

Case 2: Key present but value is null

```kotlin
val coll = listOf(
  mapOf("name" to "john", "age" to 21, "address" to null),
  mapOf("name" to "jane", "age" to 25, "address" to mapOf("city" to "london", "zip" to 10001))
)
val schema = converter.schema(coll, false)
```

Expected: Same as Case 1 — ARRAY with optional address.

Actual: Fails at value conversion with DataException: Invalid value: null used for required field: "address", schema type: STRUCT.

Case 2 is the more common scenario in practice, as it arises from Cypher patterns like:

```cypher
collect({ id: node.id, detail: CASE WHEN rel IS null THEN null ELSE {x: rel.x} END })
```

The map literal always includes the detail key, but with a null value when the optional relationship does not exist.

Root cause

In CompactValueConverter.schema(), two separate mechanisms cause schema divergence:

Case 1: Key absent — notNullOrEmpty() excludes null entries from key set

The Map branch uses notNullOrEmpty() to filter entries before schema inference. When a key is entirely absent from one element but present in another, the inferred schemas have different key sets. The Collection branch then compares schemas via toSet().size, finds multiple unique schemas, and falls back to the indexed struct {e0, e1, ...}.
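A minimal model of this divergence check, with field-name sets standing in for full Kafka Connect schemas (`inferKeySet` is invented for this sketch; it only approximates the `notNullOrEmpty()` filtering):

```kotlin
// Illustrative model only: each element's "schema" is reduced to its set of
// non-null field names, mimicking notNullOrEmpty() dropping null entries.
fun inferKeySet(element: Map<String, Any?>): Set<String> =
    element.filterValues { it != null }.keys

fun main() {
    val coll = listOf(
        mapOf("name" to "john", "age" to 21),
        mapOf("name" to "jane", "age" to 25, "address" to mapOf("city" to "london"))
    )
    val schemas = coll.map { inferKeySet(it) }
    // toSet().size > 1 -> elements disagree -> fall back to {e0, e1, ...}
    println(schemas.toSet().size) // 2
}
```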

Case 2: Key present with null value — schema(null) returns OPTIONAL_STRING

When a key IS present but its value is null, the Map branch's else case includes it in the schema by calling schema(null, ...), which returns SimpleTypes.NULL.schema(true) — an OPTIONAL_STRING_SCHEMA. If the same key in another element has a non-null structured value (producing a STRUCT schema), the two schemas have the same key set but incompatible types (STRING vs STRUCT) for that field.

This is worse than Case 1: the schemas cannot be treated as identical, yet merging cannot resolve the type conflict either. In older versions without merging, this falls back to {e0, e1}. With naive merging that does not account for null schemas, it either falls back or, if a different code path infers the schema from a single element, produces a schema with address as a required STRUCT, causing a DataException at value-conversion time when a null value is encountered.

Both cases

The force-maps-as-struct setting does not help in either case, as it only affects maps with homogeneous value types, not collections with heterogeneous element schemas.

Proposed fix

1. Schema merging for collections with differing key sets

When the Collection branch encounters multiple distinct STRUCT schemas, attempt to merge them before falling back to the indexed struct format:

  1. Check that all schemas are of type STRUCT
  2. Compute the union of all field names across schemas
  3. For each field, resolve the type from schemas that have it
  4. Fields present in only a subset of schemas are marked as optional
  5. If all field types are compatible (same type, or recursively mergeable structs), return a merged schema and use it as the array element schema
  6. If types conflict (e.g., same field is STRING in one and INT64 in another), fall back to the existing {e0, e1, ...} behavior
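The steps above could be sketched as follows. This is a hypothetical implementation against a simplified schema model: `SimpleSchema`, `Primitive`, `Struct`, and `Field` are invented stand-ins for Kafka Connect's `Schema`, not the connector's real types.

```kotlin
sealed interface SimpleSchema
data class Primitive(val type: String) : SimpleSchema            // e.g. "STRING", "INT64"
data class Struct(val fields: Map<String, Field>) : SimpleSchema // field name -> schema + optionality
data class Field(val schema: SimpleSchema, val optional: Boolean)

/** Merge STRUCT schemas; null means "incompatible, fall back to {e0, e1, ...}". */
fun merge(schemas: List<SimpleSchema>): Struct? {
    // 1. All schemas must be STRUCTs
    val structs = schemas.map { it as? Struct ?: return null }
    // 2. Union of all field names across schemas
    val names = structs.flatMap { it.fields.keys }.toSet()
    val merged = mutableMapOf<String, Field>()
    for (name in names) {
        val present = structs.mapNotNull { it.fields[name] }
        val types = present.map { it.schema }.toSet()
        // 3./5. Resolve the type: identical everywhere, or recursively mergeable structs
        val resolved: SimpleSchema = when {
            types.size == 1 -> types.first()
            types.all { it is Struct } -> merge(types.toList()) ?: return null
            else -> return null // 6. conflicting types -> indexed-struct fallback
        }
        // 4. Optional if missing from some elements, or already optional anywhere
        val optional = present.size < structs.size || present.any { it.optional }
        merged[name] = Field(resolved, optional)
    }
    return Struct(merged)
}
```

With the Case 1 repro, this yields an element schema where name and age stay required and address becomes an optional nested struct, so the collection can remain an ARRAY.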

2. Null schema compatibility

During schema merging, the null schema (SimpleTypes.NULL.schema(true) = OPTIONAL_STRING_SCHEMA) is treated as compatible with any other type. When a field has a null schema in some elements and a concrete type in others:

  • The concrete type is used as the resolved schema
  • The field is marked as optional (since null indicates the value may be absent)

If all elements have null schema for a field, the null schema itself is used (as optional).
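This resolution rule could be sketched per field as below; `NULL_SCHEMA` and `resolveField` are names invented for the sketch, with strings standing in for schema objects:

```kotlin
// Placeholder marker for SimpleTypes.NULL.schema(true) (OPTIONAL_STRING_SCHEMA);
// the name is invented for this sketch.
const val NULL_SCHEMA = "NULL"

/**
 * Resolve one field from its per-element schemas, treating the null schema as
 * compatible with anything. Returns (resolved schema, isOptional), or null on
 * a genuine type conflict.
 */
fun resolveField(perElement: List<String>): Pair<String, Boolean>? {
    val hasNull = perElement.any { it == NULL_SCHEMA }
    val concrete = perElement.filter { it != NULL_SCHEMA }.toSet()
    return when {
        concrete.isEmpty() -> NULL_SCHEMA to true         // all null: keep null schema, optional
        concrete.size == 1 -> concrete.first() to hasNull // concrete type wins; optional if any null
        else -> null                                      // conflicting concrete types: fall back
    }
}
```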

Backward compatibility

Both changes are backward-compatible:

  • Incompatible schemas (e.g., same field is INT64 in one element and STRUCT in another) still fall back to the existing {e0, e1, ...} behavior
  • No changes to value() are needed — the existing Map-to-Struct conversion iterates schema fields and looks up values by key. Missing keys and null values return null, which is valid for optional fields.
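To illustrate the last point, here is a toy stand-in for that Map-to-Struct path (`toStruct` is invented for this sketch and not the connector's actual code):

```kotlin
// Iterate the merged schema's field names and look each one up in the source
// map. A missing key resolves to null, exactly like an explicit null value,
// and null is a legal value for an optional field.
fun toStruct(fieldNames: List<String>, value: Map<String, Any?>): Map<String, Any?> =
    fieldNames.associateWith { value[it] }

fun main() {
    println(toStruct(listOf("name", "age", "address"), mapOf("name" to "john", "age" to 21)))
    // {name=john, age=21, address=null}
}
```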

Workaround

Without this fix, users must ensure all collection elements have identical key sets and types by using coalesce to provide typed default values:

```cypher
collect({
  id: node.id,
  detail: { x: coalesce(rel.x, 0), y: coalesce(rel.y, '') }
})
```

This preserves the array format but introduces empty/zero defaults for null values, which may cause issues in downstream systems (e.g., Elasticsearch date field parsing, enum validation).
