protobuf: binary-registered schemas silently drop map_entry, and ?normalize=true does not restore map<K,V> shorthand #1319

Description

@nathan-c

What happened?

Two related bugs affect PROTOBUF schemas registered as a base64-encoded FileDescriptorProto binary — the format produced by Confluent Java, C#, and .NET SDK serialisers.

Bug 1 — map_entry is silently stripped.

When a binary schema is registered, _deserialize_msg() in src/karapace/core/protobuf/serialization.py constructs MessageElement without reading MessageOptions. The option map_entry = true that marks a synthetic entry message as belonging to a map<K,V> field is silently dropped regardless of what the binary contained. Without it, consumers cannot distinguish a map field from a plain repeated message field.

Bug 2 — ?normalize=true does not restore map<K,V> shorthand.

Even if Bug 1 is fixed, normalisation leaves the expanded entry-message form in the output rather than converting it back to map<K,V> syntax. Confluent Schema Registry's ?normalize=true produces clean map<string, string> syntax; Karapace produces repeated .Foo.LabelsEntry labels = 1 with a dangling LabelsEntry nested message.

Both behaviours are deviations from Confluent Schema Registry. Schemas registered as plain .proto text strings are not affected — only the binary FileDescriptorProto registration path triggers these bugs.

Reproducer:

# Register a binary FileDescriptorProto containing a map<string,string> field
# (this is the format KafkaProtobufSerializer sends at runtime)
curl -s -X POST http://localhost:8081/subjects/map-test-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schemaType": "PROTOBUF",
    "schema": "ImQKCk1hcE1lc3NhZ2USJwoGbGFiZWxzGAEgAygLMhcuTWFwTWVzc2FnZS5MYWJlbHNFbnRyeRotCgtMYWJlbHNFbnRyeRILCgNrZXkYASABKAkSDQoFdmFsdWUYAiABKAk6AjgBYgZwcm90bzM="
  }'

# Bug 1 — fetch without normalization: map_entry is absent from LabelsEntry
curl -s http://localhost:8081/subjects/map-test-value/versions/latest | jq -r .schema

# Bug 2 — fetch with normalization: entry-message form not converted to map<>
curl -s "http://localhost:8081/subjects/map-test-value/versions/latest?normalize=true" | jq -r .schema

Actual output (both with and without ?normalize=true):

message MapMessage {
  repeated .MapMessage.LabelsEntry labels = 1;

  message LabelsEntry {
    string key = 1;
    string value = 2;
  }
}
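
One way to confirm that the registered bytes themselves carry the option, independent of Karapace, is to walk the protobuf wire format directly. The following is a minimal, standard-library-only sketch and not Karapace code: the helper names (`read_varint`, `iter_fields`, `map_entry_messages`, `find_map_entries`) are made up for illustration. The field numbers come from `descriptor.proto` (`FileDescriptorProto.message_type = 4`, `DescriptorProto.name = 1`, `nested_type = 3`, `options = 7`, `MessageOptions.map_entry = 7`).

```python
import base64

# Payload copied verbatim from the reproducer (split only for line length).
SCHEMA_B64 = (
    "ImQKCk1hcE1lc3NhZ2USJwoGbGFiZWxzGAEgAygLMhcuTWFwTWVzc2FnZS5MYWJlbHNFbnRyeRot"
    "CgtMYWJlbHNFbnRyeRILCgNrZXkYASABKAkSDQoFdmFsdWUYAiABKAk6AjgBYgZwcm90bzM="
)


def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 1:  # fixed 64-bit
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 5:  # fixed 32-bit
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, wire_type, value


def map_entry_messages(descriptor, prefix=""):
    """Yield names of (nested) messages whose options set map_entry = true."""
    name = ""
    nested = []
    is_map_entry = False
    for number, wire_type, value in iter_fields(descriptor):
        if number == 1 and wire_type == 2:  # DescriptorProto.name
            name = value.decode("utf-8")
        elif number == 3 and wire_type == 2:  # DescriptorProto.nested_type
            nested.append(value)
        elif number == 7 and wire_type == 2:  # DescriptorProto.options
            for opt_number, opt_wire_type, opt_value in iter_fields(value):
                if opt_number == 7 and opt_wire_type == 0:  # map_entry
                    is_map_entry = bool(opt_value)
    full_name = prefix + name
    if is_map_entry:
        yield full_name
    for child in nested:
        yield from map_entry_messages(child, full_name + ".")


def find_map_entries(schema_b64):
    """Return map-entry message names found in a base64 FileDescriptorProto."""
    found = []
    for number, wire_type, value in iter_fields(base64.b64decode(schema_b64)):
        if number == 4 and wire_type == 2:  # FileDescriptorProto.message_type
            found.extend(map_entry_messages(value))
    return found


print(find_map_entries(SCHEMA_B64))  # ['MapMessage.LabelsEntry']
```

The input does contain `map_entry = true` on `LabelsEntry`, so its absence from the output above means it is lost inside the registry rather than missing from the submitted schema.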

What did you expect to happen?

Karapace should match Confluent Schema Registry behaviour:

Without ?normalize=true, map_entry = true is preserved in the entry message:

message MapMessage {
  repeated .MapMessage.LabelsEntry labels = 1;

  message LabelsEntry {
    option map_entry = true;
    string key = 1;
    string value = 2;
  }
}

With ?normalize=true — expanded form converted back to map<K,V> shorthand:

message MapMessage {
  map<string, string> labels = 1;
}

What else do we need to know?

Root causes:

  • Bug 1: _deserialize_msg() never reads DescriptorProto.options, so MessageOptions (including map_entry) is always discarded. The same gap in _serialize_msgtype() means binary round-trips also lose the option.

  • Bug 2: message_element_with_sorted_options() in proto_normalizations.py only sorts options; it has no logic to detect the entry-message pattern and convert it back to map<K,V>. Confluent SR's normaliser explicitly detects messages with map_entry = true and emits map<K,V> syntax instead.
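
The conversion that normalisation is missing could look roughly like the sketch below. It is an illustration only: `Field`, `Message`, `collapse_map_entries`, and `render` are hypothetical stand-ins, not Karapace's `MessageElement`/`FieldElement` classes or Confluent's implementation. It assumes an entry message qualifies when it has `map_entry = true` and exactly two fields numbered 1 (key) and 2 (value), and that the repeated field refers to it by its fully qualified, dot-prefixed name. Because detection relies on `map_entry`, it only works once Bug 1 is fixed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:  # hypothetical stand-in for a parsed field
    name: str
    number: int
    type_name: str
    repeated: bool = False


@dataclass
class Message:  # hypothetical stand-in for a parsed message
    name: str
    fields: List[Field] = field(default_factory=list)
    nested: List["Message"] = field(default_factory=list)
    map_entry: bool = False


def collapse_map_entries(msg: Message, scope: str = "") -> Message:
    """Return a copy of msg with map-entry messages folded into map<K, V> fields."""
    full_name = f"{scope}.{msg.name}"
    # Candidate entry messages, keyed by the name a field would reference.
    entries = {f"{full_name}.{n.name}": n for n in msg.nested if n.map_entry}
    collapsed = set()
    fields = []
    for f in msg.fields:
        entry = entries.get(f.type_name) if f.repeated else None
        by_number = {e.number: e for e in entry.fields} if entry is not None else {}
        if set(by_number) == {1, 2}:  # key = 1, value = 2
            map_type = f"map<{by_number[1].type_name}, {by_number[2].type_name}>"
            fields.append(Field(f.name, f.number, map_type))
            collapsed.add(f.type_name)
        else:
            fields.append(f)
    # Drop entry messages that were folded away; recurse into the rest.
    nested = [
        collapse_map_entries(n, full_name)
        for n in msg.nested
        if f"{full_name}.{n.name}" not in collapsed
    ]
    return Message(msg.name, fields, nested, msg.map_entry)


def render(msg: Message, indent: str = "") -> List[str]:
    """Render a message as .proto-style lines (options omitted for brevity)."""
    lines = [f"{indent}message {msg.name} {{"]
    for f in msg.fields:
        label = "repeated " if f.repeated else ""
        lines.append(f"{indent}  {label}{f.type_name} {f.name} = {f.number};")
    for n in msg.nested:
        lines.extend(render(n, indent + "  "))
    lines.append(f"{indent}}}")
    return lines


expanded = Message(
    "MapMessage",
    fields=[Field("labels", 1, ".MapMessage.LabelsEntry", repeated=True)],
    nested=[
        Message(
            "LabelsEntry",
            fields=[Field("key", 1, "string"), Field("value", 2, "string")],
            map_entry=True,
        )
    ],
)
print("\n".join(render(collapse_map_entries(expanded))))
```

For the reproducer schema this prints the `map<string, string> labels = 1;` form shown under the expected output. If the nested message lacks `map_entry = true`, the sketch leaves the repeated field and its entry message untouched.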

Impact:

  • Code generators (protoc, grpc_tools) receiving the schema from the registry will generate List<LabelsEntry> instead of Map<String, String>, causing API mismatches at runtime.
  • Reflection-based tools that rely on map_entry: true to identify map fields will misclassify them.
  • Schema compatibility checks may incorrectly flag a map field against its expanded form as a type change.
  • ?normalize=true produces output that differs from Confluent SR for all binary-registered schemas containing map fields, breaking schema equivalence checks between the two implementations.
