What happened?
Two related bugs affect PROTOBUF schemas registered as a base64-encoded FileDescriptorProto binary — the format produced by Confluent Java, C#, and .NET SDK serialisers.
Bug 1 — map_entry is silently stripped.
When a binary schema is registered, _deserialize_msg() in src/karapace/core/protobuf/serialization.py constructs MessageElement without reading MessageOptions. The option map_entry = true that marks a synthetic entry message as belonging to a map<K,V> field is silently dropped regardless of what the binary contained. Without it, consumers cannot distinguish a map field from a plain repeated message field.
Bug 2 — ?normalize=true does not restore map<K,V> shorthand.
Even if Bug 1 is fixed, normalisation leaves the expanded entry-message form in the output rather than converting it back to map<K,V> syntax. Confluent Schema Registry's ?normalize=true produces clean map<string, string> syntax; for the schema in the reproducer below, Karapace produces repeated .MapMessage.LabelsEntry labels = 1 with a dangling LabelsEntry nested message.
Both behaviours are deviations from Confluent Schema Registry. Schemas registered as plain .proto text strings are not affected — only the binary FileDescriptorProto registration path triggers these bugs.
Reproducer:
# Register a binary FileDescriptorProto containing a map<string,string> field
# (this is the format KafkaProtobufSerializer sends at runtime)
curl -s -X POST http://localhost:8081/subjects/map-test-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{
    "schemaType": "PROTOBUF",
    "schema": "ImQKCk1hcE1lc3NhZ2USJwoGbGFiZWxzGAEgAygLMhcuTWFwTWVzc2FnZS5MYWJlbHNFbnRyeRotCgtMYWJlbHNFbnRyeRILCgNrZXkYASABKAkSDQoFdmFsdWUYAiABKAk6AjgBYgZwcm90bzM="
  }'
# Bug 1 — fetch without normalization: map_entry is absent from LabelsEntry
curl -s http://localhost:8081/subjects/map-test-value/versions/latest | jq -r .schema
# Bug 2 — fetch with normalization: entry-message form not converted to map<>
curl -s "http://localhost:8081/subjects/map-test-value/versions/latest?normalize=true" | jq -r .schema
Actual output (both with and without ?normalize=true):
message MapMessage {
  repeated .MapMessage.LabelsEntry labels = 1;
  message LabelsEntry {
    string key = 1;
    string value = 2;
  }
}
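To confirm that the payload in the reproducer really does carry map_entry = true (so the option is lost on the Karapace side, not by the client), the bytes can be inspected directly. The following is a minimal sketch that walks the protobuf wire format using only the Python standard library. The helper names (read_varint, iter_fields, map_entry_flags) are made up for this issue and are not Karapace functions; the field numbers come from descriptor.proto. It ignores the file's package, which is fine here because the reproducer schema has none.

```python
from __future__ import annotations

import base64

# Payload copied from the reproducer (base64-encoded FileDescriptorProto).
SCHEMA_B64 = (
    "ImQKCk1hcE1lc3NhZ2USJwoGbGFiZWxzGAEgAygLMhcuTWFwTWVzc2FnZS5MYWJlbHNFbnRyeRot"
    "CgtMYWJlbHNFbnRyeRILCgNrZXkYASABKAkSDQoFdmFsdWUYAiABKAk6AjgBYgZwcm90bzM="
)

# Field numbers from descriptor.proto.
FILE_MESSAGE_TYPE = 4      # FileDescriptorProto.message_type
MSG_NAME = 1               # DescriptorProto.name
MSG_NESTED_TYPE = 3        # DescriptorProto.nested_type
MSG_OPTIONS = 7            # DescriptorProto.options
OPT_MAP_ENTRY = 7          # MessageOptions.map_entry


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each top-level field in buf."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:        # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:      # length-delimited (strings, sub-messages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 1:      # fixed 64-bit
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 5:      # fixed 32-bit
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield number, wire_type, value


def map_entry_flags(descriptor: bytes, prefix: str = "") -> dict[str, bool]:
    """Map each message name in a DescriptorProto to its map_entry flag."""
    name = ""
    nested = []
    is_map_entry = False
    for number, wire_type, value in iter_fields(descriptor):
        if number == MSG_NAME and wire_type == 2:
            name = value.decode("utf-8")
        elif number == MSG_NESTED_TYPE and wire_type == 2:
            nested.append(value)
        elif number == MSG_OPTIONS and wire_type == 2:
            for opt_number, opt_wire_type, opt_value in iter_fields(value):
                if opt_number == OPT_MAP_ENTRY and opt_wire_type == 0:
                    is_map_entry = bool(opt_value)
    full_name = f"{prefix}{name}"
    flags = {full_name: is_map_entry}
    for child in nested:
        flags.update(map_entry_flags(child, prefix=f"{full_name}."))
    return flags


flags: dict[str, bool] = {}
for number, wire_type, value in iter_fields(base64.b64decode(SCHEMA_B64)):
    if number == FILE_MESSAGE_TYPE and wire_type == 2:
        flags.update(map_entry_flags(value))

print(flags)
# {'MapMessage': False, 'MapMessage.LabelsEntry': True}
```

So the option is present on LabelsEntry in what the client sends, and is absent only in what Karapace returns.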
What did you expect to happen?
Karapace should match Confluent Schema Registry behaviour:
Without ?normalize=true — map_entry = true preserved in the entry message:
message MapMessage {
  repeated .MapMessage.LabelsEntry labels = 1;
  message LabelsEntry {
    option map_entry = true;
    string key = 1;
    string value = 2;
  }
}
With ?normalize=true — expanded form converted back to map<K,V> shorthand:
message MapMessage {
  map<string, string> labels = 1;
}
What else do we need to know?
Root causes:
- Bug 1: _deserialize_msg() never reads DescriptorProto.options, so MessageOptions (including map_entry) is always discarded. The same gap in _serialize_msgtype() means binary round-trips also lose the option.
- Bug 2: message_element_with_sorted_options() in proto_normalizations.py only sorts options; it has no logic to detect the entry-message pattern and convert it back to map<K,V>. Confluent SR's normaliser explicitly detects messages with map_entry = true and emits map<K,V> syntax instead.
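The missing normalisation step for Bug 2 could look roughly like the sketch below. It is not based on Karapace's MessageElement or FieldElement classes: Field, Message, collapse_map_entries and render are hypothetical stand-ins used only to illustrate the detection rule (a repeated field whose type is a nested message with map_entry = true and exactly the fields key = 1 and value = 2). The scope argument is an assumed way of passing a package prefix; the reproducer schema has no package.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a parsed field, not Karapace's FieldElement.
    name: str
    number: int
    type_name: str          # e.g. "string" or ".MapMessage.LabelsEntry"
    repeated: bool = False


@dataclass(frozen=True)
class Message:
    # Hypothetical stand-in for a parsed message, not Karapace's MessageElement.
    name: str
    fields: tuple[Field, ...] = ()
    nested: tuple[Message, ...] = ()
    map_entry: bool = False


def is_valid_entry(msg: Message) -> bool:
    """A map entry message has exactly the fields key = 1 and value = 2."""
    return {(f.name, f.number) for f in msg.fields} == {("key", 1), ("value", 2)}


def collapse_map_entries(msg: Message, scope: str = "") -> Message:
    """Rewrite repeated entry-message fields as map<K, V> and drop the entries."""
    full_name = f"{scope}.{msg.name}"
    entries = {
        f"{full_name}.{n.name}": n
        for n in msg.nested
        if n.map_entry and is_valid_entry(n)
    }
    collapsed: set[str] = set()
    new_fields = []
    for f in msg.fields:
        entry = entries.get(f.type_name) if f.repeated else None
        if entry is None:
            new_fields.append(f)
            continue
        key, value = sorted(entry.fields, key=lambda e: e.number)
        map_type = f"map<{key.type_name}, {value.type_name}>"
        new_fields.append(Field(f.name, f.number, map_type))
        collapsed.add(f.type_name)
    # Keep nested messages that were not collapsed, normalising them recursively.
    new_nested = tuple(
        collapse_map_entries(n, full_name)
        for n in msg.nested
        if f"{full_name}.{n.name}" not in collapsed
    )
    return replace(msg, fields=tuple(new_fields), nested=new_nested)


def render(msg: Message, indent: int = 0) -> str:
    """Render fields and nested messages as .proto text (options are omitted)."""
    pad = "  " * indent
    lines = [f"{pad}message {msg.name} {{"]
    for f in msg.fields:
        label = "repeated " if f.repeated else ""
        lines.append(f"{pad}  {label}{f.type_name} {f.name} = {f.number};")
    for n in msg.nested:
        lines.append(render(n, indent + 1))
    lines.append(f"{pad}}}")
    return "\n".join(lines)


entry = Message(
    "LabelsEntry",
    fields=(Field("key", 1, "string"), Field("value", 2, "string")),
    map_entry=True,
)
schema = Message(
    "MapMessage",
    fields=(Field("labels", 1, ".MapMessage.LabelsEntry", repeated=True),),
    nested=(entry,),
)

print(render(collapse_map_entries(schema)))
# message MapMessage {
#   map<string, string> labels = 1;
# }
```

Note that the rule keys off map_entry, so it only fires once Bug 1 is fixed: with the flag stripped, the same input passes through unchanged, which matches the current output.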
Impact:
- Code generators (protoc, grpc_tools) receiving the schema from the registry will generate List<LabelsEntry> instead of Map<String, String>, causing API mismatches at runtime.
- Reflection-based tools that rely on map_entry: true to identify map fields will misclassify them.
- Schema compatibility checks may incorrectly flag a map field against its expanded form as a type change.
- ?normalize=true produces output that differs from Confluent SR for all binary-registered schemas containing map fields, breaking schema equivalence checks between the two implementations.