Data Streams messages - support Avro & Protobuf formats #20862

Open
wants to merge 11 commits into
base: master

piochelepiotr
Contributor

@piochelepiotr piochelepiotr commented Jul 25, 2025

What does this PR do?

For users of Avro and Protobuf, this PR makes it possible to retrieve messages from Kafka via the Datadog UI.

Avro support is straightforward: the Avro schema is in JSON format.
For Protobuf, the schema is the descriptor file. So that it can be passed through remote configuration and as a configuration option to the Kafka Consumer integration, the descriptor is additionally base64-encoded.
For Protobuf, the first object in the pool is assumed to be the parent schema.
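
To illustrate the encoding described above, here is a minimal stdlib-only sketch of how the two schema kinds could travel through a text-only configuration field. The helper names `encode_schema_for_config` and `decode_schema_from_config` are hypothetical and are not taken from this PR, and the descriptor bytes are a tiny stand-in rather than a real compiled descriptor file.

```python
import base64
import json


def encode_schema_for_config(schema_format, schema):
    """Return a plain string suitable for a text-only config field (hypothetical helper)."""
    if schema_format == "avro":
        # Avro schemas are JSON documents, so a JSON string is already text-safe.
        return json.dumps(schema)
    if schema_format == "protobuf":
        # Descriptor files are binary, so base64 turns them into ASCII text.
        return base64.b64encode(schema).decode("ascii")
    raise ValueError(f"unsupported schema format: {schema_format}")


def decode_schema_from_config(schema_format, encoded):
    """Reverse encode_schema_for_config (hypothetical helper)."""
    if schema_format == "avro":
        return json.loads(encoded)
    if schema_format == "protobuf":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(encoded, validate=True)
    raise ValueError(f"unsupported schema format: {schema_format}")


# Example Avro schema, invented for illustration only.
avro_schema = {
    "type": "record",
    "name": "Book",
    "fields": [
        {"name": "isbn", "type": "long"},
        {"name": "title", "type": "string"},
    ],
}
# Stand-in for the bytes of a compiled descriptor file.
descriptor_bytes = b"\x0a\x0c\x0a\x0abook.proto"

encoded_avro = encode_schema_for_config("avro", avro_schema)
encoded_descriptor = encode_schema_for_config("protobuf", descriptor_bytes)
```

Both encoded values are ordinary `str` objects, so they can sit in a YAML or JSON configuration without escaping issues, and decoding returns the original dict and bytes.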

Increased Datadog agent size

https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1051663772

Image size on disk (uncompressed image size) 1029868244 is higher than the maximum allowed 1029491916 by the gate !

Assuming the limit was close to the previous image size, this PR increases the image by roughly 370 KB (the gate is exceeded by 376,328 bytes).

Motivation

Most Kafka users are using Avro & Protobuf formats. With support for both of these, most users should be covered.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov bot commented Jul 25, 2025

Codecov Report

❌ Patch coverage is 86.95652% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.19%. Comparing base (7709840) to head (80ab003).
⚠️ Report is 16 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
confluent_platform ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
kafka_consumer 89.93% <86.95%> (+0.99%) ⬆️
presto ?
solr ?
tomcat ?
weblogic ?

Flags with carried forward coverage won't be shown.

@piochelepiotr piochelepiotr marked this pull request as ready for review July 28, 2025 15:32
@piochelepiotr piochelepiotr requested a review from a team as a code owner July 28, 2025 15:32
b'\x08\xe8\xba\xb2\xeb\xd1\x9c\x02\x12\x1b\x54\x68\x65\x20\x47\x6f\x20\x50\x72\x6f\x67\x72\x61\x6d\x6d\x69\x6e\x67\x20\x4c\x61\x6e\x67\x75\x61\x67\x65'
b'\x1a\x0c\x41\x6c\x61\x6e\x20\x44\x6f\x6e\x6f\x76\x61\x6e'
)
parsed_protobuf_schema = build_schema('protobuf', protobuf_schema)
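
For readers following the test fragment above: the two byte strings appear to be a Protobuf-encoded message in wire format. The sketch below is a stdlib-only decoder for the two wire types used there (varint and length-delimited). It is not the code from this PR, the function names are hypothetical, and without the schema only field numbers are known, not field names.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_top_level_fields(buf):
    """Map field number -> raw value for varint and length-delimited fields."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[field_number] = value
    return fields


# The two byte strings from the fragment above, concatenated.
payload = (
    b'\x08\xe8\xba\xb2\xeb\xd1\x9c\x02\x12\x1b\x54\x68\x65\x20\x47\x6f\x20\x50\x72\x6f\x67\x72\x61\x6d\x6d\x69\x6e\x67\x20\x4c\x61\x6e\x67\x75\x61\x67\x65'
    b'\x1a\x0c\x41\x6c\x61\x6e\x20\x44\x6f\x6e\x6f\x76\x61\x6e'
)
fields = decode_top_level_fields(payload)
# fields == {1: 9780134190440, 2: b'The Go Programming Language', 3: b'Alan Donovan'}
```

Mapping field numbers 1, 2, and 3 to names and types is what the Protobuf schema (the descriptor) is needed for.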

The schema is in bytes here but I believe the remote config will have string representation. Will there be an issue with that?

Clarified offline

@gitstevenpham gitstevenpham left a comment

lgtm

@steveny91
Contributor

Looks good to me. Maybe in the future it might be worth it to split off into a separate module/file for easier maintenance.

@piochelepiotr
Contributor Author

> Looks good to me. Maybe in the future it might be worth it to split off into a separate module/file for easier maintenance.

Can definitely do that! I want to add payload filtering next, and I will make sure to split the code up into modules when I work on that.
