Dekaf: namespace group IDs and invalidate topic state when binding is backfilled or collection is reset #2093

Draft: wants to merge 3 commits into `master` from `dekaf/topic_name_translation_falliable`

@jshearer (Contributor) commented Apr 22, 2025

Add `advanced.namespaced_ids` to Dekaf endpoint config

This flag namespaces topic names and consumer group IDs when they are passed to and from upstream Kafka. It fixes several problems described in #2060.

Specifically:

  • Stop passing `token` to `to_upstream_topic_name`, since the net effect was that changing your token broke your consumer group.
  • Namespace group IDs to the materialization so that consumers of different materializations can share the same group ID without conflicts.
    • Note: This doesn't prevent multiple consumers of the same task from stepping on each other's toes, as we saw happen last week. That can still happen, and I haven't found a good way to mitigate it. Just don't run multiple consumers of the same task with the same group ID. Tinybird tells you not to, and we should probably say the same in our docs.
  • Include the partition template name and binding state key in the mapped topic names. This causes backfills and collection resets to propagate to consumers as a fresh topic with no committed offsets.
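The group-ID namespacing described above can be sketched as a pair of mapping functions. This is a minimal illustration, not Dekaf's actual encoding: the function names `to_upstream_group_id` and `from_upstream_group_id`, the use of the task name as the prefix, and the `|` separator are all hypothetical, and the sketch assumes task names never contain the separator. It also shows why the caveat in the note still applies: two consumers of the *same* task with the same group ID map to the same upstream ID.

```rust
/// Hypothetical separator between the task name and the consumer's group ID.
/// Assumes task names never contain this character.
const GROUP_SEPARATOR: char = '|';

/// Prefix a consumer-supplied group ID with the materialization's task name
/// before it is sent to upstream Kafka (hypothetical helper).
fn to_upstream_group_id(task_name: &str, group_id: &str) -> String {
    format!("{}{}{}", task_name, GROUP_SEPARATOR, group_id)
}

/// Strip the task prefix again when reporting a group back to the consumer.
/// Returns `None` if the upstream ID does not belong to this task.
fn from_upstream_group_id<'a>(task_name: &str, upstream: &'a str) -> Option<&'a str> {
    upstream
        .strip_prefix(task_name)?
        .strip_prefix(GROUP_SEPARATOR)
}
```

With this shape, `to_upstream_group_id("acmeCo/task-a", "analytics")` and `to_upstream_group_id("acmeCo/task-b", "analytics")` produce distinct upstream IDs, so the two materializations no longer share committed offsets.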

Post-deploy steps:

Setting `advanced.namespaced_ids` to `true` will effectively reset all of your committed offset state, so it defaults to `false` if not specified. After this change is deployed, we'll need to manually republish all existing Dekaf materializations with this field explicitly set to `false`. Then we can change its default to `true`, and the net effect will be that all new Dekaf materializations have it turned on by default, while existing ones are left alone.
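The rollout plan works because an explicitly published value always wins over the default. A minimal sketch of that resolution, assuming the field is modeled as an `Option<bool>` where `None` means "not specified" (the struct and method names here are hypothetical, not Dekaf's real config types):

```rust
/// Hypothetical shape of the `advanced` section of a Dekaf endpoint config.
/// `None` means the field was absent from the published spec.
#[derive(Debug, Default, Clone, Copy)]
struct AdvancedConfig {
    namespaced_ids: Option<bool>,
}

impl AdvancedConfig {
    /// Resolve the effective setting. `default_when_unset` stands in for the
    /// service-wide default, which starts as `false` and is later flipped.
    fn resolve_namespaced_ids(&self, default_when_unset: bool) -> bool {
        self.namespaced_ids.unwrap_or(default_when_unset)
    }
}
```

Once every existing materialization carries an explicit `Some(false)`, flipping the default to `true` only affects specs that leave the field unset, i.e. new materializations.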

Other changes

We currently have a couple of consumers of materializations whose token changed after their groups/topics had committed offsets. This causes their sessions to panic. The panics themselves are contained, since each session runs as an isolated Tokio task, but I realized that we are "leaking" `dekaf_total_connections`: we increment the counter when the connection is established, but the session panics before we can decrement it. So I refactored the way we report that metric so that panicked sessions are accounted for.
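One way to make the metric panic-proof, matching the approach described in the second commit below, is to derive the gauge from a connection limiter whose permits are returned on `Drop`, since `Drop` also runs while a panicking task unwinds. Tokio's `Semaphore` provides `available_permits()` and releases permits when they are dropped; to keep this sketch self-contained it uses a hypothetical std-only `ConnectionLimiter` instead, so the type and method names are illustrative, not Dekaf's code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a connection-limiting semaphore.
struct ConnectionLimiter {
    capacity: usize,
    available: AtomicUsize,
}

/// RAII permit: returning it happens in `Drop`, which also runs on unwind.
struct Permit {
    limiter: Arc<ConnectionLimiter>,
}

impl ConnectionLimiter {
    fn new(capacity: usize) -> Arc<Self> {
        Arc::new(Self { capacity, available: AtomicUsize::new(capacity) })
    }

    /// Take one permit without blocking, or `None` if all are in use.
    fn try_acquire(limiter: &Arc<Self>) -> Option<Permit> {
        limiter
            .available
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .ok()
            .map(|_| Permit { limiter: Arc::clone(limiter) })
    }

    /// Value a periodic reporter would publish as the connection gauge,
    /// observed from the limiter's state rather than a separate counter.
    fn active_connections(&self) -> usize {
        self.capacity - self.available.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        self.limiter.available.fetch_add(1, Ordering::AcqRel);
    }
}
```

Because the gauge is computed from permits that are always returned, a session that panics mid-`serve()` cannot leave the reported count permanently inflated.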

I also fixed the panic itself by removing a couple of `unwrap()`s that I assumed couldn't fail, but which do fail in this situation.
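Replacing an `unwrap()` typically means returning a `Result` whose error carries enough context to debug the failure; the first commit below notes that knowing the failing topic name would help. The sketch assumes a hypothetical `decode_upstream_topic` that expects names of the form `<task prefix>.<collection>`; that format is invented for illustration and is not Dekaf's real topic encoding.

```rust
use std::fmt;

/// Error that records the offending topic name instead of panicking.
#[derive(Debug, PartialEq)]
struct TopicNameError {
    topic: String,
    reason: &'static str,
}

impl fmt::Display for TopicNameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cannot decode upstream topic {:?}: {}", self.topic, self.reason)
    }
}

impl std::error::Error for TopicNameError {}

/// Hypothetical fallible decoder: returns an error rather than unwrapping.
fn decode_upstream_topic(upstream: &str, task_prefix: &str) -> Result<String, TopicNameError> {
    let err = |reason: &'static str| TopicNameError { topic: upstream.to_string(), reason };
    let collection = upstream
        .strip_prefix(task_prefix)
        .and_then(|rest| rest.strip_prefix('.'))
        .ok_or_else(|| err("unexpected prefix"))?;
    if collection.is_empty() {
        return Err(err("empty collection name"));
    }
    Ok(collection.to_string())
}
```

The caller can then log the error (which includes the topic name) and end the session cleanly.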

Fixes #2083, fixes #2060



@jshearer force-pushed the `dekaf/topic_name_translation_falliable` branch 3 times, most recently from abb05dc to f57382b on April 24, 2025 19:16
@jshearer force-pushed the `dekaf/topic_name_translation_falliable` branch from f57382b to 9bc1da2 on May 1, 2025 14:58
@jshearer self-assigned this on May 12, 2025
jshearer added 2 commits May 16, 2025 15:16
We've seen cases where this has panic'd in prod, and it would be useful to know which topic name it's failing on.
If `serve()` panics, it shouldn't result in an ever-growing `dekaf_total_connections` metric. So instead of directly incrementing/decrementing the connection counter, let's just periodically observe the semaphore's state and report that instead.
@jshearer force-pushed the `dekaf/topic_name_translation_falliable` branch from 9bc1da2 to 32bd5cc on May 16, 2025 21:03
This flag will cause topics and groups to be namespaced when reported to/from upstream Kafka. This fixes a few problems described in #2060.

Specifically:
* Stop passing `token` to `to_upstream_topic_name`, as the net effect was breaking your consumer group if you changed your token.
* Namespace group IDs to the materialization so that consumers of different tasks can share the same group ID without worrying about conflicts.
* Include the partition template name and binding state key in the mapped topic names. This causes backfills and collection resets to propagate to consumers as a fresh topic with no committed offsets.

Enabling `namespaced_ids` will cause all of your committed offset state to be effectively reset, so it's disabled by default. After this change is deployed, we'll want to manually go and publish all Dekaf materializations with this field explicitly disabled. Then we can change its default to enabled, and the net effect will be that all _new_ Dekaf materializations will have this turned on by default, leaving existing ones alone.
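The topic-name bullet in this commit message can be illustrated with a small sketch: if the partition template name and the binding state key are inputs to the upstream topic name, then a backfill (new state key) or a collection reset (new partition template) yields a different name, and therefore a topic with no committed offsets. The token is deliberately not an input. `BindingIdentity`, `namespaced_topic_name`, the `|` separator, and the sample values are all hypothetical; a real mapping would also need to encode the result into Kafka's legal topic characters (ASCII alphanumerics, `.`, `_`, `-`), which this sketch does not attempt.

```rust
/// Hypothetical identity of one materialization binding.
#[derive(Debug, Clone, Copy)]
struct BindingIdentity<'a> {
    task_name: &'a str,
    collection: &'a str,
    /// Changes when the collection is reset (deleted and recreated).
    partition_template: &'a str,
    /// Changes when the binding is backfilled.
    state_key: &'a str,
}

/// Hypothetical topic-name builder. Note what is absent: the user's token,
/// so rotating the token no longer changes the name.
fn namespaced_topic_name(b: &BindingIdentity<'_>) -> String {
    format!(
        "{}|{}|{}|{}",
        b.task_name, b.collection, b.partition_template, b.state_key
    )
}
```

Since the name is a pure function of the binding's identity, consumers keep their offsets across reconnects and token changes, but see a fresh topic exactly when the underlying data is reset.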
@jshearer force-pushed the `dekaf/topic_name_translation_falliable` branch from 32bd5cc to e59af8a on May 19, 2025 14:15
Development

Successfully merging this pull request may close these issues.

  • dekaf: Transform group IDs to prevent cross-task conflicts
  • dekaf: Handle reset scenarios (backfill, collection delete/recreate, etc.)