-
Notifications
You must be signed in to change notification settings - Fork 14.6k
KAFKA-19668: processValue() must be declared as value-changing operation #20470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e0436ea
to
1742ead
Compare
stateStoreNames); | ||
stateStoreNames | ||
); | ||
if (builder.processProcessValueFixEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added this here -- it's atm not strictly necessary, because process()
is key-changing, what we track via repartitionRequired
flag further below, setting it to true
via KStreamImpl
constructor.
but it just feel correct to add anyway, as we have plans to actually unify repartitionRequired
, and keyChanging
flags. So we need to make sure we don't miss this one...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cf #18800
\cc @appchemist for visibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@@ -386,20 +392,20 @@ private int getCountOfRepartitionTopicsFound(final String topologyString) { | |||
" Source: KSTREAM-SOURCE-0000000000 (topics: [retryTopic])\n" + | |||
" --> KSTREAM-PROCESSOR-0000000001\n" + | |||
" Processor: KSTREAM-PROCESSOR-0000000001 (stores: [])\n" + | |||
" --> KSTREAM-FILTER-0000000005\n" + | |||
" --> KSTREAM-AGGREGATE-STATE-STORE-0000000002-repartition-filter\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure, what triggers this re-naming, but I believe it's only in-memory processor names, so I believe it's ok? -- The structure of the topology does not change, and all stateful things (stores, topics) keep their names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it should be, especially since this is a filter
we added some time ago a filter before repartitioning to filter records with a null
key
With "merge.repartition.topic" optimization enabled, Kafka Streams tries to push repartition topics upstream, to be able to merge multiple repartition topics from different downstream branches together. However, it is not safe to push a repartition topic if the parent node is value-changing: because of potentially changing data types, the topology might become invalid, and fail with serde error at runtime. The optimization itself work correctly, however, processValues() is not correctly declared as key-changing, what can lead to invalid topologies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix @mjsax - LGTM with one minor comment.
public static boolean getBoolean(final Map<String, Object> configs, final String key, final boolean defaultValue) { | ||
final Object value = configs.getOrDefault(key, defaultValue); | ||
if (value instanceof Boolean) { | ||
return (boolean) value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the cast required, wouldn't the JVM use autoboxing to covert between Boolean
and boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because the type of value
is Object
. We could also cast to Boolean
. So some auto-conversion/unboxing happens anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually c&p this from StreamsConfig.InternalConfigs
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
stateStoreNames | ||
); | ||
if (builder.processProcessValueFixEnabled()) { | ||
processNode.setValueChangingOperation(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Basically a one-liner fix.
Thanks both -- I am still wondering what changes we need to test this better? Eg, system tests? -- We also need more changes for the release noted / upgrade guide, and we might even want to go back to older docs, and also add information about this, as it's a problem since AK 3.3.0 release? Thoughts? |
Docs update for 4.0.1 release: #20484 |
…ion (#20470) With "merge.repartition.topic" optimization enabled, Kafka Streams tries to push repartition topics upstream, to be able to merge multiple repartition topics from different downstream branches together. However, it is not safe to push a repartition topic if the parent node is value-changing: because of potentially changing data types, the topology might become invalid, and fail with serde error at runtime. The optimization itself work correctly, however, processValues() is not correctly declared as value-changing, what can lead to invalid topologies. Reviewers: Bill Bejeck <[email protected]>, Lucas Brutschy <[email protected]>
…ion (#20470) With "merge.repartition.topic" optimization enabled, Kafka Streams tries to push repartition topics upstream, to be able to merge multiple repartition topics from different downstream branches together. However, it is not safe to push a repartition topic if the parent node is value-changing: because of potentially changing data types, the topology might become invalid, and fail with serde error at runtime. The optimization itself work correctly, however, processValues() is not correctly declared as value-changing, what can lead to invalid topologies. Reviewers: Bill Bejeck <[email protected]>, Lucas Brutschy <[email protected]>
Merged to |
With "merge.repartition.topic" optimization enabled, Kafka Streams tries
to push repartition topics upstream, to be able to merge multiple
repartition topics from different downstream branches together.
However, it is not safe to push a repartition topic if the parent node
is value-changing: because of potentially changing data types, the
topology might become invalid, and fail with serde error at runtime.
The optimization itself work correctly, however, processValues() is not
correctly declared as value-changing, what can lead to invalid
topologies.
Reviewers: Bill Bejeck [email protected], Lucas Brutschy
[email protected]