appchemist
Contributor

This PR is a POC.

For now, I have focused only on replacing KStreamImpl.repartitionRequired with GraphNode.keyChangingOperation.

I'll keep working on it to make sure it's the right fix, and I still need to write some test code.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@github-actions bot added the triage (PRs from the community) and streams labels on Feb 4, 2025
@appchemist
Contributor Author

Hi, @mjsax
If you have a moment, please take a look.

@mjsax
Member

mjsax commented Feb 5, 2025

Thanks for this draft -- might take a few days until I find time to take a look.

@github-actions bot removed the triage (PRs from the community) label on Feb 6, 2025
@appchemist
Contributor Author

appchemist commented Feb 17, 2025

Hi, @mjsax
After reusing InternalStreamsBuilder#getKeyChangingParentNode(), some tests fail.
I'll look into these failing tests.
I haven't had much time lately, so I haven't looked at it much.

After analyzing the cause, I'll request a review again.

However, I would appreciate it if you could take a look at the following comment when you have time.
#18800 (comment)

@github-actions bot added the small (Small PRs) label on Feb 22, 2025
@appchemist
Contributor Author

appchemist commented Feb 22, 2025

:streams:testAll now succeeds.

@mjsax
If you have a moment, please take a look.

If the POC is worth applying, I'll resolve the conflict.

@github-actions bot removed the small (Small PRs) label on Feb 26, 2025
@appchemist
Contributor Author

@mjsax kindly ping

@mjsax
Member

mjsax commented Jun 11, 2025

@appchemist -- Sorry for dropping the ball on this work. -- I would have time now to support you with this PR, if you are still interested.

@appchemist
Contributor Author

appchemist commented Jun 12, 2025

@mjsax Of course, I'm still interested.
Thank you for your time.

@appchemist
Contributor Author

appchemist commented Jun 15, 2025

I've resolved the conflict.
./gradlew clean :streams:testAll now passes.

… GraphNode "keyChangingOperation"

- refactoring
Member

@mjsax mjsax left a comment

Thanks. Finally made a pass. Hope the comments/questions make sense and help.


private enum Repartition {
    NOT_REQUIRED,
    BY_KEY_ONLY,
Member

Not sure what this means?

Contributor Author

BY_KEY means that whether repartitioning is needed is determined by the keyChangingOperation flag of the GraphNode.

@@ -100,14 +100,14 @@ public <K, V> KStream<K, V> stream(final Collection<String> topics,

final String name = new NamedInternal(consumed.name()).orElseGenerateWithPrefix(this, KStreamImpl.SOURCE_NAME);
final StreamSourceNode<K, V> streamSourceNode = new StreamSourceNode<>(name, topics, consumed);
streamSourceNode.requireRepartitionByKey();
Member

Not sure what this means? I.e., the name and semantics are not clear to me. When we add a new KStream from a topic, the assumption is that the KStream is partitioned by key. So this operator does not "require" any repartitioning (it just reads from a topic), and it also does not change the key (so downstream repartitioning is also not required, as it's not a key-changing operation).
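
For illustration, the semantics described in this comment can be modeled with a small self-contained sketch. ToyStream and its methods are hypothetical stand-ins, not Kafka Streams classes. The sketch assumes a boolean flag that is false at a source, set by a key-changing operation, and inherited unchanged by value-only operations; a downstream key-based operation would then need a repartition topic only if the flag is set.

```java
// Toy model only: ToyStream is a hypothetical stand-in, not Kafka's KStreamImpl.
final class FlagPropagationDemo {

    static final class ToyStream {
        private final String name;
        // True if an upstream operation may have changed the key since the
        // data was last known to be partitioned by key.
        private final boolean repartitionRequired;

        private ToyStream(String name, boolean repartitionRequired) {
            this.name = name;
            this.repartitionRequired = repartitionRequired;
        }

        // Reading from a topic: assumed to be partitioned by key already.
        static ToyStream source(String topic) {
            return new ToyStream("source(" + topic + ")", false);
        }

        // Key-changing operation: downstream key-based operations must repartition.
        ToyStream selectKey() {
            return new ToyStream(name + ".selectKey", true);
        }

        // Value-only operation: inherits whatever the parent had.
        ToyStream mapValues() {
            return new ToyStream(name + ".mapValues", repartitionRequired);
        }

        // A key-based operation (for example grouping) consults the flag.
        boolean groupByKeyNeedsRepartition() {
            return repartitionRequired;
        }
    }

    public static void main(String[] args) {
        ToyStream source = ToyStream.source("input");
        // No key change upstream: prints false
        System.out.println(source.mapValues().groupByKeyNeedsRepartition());
        // Key changed upstream: prints true
        System.out.println(source.selectKey().mapValues().groupByKeyNeedsRepartition());
    }
}
```

In this model the source itself never sets the flag, which matches the point that reading from a topic neither requires repartitioning nor changes the key.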

Contributor Author

@appchemist appchemist Jun 29, 2025

Fixed it.
The original intent was streamSourceNode.requireNotRepartition(), but it was changed by mistake during refactoring.

@@ -116,14 +116,14 @@ public <K, V> KStream<K, V> stream(final Pattern topicPattern,
        final ConsumedInternal<K, V> consumed) {
    final String name = new NamedInternal(consumed.name()).orElseGenerateWithPrefix(this, KStreamImpl.SOURCE_NAME);
    final StreamSourceNode<K, V> streamPatternSourceNode = new StreamSourceNode<>(name, topicPattern, consumed);
    streamPatternSourceNode.requireRepartitionByKey();
Member

Same question as above.

Contributor Author

Same answer as above.

@@ -614,6 +614,14 @@ private GraphNode getKeyChangingParentNode(final GraphNode repartitionNode) {
        return null;
    }

    protected boolean isRepartitionRequired(final GraphNode node) {
Member

If we want to know whether a node requires repartitioning, why do we add this method here? It seems to belong in the GraphNode class.

Contributor Author

I agree.
I overlooked it while focusing on reusing InternalStreamsBuilder.findParentNodeMatching.
I also moved InternalStreamsBuilder.findParentNodeMatching to GraphNode, as it seemed more appropriate there.
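
As a rough illustration of what a parent search living on the node itself could look like, here is a self-contained sketch. ToyNode, its fields, and the exact signature are assumptions made for this example; the method name is borrowed from the thread, but this is not the code from the PR or from Kafka Streams. The sketch checks the node itself first and then searches its parents depth-first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Toy model only: ToyNode is a hypothetical stand-in for a topology graph node.
final class ParentSearchDemo {

    static final class ToyNode {
        final String name;
        final boolean keyChanging;
        final List<ToyNode> parents;

        ToyNode(String name, boolean keyChanging, ToyNode... parents) {
            this.name = name;
            this.keyChanging = keyChanging;
            this.parents = Arrays.asList(parents);
        }

        // Returns this node if it matches, otherwise the first match found by
        // a depth-first walk over the parents, or empty if nothing matches.
        Optional<ToyNode> findParentNodeMatching(Predicate<ToyNode> predicate) {
            if (predicate.test(this)) {
                return Optional.of(this);
            }
            for (ToyNode parent : parents) {
                Optional<ToyNode> found = parent.findParentNodeMatching(predicate);
                if (found.isPresent()) {
                    return found;
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ToyNode left = new ToyNode("source-left", false);
        ToyNode selectKey = new ToyNode("selectKey", true, left);
        ToyNode filter = new ToyNode("filter", false, selectKey);
        ToyNode right = new ToyNode("source-right", false);
        ToyNode merge = new ToyNode("merge", false, filter, right);

        // Finds the key-changing ancestor through the merge: prints selectKey
        System.out.println(merge.findParentNodeMatching(n -> n.keyChanging)
                .map(n -> n.name).orElse("none"));
        // No key-changing node upstream of the right source: prints none
        System.out.println(right.findParentNodeMatching(n -> n.keyChanging)
                .map(n -> n.name).orElse("none"));
    }
}
```

Keeping the traversal on the node means callers only pass a predicate and do not need access to the builder.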

            subTopologySourceNodes,
            name,
            graphNode
        );
    }

    public boolean repartitionRequired() {
        return builder.isRepartitionRequired(graphNode);
Member

@mjsax mjsax Jun 24, 2025

Cf. my comment above. This could be graphNode.repartitionRequired() if we move the method. Or am I missing something?

Contributor Author

I agree.
Fixed it.

@@ -93,6 +98,14 @@ public String nodeName() {
        return nodeName;
    }

    public boolean canResolveRepartition() {
        return keyChangingOperation || repartition != Repartition.NOT_REQUIRED;
Member

The name of this method and its logic are not clear to me. What does "resolve" actually mean?

Contributor Author

@appchemist appchemist Jun 29, 2025

It indicates whether the node is able to determine if repartitioning is needed.
I renamed it to canDetermineRepartition for better clarity.

Also, the original intent was keyChangingOperation || repartition != Repartition.BY_KEY, but it was incorrectly modified during the refactoring process.
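
One possible reading of the intent stated here, as a self-contained sketch. The enum values and the two predicate bodies follow the expressions quoted in this thread; ToyNode, the single-parent structure, and the upward walk in resolveRepartitioning() are assumptions added for illustration and are not taken from the PR.

```java
// Toy model only: an interpretation of the thread, not the PR's GraphNode.
final class RepartitionStateDemo {

    enum Repartition { NOT_REQUIRED, BY_KEY, ALWAYS_REQUIRED }

    static final class ToyNode {
        final String name;
        final ToyNode parent; // null for a source; single parent for simplicity
        final boolean keyChangingOperation;
        final Repartition repartition;

        ToyNode(String name, ToyNode parent, boolean keyChangingOperation, Repartition repartition) {
            this.name = name;
            this.parent = parent;
            this.keyChangingOperation = keyChangingOperation;
            this.repartition = repartition;
        }

        // Intended condition as stated in the thread: the node can decide on
        // its own unless it is a non-key-changing BY_KEY node.
        boolean canDetermineRepartition() {
            return keyChangingOperation || repartition != Repartition.BY_KEY;
        }

        boolean repartitioningRequired() {
            return keyChangingOperation || repartition == Repartition.ALWAYS_REQUIRED;
        }

        // Hypothetical helper: walk upward to the nearest node that can decide.
        boolean resolveRepartitioning() {
            for (ToyNode node = this; node != null; node = node.parent) {
                if (node.canDetermineRepartition()) {
                    return node.repartitioningRequired();
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        ToyNode source = new ToyNode("source", null, false, Repartition.NOT_REQUIRED);
        ToyNode selectKey = new ToyNode("selectKey", source, true, Repartition.BY_KEY);
        ToyNode filter = new ToyNode("filter", selectKey, false, Repartition.BY_KEY);
        ToyNode plainFilter = new ToyNode("plainFilter", source, false, Repartition.BY_KEY);

        System.out.println(filter.resolveRepartitioning());      // prints true
        System.out.println(plainFilter.resolveRepartitioning()); // prints false
    }
}
```

Under this reading, a BY_KEY node that does not change the key defers to its ancestors, while a source marked NOT_REQUIRED ends the walk with "no repartitioning needed".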

        return keyChangingOperation || repartition != Repartition.NOT_REQUIRED;
    }

    public boolean isRepartitionRequired() {
Member

nit: rename to repartitioningRequired() or requiresRepartitioning()

Contributor Author

renamed the method to repartitioningRequired()

    }

    public boolean isRepartitionRequired() {
        return keyChangingOperation || repartition == Repartition.ALWAYS_REQUIRED;
Member

A key-changing operation by itself does not require repartitioning, so I'm not clear about the "or" condition. Can you elaborate?

Contributor Author

You can find the explanation for why Repartition.REQUIRED is necessary in the linked reference.

@@ -105,6 +118,14 @@ public boolean isMergeNode() {
        return mergeNode;
    }

    public void requireRepartitionByKey() {
        this.repartition = Repartition.BY_KEY_ONLY;
Member

nit: avoid unnecessary this.

Contributor Author

removed this

    }

    public void requireRepartitionAlways() {
        this.repartition = Repartition.ALWAYS_REQUIRED;
Member

nit: avoid unnecessary this.

Contributor Author

removed this

… GraphNode "keyChangingOperation"

- Corrected values in Repartition enum
- Refactoring
2 participants