-
Notifications
You must be signed in to change notification settings - Fork 41.2k
Add Kafka health indicator #11515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Kafka health indicator #11515
Conversation
- added kafka health indicator - added kafka health inidicator auto-configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, it looks quite complete! I am a bit nervous adding the AdminClient
as part of this change. Is that a general purpose bean? If so, I guess we probably need a separate PR for that.
/** | ||
* Test for {@link KafkaHealthIndicator} | ||
*/ | ||
public class KafkaHealthIndicatorTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should end with Tests
.
@@ -138,4 +140,10 @@ public KafkaAdmin kafkaAdmin() { | |||
return kafkaAdmin; | |||
} | |||
|
|||
@Bean | |||
@ConditionalOnMissingBean(AdminClient.class) | |||
public AdminClient adminClient() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a separate feature that shouldn't be part of this change. Can you please create a separate PR for this?
} | ||
|
||
@Configuration | ||
@AutoConfigureBefore(KafkaHealthIndicatorAutoConfiguration.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't use @AutoConfigurationBefore
on something that's not an auto-configuration class. User configuration is processed before auto-configuration anyway so that's not needed at all.
import org.springframework.boot.context.properties.ConfigurationProperties; | ||
|
||
/** | ||
* External configuration properties for {@link KafkaHealthIndicator}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the word "external" is supposed to mean here. You can inspire yourself from other *Properties
type in the same module.
- code review changes
@Jcamilorada thanks for asking, we have a general issue with our CI pipeline and nothing you can do I am afraid. I'll review and build locally. |
BTW, one observation - if the producer is configured to support transactions ( I think this would incorrectly report UP. |
@garyrussell is there a way to avoid that? Or maybe we should implement the "ping" feature differently then? |
I just ran this test... @SpringBootApplication
public class Boot11515Application {
public static void main(String[] args) {
SpringApplication.run(Boot11515Application.class, args);
}
@Bean
public ApplicationRunner runner(KafkaAdmin admin, KafkaProperties kafkaProps) {
return args -> {
AdminClient client = AdminClient.create(admin.getConfig());
try {
DescribeClusterOptions options = new DescribeClusterOptions().timeoutMs(60_000);
DescribeClusterResult cluster = client.describeCluster(options);
System.out.println(cluster.clusterId().get());
if (StringUtils.hasText(kafkaProps.getProducer().getTransactionIdPrefix())) {
KafkaFuture<Collection<Node>> nodesFuture = cluster.nodes();
Collection<Node> nodes = nodesFuture.get();
System.out.println(nodes.size());
if (nodes.size() < 3) {
System.out.println("Insufficient active nodes for transactions");
}
}
}
finally {
client.close();
}
};
}
} with
And it works (detects < 3 active brokers). However, the That said, it could be a transient issue... When I run the spring-kafka transaction test cases, with only one broker, I get this error...
|
- refactor code to create and destroy admin client when health monitoring
@snicoll @garyrussell just update pr according to your suggestions. About health monitoring when transaction support I did some testing and replication factor can be obtained using:
So if you think is appropriated either I can add support for transaction as part of this issue or in another one. |
Cool. I'll leave it to @snicoll as to whether you should add it to this or create another. |
Let's do this here. If there aren't enough brokers, it would be nice to state so in the detail then. |
@Jcamilorada Am I right to expect an update of this PR for the thing we've discussed above? Thanks! |
@snicoll really sorry for late replay, just comeback for holidays, working in changes from today, so I will update pr asap. thanks! |
- added support for replication factor
- fixed coding style
* | ||
* @author Juan Rada | ||
*/ | ||
public class KafkaHealthIndicatorTests { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one can argue that this is an IntegrationTest .
@snicoll @garyrussell pr have been updated to validate replication factor. thanks! |
* pr/11515: Polish "Add Kafka health indicator" Add Kafka health indicator
@Jcamilorada this now merged with a polish commit, thank you. |
Just as a small heads up, this change lead to our app not booting anymore with RC2 (we are using |
ping @garyrussell |
FYI, I just ran a test (with a 1.0.0 broker) and it worked as expected... @SpringBootApplication
public class Bgh11515Application {
public static void main(String[] args) {
SpringApplication.run(Bgh11515Application.class, args);
}
@KafkaListener(id = "bgh11515", topics = "bgh11515")
public void listen(String in) {
System.out.println(in);
}
} However, running with an OLD broker (0.10.2.0), I get DOWN and
I am not sure what the "right" thing to do is there; since the application is functional. Maybe the healthcheck should detect the |
@garyrussell depends if we are supposed to support that broker or not. But we can catch that exception and switch the status to |
@snicoll We use the 1.0.0 client but it can talk to older brokers, so we might not want to restrict their use. I was thinking a try/catch in BTW, when I killed kafka, I got... |
We are using Confluent 4.0.0 (which is Kafka 1.0.0) and are getting the following log output in kind of a loop (no errors, also please note, that we are running this with containers in a docker-compose setup):
We are basically using the following Kafka dependencies:
|
@kiview we'd really need a sample for this one. I am not sure I understand why several admins are created and how it relates to the health check. Perhaps you have an agressive invocation of the health endpoint somewhere? |
Looks like a health check is being performed every 5 seconds; who/what is doing that? I see the same in my logs each time I hit the endpoint (kafka is a bit verbose at INFO level). What do you see if if you hit the health actuator in a browser? |
@snicoll a new |
@kiview Can you explain exactly what you mean by "hanging"? A couple of the polls were done on the same thread so it doesn't look like a "hang" in the indicator. It looks like the indicator is working ok (although we don't know if it's reporting UP or DOWN) - is there something in your app that's polling the indicator and preventing the app from starting if it reports DOWN? |
Okay, now some things make more sense, thanks for your explanations @garyrussell and @snicoll.
Since the endpoint is reporting DOWN, Docker is constantly polling the endpoint. How can I enable additional output for the DOWN reason, like you did in our above example? I'll try to come up with a small example next week, if I can reproduce it easily. |
See my comment above - if you log the output of the I just submitted a PR to use a single |
Here are the results from a test I ran; the broker is not considered "UP" until there are enough nodes to support the configured replication factor...
|
@garyrussell Thanks for getting back, my curl of the health endpoint is missing the Okay, your explanation makes sense, which topic do you use for determining the needed replication factor? I assume here lies the problem with our app (don't have the project in front of me right now, I'll look into it after the weekend). |
Here's another occurrence : https://stackoverflow.com/questions/48965775/spring-boot-2-0-0-rc2-kafkahealthindicator-actuator-status-down/48966640#48966640 - the workaround in my answer fixed it for me. It's the We added it because, if you are using transactions, connections hang until there are enough brokers. @snicoll I wonder if we should either disable the replication check, or somehow only do it if the producer config enables transations? |
Hi All, unfortunately, as you may see from SO topic, I was unable to fix the issue with Maybe I'm doing something wrong or maybe the issue is somewhere else. |
@garyrussell We aren't using transactions so yes, in our case this doesn't really make sense. I think it would be better to have more defensive defaults for the autoconfigured endpoint. |
I just added another commit to the PR to only check if transaction are enabled; let's see what @snicoll thinks. |
I agree we should be more defensive and was about to ask @garyrussell about that actually. |
Yeah. FYI. the reason we need something if you are using transactions is because the kafka client (currently) simply hangs if there are not enough brokers available KAFKA-6446; no timeouts, nothing. |
Thanks all for the feedback. Looking at the issues and how to fix them properly, it became quite apparent that the feature isn't mature enough for us to consider adding an health indicator out of the box for Kafka. We've decided to revert what we have for 2.0 (see #12225) |
solves 11435