Make sure that starting a JMS connection does not block infinitely #10853

filiphr · 2017-10-31T21:13:11Z

Starting a JMS connection may block infinitely if the factory is not correctly setup.
For example if ActiveMQ with failover transport is used then the default connection
retries infinitely, which leads to ActiveMQ trying to establish a connection and
never succedding in case none of the configured brokers is up.

Fixes gh-10809

This is an initial idea to show how I would solve the blocking call. Maybe there are better ideas. I would also add something like the ElasticsearchHealthIndicatorProperties so users can actually specify the timeout. Using a default could be like the one from the Elasticsearch (or bigger). I was also not completely sure about the status. Should it be down, unknown? I am not sure that we can say with 100% certainty that it is down and the reason.

Starting a JMS connection may block infinitely if the factory is not correctly setup. For example if ActiveMQ with failover transport is used then the default connection retries infinitely, which leads to ActiveMQ trying to establish a connection and never succedding in case none of the configured brokers is up. Fixes spring-projectsgh-10809

filiphr · 2017-10-31T21:14:54Z

One more thing, I did it against master. However, I think the same could be done for earlier versions. Will it be OK if we do it for 1.5.x? I was not sure what our stance on such things is

philwebb · 2017-10-31T21:59:50Z

@filiphr It feels like quite a large change for 1.5.x. We're also limited in that we need to support Java 6 so CompletableFuture isn't an option.

filiphr · 2017-10-31T22:12:49Z

Didn't know you were all the way to 6 there. Maybe I can use a Future, or a Thread. In any case the change would be more or less similar. We need to run the start connection in its own Thread so we can block.

In case you are set only for 2.0 we can discuss how to proceed with it there only. For me important is, whether such an approach is acceptable

snicoll · 2017-11-01T06:13:43Z

@filiphr that's a large change for something that's arguably a configuration mistake. I wonder if we couldn't change our ActiveMQ default not to be infinite or ask the ActiveMQ team if they could reconsider the change.

filiphr · 2017-11-01T06:39:33Z

@snicoll I think there ia nothing you can change on your side. The spring boot defaults are ok. This happens when one actively uses failover transport without setting anything (just addresses). I can submit an issue for them. However, other vendors might be doing something like that as well.

Also keep in mind that if one does such configuration mistake the entire health endpoint is blocked. At first I didn't know why the endpoint was not working. I noticed that it blocks only because I was trying something with spring boot admin and they have a timeout on the request, which means that they report that the service is offline.

I am open to other ideas about improving this as well.

In the end it is completely up to you whether you want to have something like this. The elasticsearch endpoint has something similar, although they support it in their API.

snicoll · 2017-11-01T06:43:21Z

@filiphr the reactive side of things has such support already. I would make it generally purpose rather than hardcoding something for JMS is what I am trying to say.

filiphr · 2017-11-01T06:48:30Z

@snicoll ok then. I didn't understand that sorry.

Are there special reactive endpoints? This would also mean that you can invoke multiple endpoints in the same time. That is quite cool. Is there something that I can look into?

wilkinsona · 2017-11-01T10:27:39Z

Unfortunately, I don't think this approach, or an approach like it, is going to fix the problem. With this change in place, it looks to me like repeated calls to the health endpoint will now risk exhausting the resources of the common fork-join pool. I believe this applies to any solution that moves the call to start the connection onto another thread as the work being performed by that thread will never complete so the thread will never die or will never be returned to its pool.

I can submit an issue for them

@filiphr Please do raise an ActiveMQ issue. I can't think of a scenario where it's desirable for starting a connector to block forever and, as explained above, I don't think there's a robust way for us to work around that behaviour.

filiphr · 2017-11-01T11:25:23Z

@wilkinsona I'll create one.

As for the Thread nit dieing. Won't closing the connection make the connection start finish? Can't we kill the Thread somehow?

Also isn't the problem with the Thread never dieing also valid for the request Thread as well? As repeates calls to the endpoint will block always. How will that Thread be closed?

wilkinsona · 2017-11-01T12:04:36Z

Won't closing the connection make the connection start finish?

I don't know. That would require something on another thread to close the connection, and would also require the JMS provider's implementation of close and start to behave as we'd like.

Can't we kill the Thread somehow?

No, there's no safe way to kill or stop a thread.

Also isn't the problem with the Thread never dieing also valid for the request Thread as well?

Yes, it is. There's definitely a problem here, but the proposed solution is just moving the problem to another thread rather than solving it.

filiphr · 2017-11-04T08:54:37Z

@wilkinsona, @philwebb and @snicoll thanks a lot for the discussion. I completely understand all your points and I agree such a solution is not good.

I went through the JIRA of ActiveMQ and it seems that this blocking occurs only when the broker is not available on startup (see AMQ-5588 and AMQ-2114). This actually is the expected behaviour of ActiveMQ and reading through the issues it seems as it makes sense. Unfortunately this means that maybe there is nothing that could be done on the spring boot side.

I will test the reverse scenario, broker is up on startup, but goes down later on, to see what the health endpoint will do (whether it'll block or not, and whether it would be down or not).

If you have some ideas I am more than happy to try them out and test them out on my setup.

filiphr · 2017-11-06T10:01:47Z

I just checked and blocking occurs even after the first initial connection is OK. This means that no matter what happens the health endpoint will block if the default failover protocol with infinite connection retries is used.

wilkinsona · 2017-11-06T10:13:21Z

Thanks for doing some further research, @filiphr. Perhaps we need to do something different for ActiveMQ: http://activemq.apache.org/how-can-i-monitor-the-connection-with-the-broker.html?

filiphr · 2017-11-06T22:49:35Z

No problem @wilkinsona. I really wanted to solve this as it can be a big pain. Actually, what you linked is pretty good, I used something similar for a STOMP endpoint. The only thing which I am not quite sure is how to do it.

I was thinking on implementing an ActiveMQConnectionFactoryCustomizer where we would be able to set the TransportListener. The transport listener itself can be the health endpoint and we would have a state that we would monitor, based on which we would generate the endpoint. This can mean that the state might be unknown until an initial connection is started form the outside. I presume we would need another JmsHealthIndicatorAutoConfiguration that would be only for ActiveMQConnectionFactory

wilkinsona · 2017-11-11T18:50:21Z

@filiphr I'm not entirely show how we'd do it, but customising the connection factory to register a TransportListener is the sort of thing I had in mind. If you have some time to explore how the listener behaves in Active MQ and how accurately it would be able to report the health of the broker, then that would be much appreciated. If it continues to look like a viable option we can help in figuring out exactly how to auto-configure it.

filiphr · 2017-11-12T18:17:49Z

@wilkinsona I did a small quick test, registering the listener does nothing actually, i.e. it is never invoked. I have created AMQ-6860 to see what is the best way to monitor the failover transport. Will get back here once I hear some news there.

wilkinsona · 2017-11-12T18:36:10Z

@filiphr Thanks. I've started watching the AMQ issue that you opened.

ghost · 2017-11-14T16:21:54Z

Hello all,

I just struggled with the same issue, but Websphere MQ, not ActiveMQ.
It happens with all TCP connections blocked by a firewall, without a connection timeout set in the ConnectionFactory (some libraries don't support it).
I agree with @filiphr that health checks shouldn't freeze for minutes.
So there should be a generic solution, independent of the remote queue implementation.

Is there a reason the JmsHealthIndicator check uses the ConnectionFactory instead of JmsTemplate (like all other health checks)?

@philwebb Are you sure about the Java 6 support? RabbitHealthIndicator uses lambda expression.

snicoll · 2017-11-14T16:38:08Z

@ramato-procon JmsTemplate is not going to help, the timeout offered by the provider is related to message consumption (which is certainly not something you want to do as part of a health check).

As for what Phil said, it's related to backporting the change to 1.5.x, there is no lambda there.

ghost · 2017-11-14T16:51:55Z

@snicoll yes, you are right, it wouldn't help for the timeout issue. But it would make the indicators more uniform.
Ah sorry, I didn't read the question about backporting.

snicoll · 2017-11-14T17:00:29Z

it would make the indicators more uniform.

I am not following. The current indicator uses javax.jms.ConnectionFactory.

ghost · 2017-11-14T17:41:34Z

I wanted to make it work like RabbitHealthIndicator which uses RabbitTemplate .
So my intention was to use the JmsTemplate for the JmsHealthIndicator instead of directly calling the javax.jms.ConnectionFactory .
I thought maybe an exisiting connection or session can be reused.
But after reviewing the JmsTemplate code, it seems there is no way to just receive the meta data.

filiphr · 2017-11-14T21:55:41Z

I think that this is a major lack in the JMS specification.

I received feedback on my ActiveMQ question here.

For the Spring Boot maintainers, what do you think about reusing a connection during the health check? Something similar to what is proposed in the reply to my question.

Instead of trying to connect on each request we would have an atomic field that would store the previous future. Which means that if a new request comes and the future is not completed, we won't create a new Thread and will wail for the same one.

An alternative approach would be to use something special from ActiveMQ where we would explicitly check if a connection is started. However, this means that we would need to implement separate health checks for all the JMS connections that you support.

Additionally, I have created a repo with a minimal example that one can use to see the problem with ActiveMQ.

wilkinsona · 2017-11-15T16:50:41Z

Thanks for all your efforts on this @filiphr. I think we're in agreement that we need a solution other than the one proposed here. Let's continue the discussion on #10809 and see if we can come up with something.

filiphr · 2017-11-15T17:53:00Z

No problem @wilkinsona thank you all for the discussion and having a look into this. I hope we'll manage to find a good solution for this problem. Not that it is Spring Boot fault, but it is really annoying when you lose time because of such an issue.

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Oct 31, 2017

philwebb added the for: team-attention An issue we'd like other members of the team to review label Oct 31, 2017

philwebb added priority: normal type: bug A general bug and removed for: team-attention An issue we'd like other members of the team to review status: waiting-for-triage An issue we've not yet triaged labels Nov 15, 2017

philwebb added the for: merge-with-amendments Needs some changes when we merge label Nov 15, 2017

philwebb mentioned this pull request Nov 15, 2017

JmsHealthIndicator can hang in case failover with infinite max reconnects is used with ActiveMQ #10809

Closed

philwebb added status: declined A suggestion or change that we don't feel we should currently apply and removed for: merge-with-amendments Needs some changes when we merge priority: normal type: bug A general bug labels Nov 15, 2017

wilkinsona closed this Nov 15, 2017

filiphr deleted the jms-health-non-blocking branch December 2, 2018 14:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make sure that starting a JMS connection does not block infinitely #10853

Make sure that starting a JMS connection does not block infinitely #10853

filiphr commented Oct 31, 2017

filiphr commented Oct 31, 2017

philwebb commented Oct 31, 2017

filiphr commented Oct 31, 2017

snicoll commented Nov 1, 2017

filiphr commented Nov 1, 2017 •

edited

Loading

snicoll commented Nov 1, 2017

filiphr commented Nov 1, 2017

wilkinsona commented Nov 1, 2017

filiphr commented Nov 1, 2017 •

edited

Loading

wilkinsona commented Nov 1, 2017

filiphr commented Nov 4, 2017

filiphr commented Nov 6, 2017

wilkinsona commented Nov 6, 2017

filiphr commented Nov 6, 2017

wilkinsona commented Nov 11, 2017

filiphr commented Nov 12, 2017

wilkinsona commented Nov 12, 2017

ghost commented Nov 14, 2017 •

edited by ghost

Loading

snicoll commented Nov 14, 2017 •

edited

Loading

ghost commented Nov 14, 2017

snicoll commented Nov 14, 2017

ghost commented Nov 14, 2017

filiphr commented Nov 14, 2017

wilkinsona commented Nov 15, 2017

filiphr commented Nov 15, 2017

Make sure that starting a JMS connection does not block infinitely #10853

Make sure that starting a JMS connection does not block infinitely #10853

Conversation

filiphr commented Oct 31, 2017

filiphr commented Oct 31, 2017

philwebb commented Oct 31, 2017

filiphr commented Oct 31, 2017

snicoll commented Nov 1, 2017

filiphr commented Nov 1, 2017 • edited Loading

snicoll commented Nov 1, 2017

filiphr commented Nov 1, 2017

wilkinsona commented Nov 1, 2017

filiphr commented Nov 1, 2017 • edited Loading

wilkinsona commented Nov 1, 2017

filiphr commented Nov 4, 2017

filiphr commented Nov 6, 2017

wilkinsona commented Nov 6, 2017

filiphr commented Nov 6, 2017

wilkinsona commented Nov 11, 2017

filiphr commented Nov 12, 2017

wilkinsona commented Nov 12, 2017

ghost commented Nov 14, 2017 • edited by ghost Loading

snicoll commented Nov 14, 2017 • edited Loading

ghost commented Nov 14, 2017

snicoll commented Nov 14, 2017

ghost commented Nov 14, 2017

filiphr commented Nov 14, 2017

wilkinsona commented Nov 15, 2017

filiphr commented Nov 15, 2017

filiphr commented Nov 1, 2017 •

edited

Loading

filiphr commented Nov 1, 2017 •

edited

Loading

ghost commented Nov 14, 2017 •

edited by ghost

Loading

snicoll commented Nov 14, 2017 •

edited

Loading