Skip to content

Make sure that starting a JMS connection does not block infinitely #10853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

filiphr
Copy link
Contributor

@filiphr filiphr commented Oct 31, 2017

Starting a JMS connection may block infinitely if the factory is not correctly setup.
For example if ActiveMQ with failover transport is used then the default connection
retries infinitely, which leads to ActiveMQ trying to establish a connection and
never succedding in case none of the configured brokers is up.

Fixes gh-10809

This is an initial idea to show how I would solve the blocking call. Maybe there are better ideas. I would also add something like the ElasticsearchHealthIndicatorProperties so users can actually specify the timeout. Using a default could be like the one from the Elasticsearch (or bigger). I was also not completely sure about the status. Should it be down, unknown? I am not sure that we can say with 100% certainty that it is down and the reason.

Starting a JMS connection may block infinitely if the factory is not correctly setup.
For example if ActiveMQ with failover transport is used then the default connection
retries infinitely, which leads to ActiveMQ trying to establish a connection and
never succedding in case none of the configured brokers is up.

Fixes spring-projectsgh-10809
@filiphr
Copy link
Contributor Author

filiphr commented Oct 31, 2017

One more thing, I did it against master. However, I think the same could be done for earlier versions. Will it be OK if we do it for 1.5.x? I was not sure what our stance on such things is

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Oct 31, 2017
@philwebb
Copy link
Member

@filiphr It feels like quite a large change for 1.5.x. We're also limited in that we need to support Java 6 so CompletableFuture isn't an option.

@filiphr
Copy link
Contributor Author

filiphr commented Oct 31, 2017

Didn't know you were all the way to 6 there. Maybe I can use a Future, or a Thread. In any case the change would be more or less similar. We need to run the start connection in its own Thread so we can block.

In case you are set only for 2.0 we can discuss how to proceed with it there only. For me important is, whether such an approach is acceptable

@philwebb philwebb added the for: team-attention An issue we'd like other members of the team to review label Oct 31, 2017
@snicoll
Copy link
Member

snicoll commented Nov 1, 2017

@filiphr that's a large change for something that's arguably a configuration mistake. I wonder if we couldn't change our ActiveMQ default not to be infinite or ask the ActiveMQ team if they could reconsider the change.

@filiphr
Copy link
Contributor Author

filiphr commented Nov 1, 2017

@snicoll I think there ia nothing you can change on your side. The spring boot defaults are ok. This happens when one actively uses failover transport without setting anything (just addresses). I can submit an issue for them. However, other vendors might be doing something like that as well.

Also keep in mind that if one does such configuration mistake the entire health endpoint is blocked. At first I didn't know why the endpoint was not working. I noticed that it blocks only because I was trying something with spring boot admin and they have a timeout on the request, which means that they report that the service is offline.

I am open to other ideas about improving this as well.

In the end it is completely up to you whether you want to have something like this. The elasticsearch endpoint has something similar, although they support it in their API.

@snicoll
Copy link
Member

snicoll commented Nov 1, 2017

@filiphr the reactive side of things has such support already. I would make it generally purpose rather than hardcoding something for JMS is what I am trying to say.

@filiphr
Copy link
Contributor Author

filiphr commented Nov 1, 2017

@snicoll ok then. I didn't understand that sorry.

Are there special reactive endpoints? This would also mean that you can invoke multiple endpoints in the same time. That is quite cool. Is there something that I can look into?

@wilkinsona
Copy link
Member

Unfortunately, I don't think this approach, or an approach like it, is going to fix the problem. With this change in place, it looks to me like repeated calls to the health endpoint will now risk exhausting the resources of the common fork-join pool. I believe this applies to any solution that moves the call to start the connection onto another thread as the work being performed by that thread will never complete so the thread will never die or will never be returned to its pool.

I can submit an issue for them

@filiphr Please do raise an ActiveMQ issue. I can't think of a scenario where it's desirable for starting a connector to block forever and, as explained above, I don't think there's a robust way for us to work around that behaviour.

@filiphr
Copy link
Contributor Author

filiphr commented Nov 1, 2017

@wilkinsona I'll create one.

As for the Thread nit dieing. Won't closing the connection make the connection start finish? Can't we kill the Thread somehow?

Also isn't the problem with the Thread never dieing also valid for the request Thread as well? As repeates calls to the endpoint will block always. How will that Thread be closed?

@wilkinsona
Copy link
Member

Won't closing the connection make the connection start finish?

I don't know. That would require something on another thread to close the connection, and would also require the JMS provider's implementation of close and start to behave as we'd like.

Can't we kill the Thread somehow?

No, there's no safe way to kill or stop a thread.

Also isn't the problem with the Thread never dieing also valid for the request Thread as well?

Yes, it is. There's definitely a problem here, but the proposed solution is just moving the problem to another thread rather than solving it.

@filiphr
Copy link
Contributor Author

filiphr commented Nov 4, 2017

@wilkinsona, @philwebb and @snicoll thanks a lot for the discussion. I completely understand all your points and I agree such a solution is not good.

I went through the JIRA of ActiveMQ and it seems that this blocking occurs only when the broker is not available on startup (see AMQ-5588 and AMQ-2114). This actually is the expected behaviour of ActiveMQ and reading through the issues it seems as it makes sense. Unfortunately this means that maybe there is nothing that could be done on the spring boot side.

I will test the reverse scenario, broker is up on startup, but goes down later on, to see what the health endpoint will do (whether it'll block or not, and whether it would be down or not).

If you have some ideas I am more than happy to try them out and test them out on my setup.

@filiphr
Copy link
Contributor Author

filiphr commented Nov 6, 2017

I just checked and blocking occurs even after the first initial connection is OK. This means that no matter what happens the health endpoint will block if the default failover protocol with infinite connection retries is used.

@wilkinsona
Copy link
Member

Thanks for doing some further research, @filiphr. Perhaps we need to do something different for ActiveMQ: http://activemq.apache.org/how-can-i-monitor-the-connection-with-the-broker.html?

@filiphr
Copy link
Contributor Author

filiphr commented Nov 6, 2017

No problem @wilkinsona. I really wanted to solve this as it can be a big pain. Actually, what you linked is pretty good, I used something similar for a STOMP endpoint. The only thing which I am not quite sure is how to do it.

I was thinking on implementing an ActiveMQConnectionFactoryCustomizer where we would be able to set the TransportListener. The transport listener itself can be the health endpoint and we would have a state that we would monitor, based on which we would generate the endpoint. This can mean that the state might be unknown until an initial connection is started form the outside. I presume we would need another JmsHealthIndicatorAutoConfiguration that would be only for ActiveMQConnectionFactory

@wilkinsona
Copy link
Member

@filiphr I'm not entirely show how we'd do it, but customising the connection factory to register a TransportListener is the sort of thing I had in mind. If you have some time to explore how the listener behaves in Active MQ and how accurately it would be able to report the health of the broker, then that would be much appreciated. If it continues to look like a viable option we can help in figuring out exactly how to auto-configure it.

@filiphr
Copy link
Contributor Author

filiphr commented Nov 12, 2017

@wilkinsona I did a small quick test, registering the listener does nothing actually, i.e. it is never invoked. I have created AMQ-6860 to see what is the best way to monitor the failover transport. Will get back here once I hear some news there.

@wilkinsona
Copy link
Member

@filiphr Thanks. I've started watching the AMQ issue that you opened.

@ghost
Copy link

ghost commented Nov 14, 2017

Hello all,

I just struggled with the same issue, but Websphere MQ, not ActiveMQ.
It happens with all TCP connections blocked by a firewall, without a connection timeout set in the ConnectionFactory (some libraries don't support it).
I agree with @filiphr that health checks shouldn't freeze for minutes.
So there should be a generic solution, independent of the remote queue implementation.

Is there a reason the JmsHealthIndicator check uses the ConnectionFactory instead of JmsTemplate (like all other health checks)?

@philwebb Are you sure about the Java 6 support? RabbitHealthIndicator uses lambda expression.

@snicoll
Copy link
Member

snicoll commented Nov 14, 2017

@ramato-procon JmsTemplate is not going to help, the timeout offered by the provider is related to message consumption (which is certainly not something you want to do as part of a health check).

As for what Phil said, it's related to backporting the change to 1.5.x, there is no lambda there.

@ghost
Copy link

ghost commented Nov 14, 2017

@snicoll yes, you are right, it wouldn't help for the timeout issue. But it would make the indicators more uniform.
Ah sorry, I didn't read the question about backporting.

@snicoll
Copy link
Member

snicoll commented Nov 14, 2017

it would make the indicators more uniform.

I am not following. The current indicator uses javax.jms.ConnectionFactory.

@ghost
Copy link

ghost commented Nov 14, 2017

I wanted to make it work like RabbitHealthIndicator which uses RabbitTemplate .
So my intention was to use the JmsTemplate for the JmsHealthIndicator instead of directly calling the javax.jms.ConnectionFactory .
I thought maybe an exisiting connection or session can be reused.
But after reviewing the JmsTemplate code, it seems there is no way to just receive the meta data.

@filiphr
Copy link
Contributor Author

filiphr commented Nov 14, 2017

I think that this is a major lack in the JMS specification.

I received feedback on my ActiveMQ question here.

For the Spring Boot maintainers, what do you think about reusing a connection during the health check? Something similar to what is proposed in the reply to my question.

Instead of trying to connect on each request we would have an atomic field that would store the previous future. Which means that if a new request comes and the future is not completed, we won't create a new Thread and will wail for the same one.

An alternative approach would be to use something special from ActiveMQ where we would explicitly check if a connection is started. However, this means that we would need to implement separate health checks for all the JMS connections that you support.

Additionally, I have created a repo with a minimal example that one can use to see the problem with ActiveMQ.

@philwebb philwebb added priority: normal type: bug A general bug and removed for: team-attention An issue we'd like other members of the team to review status: waiting-for-triage An issue we've not yet triaged labels Nov 15, 2017
@philwebb philwebb added the for: merge-with-amendments Needs some changes when we merge label Nov 15, 2017
@philwebb philwebb added status: declined A suggestion or change that we don't feel we should currently apply and removed for: merge-with-amendments Needs some changes when we merge priority: normal type: bug A general bug labels Nov 15, 2017
@wilkinsona
Copy link
Member

Thanks for all your efforts on this @filiphr. I think we're in agreement that we need a solution other than the one proposed here. Let's continue the discussion on #10809 and see if we can come up with something.

@wilkinsona wilkinsona closed this Nov 15, 2017
@filiphr
Copy link
Contributor Author

filiphr commented Nov 15, 2017

No problem @wilkinsona thank you all for the discussion and having a look into this. I hope we'll manage to find a good solution for this problem. Not that it is Spring Boot fault, but it is really annoying when you lose time because of such an issue.

@filiphr filiphr deleted the jms-health-non-blocking branch December 2, 2018 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: declined A suggestion or change that we don't feel we should currently apply
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JmsHealthIndicator can hang in case failover with infinite max reconnects is used with ActiveMQ
5 participants