Make sure that starting a JMS connection does not block infinitely #10853
Conversation
Starting a JMS connection may block infinitely if the factory is not correctly set up. For example, if ActiveMQ with failover transport is used, the default connection retries infinitely, which leads to ActiveMQ trying to establish a connection and never succeeding when none of the configured brokers is up. Fixes gh-10809
One more thing: I did it against master. However, I think the same could be done for earlier versions. Will it be OK if we do it for 1.5.x? I was not sure what our stance on such things is.
@filiphr It feels like quite a large change for 1.5.x. We're also limited in that we need to support Java 6, so …
I didn't know you supported all the way back to Java 6 there. Maybe I can use a Future, or a Thread. In any case the change would be more or less similar: we need to run the connection start in its own thread so we can block on it with a limit. If you are set on doing this only for 2.0, we can discuss how to proceed there. What is important for me is whether such an approach is acceptable.
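A minimal sketch of that idea, assuming the blocking `start()` call is wrapped in a `Runnable`; the class and method names here are illustrative, not existing Spring Boot API:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Run a potentially blocking call (e.g. Connection#start) on its own thread
// and bound the wait with Future.get(timeout).
public class TimedStart {

    public static boolean startWithTimeout(Runnable start, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = executor.submit(start);
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;  // start() completed within the timeout
        } catch (TimeoutException ex) {
            return false; // start() is still blocked; report failure to the caller
        } catch (ExecutionException ex) {
            return false; // start() threw an exception
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Best effort only: shutdownNow() interrupts the worker, but a
            // start() that ignores interruption will leak the worker thread.
            executor.shutdownNow();
        }
    }
}
```

Note the caveat in the `finally` block: if the underlying `start()` never reacts to interruption, the worker thread is leaked, which is exactly the objection raised later in this thread.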
@filiphr that's a large change for something that's arguably a configuration mistake. I wonder if we couldn't change our ActiveMQ default not to be infinite, or ask the ActiveMQ team if they could reconsider the change.
@snicoll I think there is nothing you can change on your side. The Spring Boot defaults are OK. This happens when one actively uses the failover transport without setting anything (just addresses). I can submit an issue for them. However, other vendors might be doing something like that as well. Also keep in mind that if one makes such a configuration mistake the entire health endpoint is blocked. At first I didn't know why the endpoint was not working. I noticed that it blocks only because I was trying something with Spring Boot Admin, which has a timeout on the request, meaning it reports that the service is offline. I am open to other ideas about improving this as well. In the end it is completely up to you whether you want to have something like this. The Elasticsearch endpoint has something similar, although they support it in their API.
@filiphr the reactive side of things has such support already. What I am trying to say is that I would make it general purpose rather than hardcoding something for JMS.
@snicoll ok then. I didn't understand that, sorry. Are there special reactive endpoints? This would also mean that you can invoke multiple endpoints at the same time. That is quite cool. Is there something I can look into?
Unfortunately, I don't think this approach, or an approach like it, is going to fix the problem. With this change in place, it looks to me like repeated calls to the health endpoint will now risk exhausting the resources of the common fork-join pool. I believe this applies to any solution that moves the call to start the connection onto another thread as the work being performed by that thread will never complete so the thread will never die or will never be returned to its pool.
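The concern can be demonstrated without real JMS: a pool with bounded parallelism is starved by tasks that never complete. The sketch below uses a private single-worker `ForkJoinPool` so it does not wedge the shared common pool, but `CompletableFuture.supplyAsync` uses `ForkJoinPool.commonPool()` by default, which is bounded in the same way.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Shows how a single never-completing task starves a bounded pool:
// a second, trivial task cannot run because the only worker is stuck.
public class PoolStarvation {

    public static boolean quickTaskCompletes(long timeoutMillis) {
        ForkJoinPool pool = new ForkJoinPool(1); // one worker, like a tiny common pool
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch never = new CountDownLatch(1);
        pool.submit(() -> {
            started.countDown();
            try {
                never.await(); // simulates a connection start that blocks forever
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            started.await(); // make sure the worker is already stuck
            ForkJoinTask<?> quick = pool.submit(() -> { });
            quick.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;  // the quick task got a worker
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            return false; // starved: no worker ever became free
        } finally {
            never.countDown(); // unblock the stuck worker
            pool.shutdownNow();
        }
    }
}
```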
@filiphr Please do raise an ActiveMQ issue. I can't think of a scenario where it's desirable for starting a connection to block forever and, as explained above, I don't think there's a robust way for us to work around that behaviour.
@wilkinsona I'll create one. As for the thread not dying: won't closing the connection make the connection start finish? Can't we kill the thread somehow? Also, isn't the problem of the thread never dying valid for the request thread as well, since repeated calls to the endpoint will always block? How will that thread be closed?
I don't know. That would require something on another thread to close the connection, and would also require the JMS provider's implementation of close and start to behave as we'd like.
No, there's no safe way to kill or stop a thread.
Yes, it is. There's definitely a problem here, but the proposed solution is just moving the problem to another thread rather than solving it.
@wilkinsona, @philwebb and @snicoll thanks a lot for the discussion. I completely understand all your points and I agree such a solution is not good. I went through the ActiveMQ JIRA and it seems that this blocking occurs only when the broker is not available on startup (see AMQ-5588 and AMQ-2114). This is actually the expected behaviour of ActiveMQ and, reading through the issues, it seems to make sense. Unfortunately this means that maybe there is nothing that can be done on the Spring Boot side. I will test the reverse scenario, where the broker is up on startup but goes down later on, to see what the health endpoint does (whether it blocks or not, and whether it reports down or not). If you have some ideas I am more than happy to try them out on my setup.
I just checked, and blocking occurs even after the first initial connection is OK. This means that no matter what happens, the health endpoint will block if the default failover protocol with infinite connection retries is used.
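For anyone hitting this, the failover transport's retry behaviour can also be bounded at configuration time. The snippet below uses standard ActiveMQ failover URL options (the broker addresses are placeholders); bounding the reconnect attempts keeps `Connection#start` from blocking forever, at the cost of the transport no longer retrying indefinitely:

```properties
# Bound the failover transport instead of letting it retry forever (the default).
# maxReconnectAttempts / startupMaxReconnectAttempts limit the retries;
# timeout bounds how long an individual request may block waiting for a transport.
spring.activemq.broker-url=failover:(tcp://broker1:61616,tcp://broker2:61616)?maxReconnectAttempts=3&startupMaxReconnectAttempts=3&timeout=5000
```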
Thanks for doing some further research, @filiphr. Perhaps we need to do something different for ActiveMQ: http://activemq.apache.org/how-can-i-monitor-the-connection-with-the-broker.html?
No problem @wilkinsona. I really wanted to solve this as it can be a big pain. Actually, what you linked is pretty good; I used something similar for a STOMP endpoint. The only thing I am not quite sure about is how to do it. I was thinking of implementing an …
@filiphr I'm not entirely sure how we'd do it, but customising the connection factory to register a …
@wilkinsona I did a small quick test: registering the listener actually does nothing, i.e. it is never invoked. I have created AMQ-6860 to find out the best way to monitor the failover transport. I will get back here once I hear some news there.
@filiphr Thanks. I've started watching the AMQ issue that you opened.
Hello all, I just struggled with the same issue, but with WebSphere MQ, not ActiveMQ. Is there a reason the … @philwebb Are you sure about the Java 6 support?
@ramato-procon As for what Phil said, it's related to backporting the change to …
@snicoll yes, you are right, it wouldn't help with the timeout issue. But it would make the indicators more uniform.
I am not following. The current indicator uses …
I wanted to make it work like …
I think that this is a major gap in the JMS specification. I received feedback on my ActiveMQ question here.

For the Spring Boot maintainers: what do you think about reusing a connection during the health check, similar to what is proposed in the reply to my question? Instead of trying to connect on each request, we would have an atomic field storing the previous future. If a new request comes in and the future is not completed, we won't create a new …

An alternative approach would be to use something ActiveMQ-specific to explicitly check whether a connection is started. However, this means that we would need to implement separate health checks for all the JMS connections that you support.

Additionally, I have created a repo with a minimal example that can be used to reproduce the problem with ActiveMQ.
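The "reuse the previous attempt" idea could be sketched roughly as below, assuming the health check is a `Callable` wrapping the connection start. All names are illustrative; none of this is existing Spring Boot API, and mapping the "still running" case to `DOWN` rather than `UNKNOWN` is just a choice for the sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Keeps the last connection-check Future in an atomic field; if it has not
// completed by the time a new request arrives, report DOWN immediately
// instead of starting yet another attempt (which would pile up stuck threads).
public class CachedCheckHealthIndicator {

    public enum Status { UP, DOWN }

    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "jms-health-check");
        t.setDaemon(true); // a stuck check must not keep the JVM alive
        return t;
    });

    private final AtomicReference<Future<Status>> lastCheck = new AtomicReference<>();

    private final Callable<Status> check; // e.g. connection.start() wrapped up

    public CachedCheckHealthIndicator(Callable<Status> check) {
        this.check = check;
    }

    public Status health(long timeout, TimeUnit unit) {
        Future<Status> previous = lastCheck.get();
        if (previous != null && !previous.isDone()) {
            // The previous attempt is still blocked: reuse its outcome slot,
            // do not stack up more threads.
            return Status.DOWN;
        }
        Future<Status> current = executor.submit(check);
        lastCheck.set(current);
        try {
            return current.get(timeout, unit);
        } catch (TimeoutException ex) {
            return Status.DOWN; // still running; the next call will see it as incomplete
        } catch (Exception ex) {
            return Status.DOWN; // the check itself failed
        }
    }
}
```

Note that only one worker thread can ever be stuck here, rather than one per request, which addresses the pool-exhaustion objection above.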
No problem @wilkinsona, thank you all for the discussion and for having a look into this. I hope we'll manage to find a good solution for this problem. Not that it is Spring Boot's fault, but it is really annoying when you lose time because of such an issue.
This is an initial idea to show how I would solve the blocking call. Maybe there are better ideas. I would also add something like the ElasticsearchHealthIndicatorProperties so users can actually specify the timeout. The default could be similar to the Elasticsearch one (or bigger). I was also not completely sure about the status: should it be DOWN or UNKNOWN? I am not sure that we can say with 100% certainty that it is down, nor give the reason.