Support a "drain" mode for terminating pods and servers failing readiness checks #95
Conversation
When a pod starts terminating or the readiness check fails, leave the pod as a server in the load balancer, but set its weight to 0 so that HAProxy does not send ordinary traffic to the pod. Traffic using persistence, however, will still be directed to the terminating pod, allowing persistent requests to continue to flow to it. Updates that only change the draining state are made through the stats socket rather than forcing a full reload of HAProxy.
This PR includes the described change, as well as a few fixes for typos I found along the way and some cleanup of warnings IntelliJ was showing (e.g., initializing slices with an empty literal instead of nil, error messages starting with a capital letter).
I have a use case where pods have a long termination period and new requests carrying a cookie need to be directed to the terminating servers during this period. I was thinking this could be useful for other people too, but I can always keep it in a fork if you don't think it fits.
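For anyone curious what such a drain-only update looks like on the wire, here is a minimal Go sketch (not the PR's actual stats_socket.go code; the socket path and backend/server names are made up) that sends HAProxy's `set weight` runtime command over the admin socket:

```go
// Minimal sketch of pushing a drain update over the HAProxy stats socket
// without a reload. Socket path and backend/server names are illustrative.
package main

import (
	"bufio"
	"fmt"
	"net"
	"time"
)

// setServerWeight sends "set weight <backend>/<server> <weight>" to the
// HAProxy admin (stats) socket and returns HAProxy's one-line response.
func setServerWeight(socketPath, backend, server string, weight int) (string, error) {
	conn, err := net.DialTimeout("unix", socketPath, 5*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()

	cmd := fmt.Sprintf("set weight %s/%s %d\n", backend, server, weight)
	if _, err := conn.Write([]byte(cmd)); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	// Drain a terminating pod: weight 0 keeps it in the backend, so requests
	// carrying a persistence cookie still reach it, but no new traffic is
	// balanced to it.
	resp, err := setServerWeight("/var/run/haproxy-stats.sock", "my-backend", "pod-abc123", 0)
	if err != nil {
		fmt.Println("stats socket error:", err)
		return
	}
	fmt.Println(resp)
}
```

Because only the server's weight changes at runtime, existing connections are left alone and no HAProxy reload is triggered.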
A very specific use case but indeed an interesting feature. I'll need a few more days to have a look at the approach. @aiharos fyi.
The stats_socket.go changes seem solid, and when draining is enabled the server weight is set in the configuration template, so that looks good too. +1 for ignoring Ingress status changes; those also cause some needless reloads.
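To illustrate the template side, here is a rough, self-contained Go sketch, not the controller's real haproxy.tmpl, of how a Go template could emit `weight 0` for endpoints flagged as draining; the field names (`DrainSupport`, `Draining`) are assumptions for the example:

```go
// Sketch of rendering HAProxy server lines from a Go template, emitting
// "weight 0" for draining endpoints when drain support is enabled.
package main

import (
	"os"
	"text/template"
)

type Endpoint struct {
	Name     string
	IP       string
	Port     int
	Draining bool
}

type Backend struct {
	Name         string
	DrainSupport bool
	Endpoints    []Endpoint
}

const serverTmpl = `backend {{ .Name }}
{{- range .Endpoints }}
    server {{ .Name }} {{ .IP }}:{{ .Port }} check{{ if and $.DrainSupport .Draining }} weight 0{{ end }}
{{- end }}
`

func main() {
	b := Backend{
		Name:         "default-myapp-8080",
		DrainSupport: true,
		Endpoints: []Endpoint{
			{Name: "srv001", IP: "10.0.0.11", Port: 8080, Draining: false},
			{Name: "srv002", IP: "10.0.0.12", Port: 8080, Draining: true},
		},
	}
	// Prints a backend with srv002 weighted to 0, i.e. only persistent traffic reaches it.
	template.Must(template.New("be").Parse(serverTmpl)).Execute(os.Stdout, b)
}
```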
Hi, and finally, thanks for supporting this ingress controller with this amazing piece of code! Sorry about the long delay, I was on vacation these last weeks. Now you are adding …
Hi, only two pending issues after a few tests:
* check if svc namespace and pod namespace are the same at GetTerminatingServicePods
* only add draining pods to the backend list if draining is enabled - in our use case we need terminating pods removed from the balance asap
Moreover, I pushed by mistake a commit that conflicts with your change. Please have a look if you can rebase on top of master.
Thanks for the feedback. I’ve been sidetracked on some other projects and am on vacation now, but will try to get this fixed up next week.
* validate namespace match when pulling terminating service pods
* don't include not ready or terminating pods when DrainSupport is not active
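A hedged sketch of what those two fixes amount to, assuming illustrative names rather than the controller's actual `GetTerminatingServicePods` code:

```go
// Sketch of the filtering rules described above; helper and function names
// are assumptions for illustration, not the controller's real API.
package ingress

import (
	corev1 "k8s.io/api/core/v1"
)

// isPodReady reports whether the pod's Ready condition is true.
func isPodReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}

// filterEndpoints keeps only the pods that should be listed as backend servers.
// With drain support on, terminating and not-ready pods stay in the list (they
// are rendered with weight 0); with drain support off they are dropped so the
// balancer stops using them as soon as possible. Pods from a namespace other
// than the service's are always ignored.
func filterEndpoints(svc *corev1.Service, pods []*corev1.Pod, drainSupport bool) []*corev1.Pod {
	var out []*corev1.Pod
	for _, pod := range pods {
		if pod.Namespace != svc.Namespace {
			continue // never mix pods from another namespace into this service
		}
		terminating := pod.DeletionTimestamp != nil
		if !drainSupport && (terminating || !isPodReady(pod)) {
			continue // drain disabled: remove these pods from the balance asap
		}
		out = append(out, pod)
	}
	return out
}
```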
Thanks!
@brianloss Kudos for developing this; we need it too, and we're planning on trying it out. Just wanted to say thanks and let you know someone else indeed used it!
No problem! Hope it works ok for you. I've tested it for my use case, but Go isn't my primary language and I'm sure there are some parts I could have done better.
I'm hoping to test today; I'll let you know how testing goes. My scenario is a Java app (JSESSIONID) using sticky sessions to a 3-node deployment. My goal is to execute a rolling update of the deployment and have the nodes finish terminating when they have no more active sessions. To pull this off, I also need to fail the health check when there are no more active sessions, which in Java is a JMX thing. You're not using Java by any chance, are you? If so, how did you loop the active session count into the health check?
I'm using Wildfly. Typically what would happen is that you initiate a shutdown of your app and it stops accepting new connections. Existing connections (e.g., active sessions) are left alone by both HAProxy and Kubernetes, so they can finish, and then there would be no need for this feature (terminating the pod removes it from the endpoints as soon as termination begins, so no new requests go to it while active connections finish up). For my use case, there is a stateful API where users access a query by executing a create call on the server, followed by a number of next calls to retrieve pages of results. Since these are new requests that, for us, must go back to the same web server, I need to support the drain mode. By setting the weight of a server to 0 as soon as it is terminating, HAProxy won't send any new traffic to it; only traffic that uses cookie persistence will go there.
Yep, that's the same as our use case, almost exactly. How did you organize your health check so that it knows when it's ok to stop the server?
For us, the health check isn't really involved. As I mentioned above, the weight is set to 0, so HAProxy won't send any non-persistent traffic to the terminating pod. The rest is built into the Docker image and service. When a pod is terminated, Docker sends a SIGTERM, which our entrypoint script traps and uses to invoke a curl call to our API. That call watches internally until either all active queries complete or a long timeout expires, and then tells Wildfly to shut down.
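The entrypoint described here is a shell script calling curl, but the same graceful-shutdown pattern can be sketched in Go for illustration; the `/internal/drain` URL and the 30-minute timeout below are assumptions, not details from the thread:

```go
// Sketch of the shutdown pattern: trap SIGTERM, ask the app to drain, wait
// for active work or a timeout, then let the app server shut down.
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM)

	<-sig // Kubernetes terminates the pod: the container runtime delivers SIGTERM

	// Ask the application to stop accepting new work and wait until active
	// queries/sessions complete, bounded by a long timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, "http://localhost:8080/internal/drain", nil)
	if err != nil {
		fmt.Println("building drain request failed:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("drain request failed:", err)
		return
	}
	resp.Body.Close()

	// At this point the app server (Wildfly in the discussion above) would be shut down.
}
```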
Ok, thanks, I understand now!