Describe the bug
When a pod is Terminating, it receives a SIGTERM signal asking it to finish in-flight work, and after the grace period the pod is deleted. At the same time that the pod starts terminating, the aws-load-balancer-controller receives the updated object and starts removing the pod's IP from the target group, initiating draining.
Both of these processes - the signal handling at the kubelet level and the removal of the pod's IP from the target group - are decoupled from one another, and the SIGTERM may already have been handled before, or at the same time as, the target in the target group starts draining.
As a result the pod might be unavailable before the target group has even started its own draining process. This can result in dropped connections, as the load balancer is still trying to send requests to a pod that has already shut down. The LB will in turn reply with 5xx responses.
Steps to reproduce
- Provision an ingress with an ALB attached, using annotations like:

```yaml
alb.ingress.kubernetes.io/certificate-arn: xxx
alb.ingress.kubernetes.io/healthcheck-interval-seconds: "10"
alb.ingress.kubernetes.io/healthcheck-path: /
alb.ingress.kubernetes.io/healthcheck-timeout-seconds: "5"
alb.ingress.kubernetes.io/healthy-threshold-count: "2"
alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=30
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/security-groups: xxx
alb.ingress.kubernetes.io/tags: xxx
alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=60
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/unhealthy-threshold-count: "2"
```
- Create a service and pods (multiple ones through a deployment work best) for this ingress
- (Optionally, add some delay/load to the cluster so that the AWS API requests are slower or have to be retried)
- Start an HTTP benchmark to produce some artificial load
- Roll out a change to the deployment, or just evict some pods
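The steps above can be sketched as a small command sequence. This is only an illustration against a running cluster; the deployment name `my-app`, the ALB hostname, and the `hey` benchmark tool are placeholders for whatever you actually use:

```shell
# Sustained artificial load against the ALB (any HTTP benchmark works).
hey -z 2m https://my-alb.example.com/ &

# Trigger pod replacement while the load is running.
kubectl rollout restart deployment/my-app

# Wait for the benchmark to finish; its summary will show 5xx responses
# whenever the ALB routed to a target that had already shut down.
wait
```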
Expected outcome
- The SIGTERM should only be sent after the ALB has removed the target from the target group, or at the very least the ALB should stop sending new traffic to the pod once the pod has received the SIGTERM signal
Environment
- An HTTP service; all our ingresses carry the annotations listed above
- AWS Load Balancer controller version
v2.2.4
- Kubernetes version
```
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.20-eks-8c579e", GitCommit:"8c579edfc914f013ff48b2a2b2c1308fdcacc53f", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:13Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
```
- Using EKS (yes), version: 1.18 eks.8
Additional Context:
We've been relying on Pod-Graceful-Drain, which unfortunately forks this controller and intercepts and breaks k8s controller internals.
You can also achieve a pretty good result using a `sleep` as a `preStop` hook, but that is not reliable at all - it is just a guessing game whether your traffic will actually be drained after X seconds - and it requires statically linked binaries to be mounted into each container, or the existence of `sleep` in the container's operating system.
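For reference, the `sleep`-based `preStop` workaround looks roughly like the following sketch. The 60-second value is arbitrary (that is exactly the guessing game described above), and the grace period must exceed the sleep duration:

```yaml
# Sketch of the sleep-based preStop workaround (values are examples only).
# The hook delays SIGTERM delivery to the container, hopefully giving the
# controller time to deregister and drain the target first.
spec:
  terminationGracePeriodSeconds: 90   # must be longer than the preStop sleep
  containers:
    - name: app
      image: my-app:latest            # hypothetical image
      lifecycle:
        preStop:
          exec:
            # Requires a sleep binary inside the image (or a statically
            # linked one mounted into the container).
            command: ["sleep", "60"]
```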
I believe this is not only an issue with this controller, but with k8s in general. So any hints or already existing tickets would be very welcome.