What happened?
When a pod enters its Terminating state, it receives a SIGTERM asking it to finish up its work, after which Kubernetes proceeds with deleting the pod.
At the same time that the pod starts terminating, an ingress controller receives the updated endpoints object and starts removing the pod from the load balancer's list of targets that traffic can be sent to.
Both of these processes - the signal handling at the kubelet level and the removal of the pod's IP from the list of endpoints - are decoupled from one another, so the SIGTERM might be handled before, or at the same time as, the target is removed from the target group.
As a result, the ingress controller might still route traffic to targets that are still in its endpoint list but have already shut down cleanly. This can lead to dropped connections, as the LB keeps sending requests to the already-terminated pod and in turn replies to clients with 5xx responses.
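For context, one common way an application can bridge that gap is to keep serving for a short drain window after SIGTERM before shutting down. Below is a minimal, illustrative Go sketch; the 20-second drain window, the port, and the handler are assumptions for the example, and the delay has to stay below terminationGracePeriodSeconds.

```go
// main.go - minimal sketch: keep serving for a short drain window after
// SIGTERM so requests still routed by a load balancer that has not yet
// removed this pod from its targets do not fail.
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}

	// Serve in the background.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %v", err)
		}
	}()

	// Wait for the kubelet's SIGTERM.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Keep accepting traffic while the endpoints/target-group update propagates.
	// The 20s value is an assumption and must be tuned to the LB's deregistration delay.
	log.Println("SIGTERM received, draining for 20s before shutdown")
	time.Sleep(20 * time.Second)

	// Stop accepting new connections and finish in-flight requests.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
```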
What did you expect to happen?
No traffic should be dropped during shutdown.
The SIGTERM should only be sent after the ingress controller/LB has removed the target from the target group. Readiness gates work well for pod startup/rollout, but there is no equivalent for pod deletion.
How can we reproduce it (as minimally and precisely as possible)?
This is a very theoretical problem, which is very hard to reproduce:
- Provision an ingress controller (the AWS Load Balancer Controller, for example)
- Create an ingress
- Create a service and pods (multiple ones through a deployment work best) for this ingress
- (Add some delay/load to the cluster that will cause the LB synchronization to be slower or delayed)
- Start an HTTP benchmark to produce some artificial load (a minimal load probe is sketched after this list)
- Roll out a change to the deployment or just evict some pods
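For the benchmark step, a hypothetical load probe like the following Go sketch (the URL, run duration, and request rate are arbitrary assumptions) can be run against the ingress hostname while the rollout or eviction happens; any 5xx responses or connection errors it logs correspond to the dropped traffic described above.

```go
// loadprobe.go - hammer the ingress URL and report non-2xx responses or
// connection errors observed while pods are being replaced.
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	url := "http://example.ingress.local/" // placeholder; pass your ingress URL as the first argument
	if len(os.Args) > 1 {
		url = os.Args[1]
	}

	client := &http.Client{Timeout: 2 * time.Second}
	var total, failed int

	// Run for one minute; trigger the rollout/eviction while this is running.
	deadline := time.Now().Add(1 * time.Minute)
	for time.Now().Before(deadline) {
		total++
		resp, err := client.Get(url)
		if err != nil {
			failed++
			fmt.Printf("request error: %v\n", err)
			continue
		}
		if resp.StatusCode >= 500 {
			failed++
			fmt.Printf("got %d from %s\n", resp.StatusCode, url)
		}
		resp.Body.Close()
		time.Sleep(50 * time.Millisecond)
	}
	fmt.Printf("total=%d failed=%d\n", total, failed)
}
```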
Anything else we need to know?
We've been relying on Pod-Graceful-Drain, which unfortunately intercepts and breaks Kubernetes internals.
You can also achieve a reasonably good result using a sleep as a preStop hook, but that's not reliable at all - it's just a guessing game whether your traffic will actually be drained after X seconds - and it requires statically linked binaries to be mounted into each container or the existence of sleep in the container's operating system.
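For reference, such a statically linked sleep substitute can be built from a few lines of Go (the file name, the 30-second default, and the build invocation below are only illustrative), e.g. with CGO_ENABLED=0 go build so the binary also works in scratch/distroless images:

```go
// sleep.go - tiny sleep substitute for containers that ship neither a shell
// nor coreutils; intended to be mounted in and used as the preStop command.
package main

import (
	"os"
	"strconv"
	"time"
)

func main() {
	seconds := 30 // default drain time (assumption) if no argument is given
	if len(os.Args) > 1 {
		if n, err := strconv.Atoi(os.Args[1]); err == nil {
			seconds = n
		}
	}
	time.Sleep(time.Duration(seconds) * time.Second)
}
```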
I also opened an issue on the ingress controller's repo.
Kubernetes version
$ kubectl version
v1.18.20
Cloud provider
AWS
OS version
# On Linux:
sh-4.2$ cat /etc/os-release
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
$ uname -a
Linux xxx 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux