
Karpenter Nodes Unable to Consolidate #3927

@jaydeep-pf

Description


Version

Karpenter Version: v0.26.0

Kubernetes Version: v1.23.17

Expected Behavior

  • Launch a node and utilize it fully
  • If there are more unschedulable pods, schedule them on a new node but consolidate them later (this works on staging because HPA activity and traffic are low, but in production each node runs 100+ pods and Karpenter is later unable to consolidate, so nodes are underutilized)

Actual Behavior

The issue is as follows:

We recently migrated from Cluster Autoscaler (CA) to Karpenter. We have a custom Provisioner and AWSNodeTemplate for our workloads. Each node is expected to run at most 500 pods (maxPods is set in the Karpenter kubelet configuration). Since HPA is enabled and a lot of traffic is coming in, Karpenter launches multiple nodes and schedules 100+ pods on each, but is then unable to consolidate them. We previously ran 3 nodes on average; after migrating to Karpenter we are running 7. We see the following events:

Events:
  Type    Reason            Age                    From       Message
  ----    ------            ----                   ----       -------
  Normal  Unconsolidatable  31m (x7 over 2d20h)    karpenter  not all pods would schedule
  Normal  Unconsolidatable  68s (x351 over 3d18h)  karpenter  can't remove without creating 3 nodes

Steps to Reproduce the Problem

  • Set up HPA with a deployment
  • The Provisioner should contain 4xlarge instance types and a kubelet configuration with maxPods: 500
  • Launch 1000+ pods and watch the instances being provisioned
  • Once everything settles, nodes running 100+ pods are unable to consolidate; check the events.
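A Deployment plus HPA pair along these lines can drive the scale-up described above. This is a minimal sketch: the names, image, replica counts, and resource requests are illustrative assumptions, not taken from the original report; the nodeSelector and toleration match the Provisioner's worker label and taint.

```yaml
# Hypothetical reproduction manifest; names, image, and requests are illustrative
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-test
spec:
  replicas: 10
  selector:
    matchLabels:
      app: load-test
  template:
    metadata:
      labels:
        app: load-test
    spec:
      # Target the Karpenter-managed worker nodes from the Provisioner below
      nodeSelector:
        node.kubernetes.io/role: worker
      tolerations:
      - key: node.kubernetes.io/role
        operator: Equal
        value: worker
        effect: NoSchedule
      containers:
      - name: app
        image: nginx:1.25
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: load-test
  minReplicas: 10
  maxReplicas: 1000
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```

Scaling the HPA toward its maximum should force Karpenter to provision several 4xlarge nodes; the Unconsolidatable events then appear on those nodes once traffic subsides.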

Resource Specs and Logs

Provisioner Configs

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: worker-provisioner
spec:
  consolidation:
    enabled: true
  kubeletConfiguration:
    maxPods: 500
  labels:
    node.kubernetes.io/role: worker
    nodegroup: worker
  limits:
    resources:
      cpu: "400"
      memory: 2800Gi
  providerRef:
    name: worker-awsnodetemplate
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - spot
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - r4.4xlarge
    - r5.4xlarge
    - r5a.4xlarge
    - m6i.4xlarge
    - r5ad.4xlarge
    - r5b.4xlarge
    - r5d.4xlarge
    - r5dn.4xlarge
    - r5n.4xlarge
    - r6i.4xlarge
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64
  taints:
  - effect: NoSchedule
    key: node.kubernetes.io/role
    value: worker
  ttlSecondsUntilExpired: 2592000

AWSNodetemplate Configs

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: worker-awsnodetemplate
spec:
  amiFamily: AL2
  blockDeviceMappings:
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      volumeSize: 200Gi
      volumeType: gp3
  instanceProfile: eks-nodes-production-worker
  metadataOptions:
    httpTokens: optional
  securityGroupSelector:
    karpenter.sh/discovery: eks-production
  subnetSelector:
    karpenter.sh/discovery: eks-production
  tags:
    Environment: production
    Name: production-worker-karpenter-eks-node
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash -xe
    # Mount ephemeral volume if it exists
    if [[ -e /dev/nvme1n1 ]] && [[ ! $(grep /dev/nvme1n1 /proc/mounts) ]]; then
      mkfs.xfs /dev/nvme1n1
      mount /dev/nvme1n1 /var/lib/docker
      systemctl restart docker
    fi

    # Custom supplied userdata code to install ssm-agent
    yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
    systemctl enable amazon-ssm-agent
    systemctl start amazon-ssm-agent

    # Custom userdata to increase socket max connections
    echo "net.core.somaxconn=4096" >> /etc/sysctl.conf
    sysctl -p
    --BOUNDARY--

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
