
[Bug]: bind() to unix:/var/lib/nginx/nginx-status.sock failed (98: Address already in use) #6752

Closed
@granescb

Description


Version

3.6.2

What Kubernetes platforms are you running on?

EKS Amazon

Steps to reproduce

k8s EKS version: 1.31

Describe the bug:
Sometimes the nginx-ingress-controller restarts the NGINX process without cleaning up its socket files.
We first hit this problem during a mass node restart in the k8s cluster.
Since then it has happened randomly on weekends.

We noticed the problem in version 3.6.2. Before that we ran version 3.0.2 and never had this problem.

Deleting the Pod manually fixes the problem, but it can happen again.
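As a rough illustration of what "cleaning the socket files" could mean, here is a minimal Python sketch that unlinks leftover Unix socket files in a directory before a new process tries to bind them. This is not the controller's actual code or fix. The function name `remove_stale_sockets` and the `*.sock` pattern are assumptions made for illustration. It would only be safe to run while no NGINX process is using those sockets.

```python
import stat
from pathlib import Path
from typing import List


def remove_stale_sockets(directory: str, pattern: str = "*.sock") -> List[str]:
    """Unlink leftover Unix socket files in `directory`; return the removed paths.

    Hypothetical helper: only entries that are actual sockets are removed,
    so regular files that merely match the pattern are left alone.
    """
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        try:
            mode = path.lstat().st_mode
        except FileNotFoundError:
            # The entry vanished between listing and stat; nothing to clean.
            continue
        if stat.S_ISSOCK(mode):
            path.unlink()
            removed.append(str(path))
    return removed
```

With the paths from this report, the call would be `remove_stale_sockets("/var/lib/nginx")`, run before NGINX starts.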

Here is the Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    meta.helm.sh/release-name: nginx-inc-ingress-controller
    meta.helm.sh/release-namespace: nginx-ingress
  labels:
    app: nginx-inc-ingress-controller
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/version: 3.6.1
    helm.sh/chart: nginx-ingress-1.3.1
  name: nginx-inc-ingress-controller
  namespace: nginx-ingress
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx-inc-ingress-controller
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        logs.improvado.io/app: nginx-ingress
        logs.improvado.io/format: json
        logs.improvado.io/ingress-class: nginx-stable
        prometheus.io/port: "9113"
        prometheus.io/scheme: http
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: nginx-inc-ingress-controller
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: nginx-inc-ingress-controller
            topologyKey: kubernetes.io/hostname
      automountServiceAccountToken: true
      containers:
      - args:
        - -nginx-plus=false
        - -nginx-reload-timeout=60000
        - -enable-app-protect=false
        - -enable-app-protect-dos=false
        - -nginx-configmaps=$(POD_NAMESPACE)/nginx-inc-ingress-controller
        - -default-server-tls-secret=$(POD_NAMESPACE)/nginx-inc-ingress-controller-default-server-tls
        - -ingress-class=nginx-stable
        - -health-status=true
        - -health-status-uri=/-/health/lb
        - -nginx-debug=false
        - -v=1
        - -nginx-status=true
        - -nginx-status-port=8080
        - -nginx-status-allow-cidrs=127.0.0.1
        - -report-ingress-status
        - -enable-leader-election=true
        - -leader-election-lock-name=nginx-inc-ingress-controller-leader
        - -enable-prometheus-metrics=true
        - -prometheus-metrics-listen-port=9113
        - -prometheus-tls-secret=
        - -enable-service-insight=false
        - -service-insight-listen-port=9114
        - -service-insight-tls-secret=
        - -enable-custom-resources=true
        - -enable-snippets=true
        - -include-year=false
        - -disable-ipv6=false
        - -enable-tls-passthrough=false
        - -enable-cert-manager=false
        - -enable-oidc=false
        - -enable-external-dns=false
        - -default-http-listener-port=80
        - -default-https-listener-port=443
        - -ready-status=true
        - -ready-status-port=8081
        - -enable-latency-metrics=false
        - -ssl-dynamic-reload=true
        - -enable-telemetry-reporting=false
        - -weight-changes-dynamic-reload=false
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: 627003544259.dkr.ecr.us-east-1.amazonaws.com/nginx-inc-ingress:master-3.6.2-2-1
        imagePullPolicy: IfNotPresent
        name: ingress-controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        - containerPort: 9113
          name: prometheus
          protocol: TCP
        - containerPort: 8081
          name: readiness-port
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /nginx-ready
            port: readiness-port
            scheme: HTTP
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 1500m
            memory: 1500Mi
          requests:
            cpu: 100m
            memory: 1500Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 101
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/nginx
          name: nginx-etc
        - mountPath: /var/cache/nginx
          name: nginx-cache
        - mountPath: /var/lib/nginx
          name: nginx-lib
        - mountPath: /var/log/nginx
          name: nginx-log
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - cp
        - -vdR
        - /etc/nginx/.
        - /mnt/etc
        image: 627003544259.dkr.ecr.us-east-1.amazonaws.com/nginx-inc-ingress:master-3.6.2-2-1
        imagePullPolicy: IfNotPresent
        name: init-ingress-controller
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 101
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /mnt/etc
          name: nginx-etc
      nodeSelector:
        kubernetes.io/arch: amd64
      priorityClassName: cluster-application-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: nginx-inc-ingress-controller
      serviceAccountName: nginx-inc-ingress-controller
      terminationGracePeriodSeconds: 60
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 300
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 300
      - effect: NoSchedule
        key: node.kubernetes.io/memory-pressure
        operator: Exists
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: nginx-inc-ingress-controller
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      volumes:
      - emptyDir: {}
        name: nginx-etc
      - emptyDir: {}
        name: nginx-cache
      - emptyDir: {}
        name: nginx-lib
      - emptyDir: {}
        name: nginx-log

Logs with error:

2024-11-02 10:02:58.543	2024/11/02 06:02:56 [emerg] 18#18: bind() to unix:/var/lib/nginx/nginx-status.sock failed (98: Address already in use)
2024-11-02 10:02:58.543	2024/11/02 06:02:56 [emerg] 18#18: bind() to unix:/var/lib/nginx/nginx-config-version.sock failed (98: Address already in use)
2024-11-02 10:02:58.543	2024/11/02 06:02:56 [emerg] 18#18: bind() to unix:/var/lib/nginx/nginx-502-server.sock failed (98: Address already in use)
2024-11-02 10:02:58.543	2024/11/02 06:02:56 [emerg] 18#18: bind() to unix:/var/lib/nginx/nginx-418-server.sock failed (98: Address already in use)
2024-11-02 10:02:58.543	2024/11/02 06:02:56 [notice] 18#18: try again to bind() after 500ms
2024-11-02 10:02:59.043	2024/11/02 06:02:56 [emerg] 18#18: still could not bind()
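The `98: Address already in use` error above is what `bind()` returns on Linux when the socket path already exists on disk. The Python sketch below reproduces that behavior in a temporary directory: closing a Unix socket does not remove its file, so a second `bind()` to the same path fails with `EADDRINUSE`. The file name `demo-status.sock` and the helper names are made up for the demo. Whether this is exactly how the sockets in `/var/lib/nginx` were left behind is an assumption, not something confirmed in this report. It would be consistent with `/var/lib/nginx` being an `emptyDir` volume, which outlives container restarts and is only discarded when the Pod is deleted.

```python
import errno
import os
import socket
import tempfile


def bind_unix(path: str) -> socket.socket:
    """Create a Unix stream socket bound to `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


def demo():
    """Return (socket file left behind?, errno of the second bind or None)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo-status.sock")

        first = bind_unix(path)
        first.close()  # close() does not unlink the socket file

        leftover = os.path.exists(path)
        try:
            bind_unix(path).close()
            err = None
        except OSError as exc:
            err = exc.errno
        return leftover, err


if __name__ == "__main__":
    leftover, err = demo()
    print("socket file left behind:", leftover)
    print("second bind failed with EADDRINUSE:", err == errno.EADDRINUSE)
```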

Here are the logs, containing one signal-triggered reconfiguration followed by a crash loop with the socket-busy error:
Explore-logs-2024-11-05 18_40_57.txt

Expected behavior
The nginx-ingress controller pod keeps working.

Labels

backlog (pull requests/issues that are backlog items), bug (an issue reporting a potential bug)

Status

Done 🚀
