
Intermittent hang at clickhouse extract-from-config on NFS-backed storage (v25.8) #1857

@pinkyjpainadath

Description:
We are experiencing an intermittent issue where ClickHouse pods get stuck during startup in the extract-from-config phase when using NFS-mounted storage. The issue eventually resolves itself after multiple restarts, but it blocks pod readiness in the meantime.

Environment:

ClickHouse version: 25.8.9.20 (official build)
Docker image: clickhouse/clickhouse-server:25.8
Kubernetes: v1.28.x
NFS storage: v3, options: rw,noatime,sync,vers=3,rsize=8192,wsize=8192,hard,proto=tcp,timeo=600,retrans=2,sec=sys
CPU / Memory: 6 cores requested (8-core limit) / 16Gi
CI/CD: ArgoCD deployment
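The NFS options above are the requested ones. For NFS, the client can negotiate some values (such as rsize and wsize) with the server, so the options actually in effect may differ. A minimal sketch, assuming awk and a Linux /proc/mounts are available inside the container, that prints what the kernel applied to a mount point; mount_opts and has_opt are hypothetical helper names, not part of ClickHouse or Kubernetes.

```shell
#!/bin/sh
# Hypothetical helper sketch: show the options the kernel actually applied to
# a mount point. Each /proc/mounts line reads:
#   source mountpoint fstype options dump pass

# Print "fstype options" for the given mount point (optionally reading another
# mounts file, which makes the function easy to test).
mount_opts() {
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$1" '$2 == mp { print $3, $4 }' "$mounts_file"
}

# Succeed only if the mount point carries exactly the given option (e.g. "sync").
has_opt() {
    mount_opts "$1" "$3" | awk -v o="$2" '
        { n = split($2, opts, ","); for (i = 1; i <= n; i++) if (opts[i] == o) found = 1 }
        END { exit !found }'
}

# Only act when a mount point is given, so the functions can be sourced alone.
if [ "$#" -gt 0 ]; then
    mount_opts "$1"
fi
```

For example, running sh mount_opts.sh /var/lib/clickhouse in the pod prints the filesystem type followed by the comma-separated option list the kernel is using, which can be compared with the list above.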

Symptoms:
Pod starts, but clickhouse extract-from-config hangs indefinitely (the process stays running at ~96% CPU, as shown in the ps output below).
No logs are produced in /var/log/clickhouse-server.
The issue occurs even with CLICKHOUSE_SKIP_USER_SETUP=1 and local paths configured.
NFS is healthy and writable; file creation as the clickhouse user succeeds.
After several restarts (sometimes over a period of hours), the pod starts successfully.

Pod state example:

Containers:
  clickhouse:
    State: Running
    Ready: False
    Restart Count: 1
    Command:
      clickhouse extract-from-config --config-file /etc/clickhouse-server/config.xml --key=path
    Limits:
      cpu: 8
      memory: 16Gi

Relevant logs / outputs:

# ps aux
root          1  99.3  0.0 556000  2448 ?        Rsl  01:14   1:10 /usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
root         22  0.0  0.0 10500  2836 ?        S    00:49   0:00 /bin/bash /entrypoint.sh
root         23 96.0  0.0 556016 18332 ?        Rl   00:49   0:21 clickhouse extract-from-config --config-file /etc/clickhouse-server/config.xml --key=path
# df -hT /var/lib/clickhouse
Filesystem                           Type  Size  Used Avail Use% Mounted on
10.74.41.50:/k8s_xdi_dev_clickhouse_ins00 nfs 101G  920M 100G  1% /var/lib/clickhouse
# ls -l /var/lib/clickhouse/preprocessed_configs
-rw-r-----  1 clickhouse clickhouse 92021 Nov  5 23:17 config.xml
-rw-r-----  1 clickhouse clickhouse 10072 Nov  5 23:17 users.xml
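In the ps output above, the STAT column shows Rl for extract-from-config (PID 23) at 96% CPU. On Linux, R means running or runnable, whereas a process blocked on a stalled NFS request would normally show D (uninterruptible sleep); the l marks a multi-threaded process. A minimal sketch for re-checking this per thread while the hang is in progress, assuming a Linux /proc filesystem inside the container; proc_state and explain_state are hypothetical names, not part of ClickHouse.

```shell
#!/bin/sh
# Hypothetical diagnostic sketch (not part of ClickHouse): report the kernel
# scheduler state of a process and each of its threads via Linux /proc.

# Print the state letter from /proc/<arg>/stat, where <arg> is a PID or
# "<pid>/task/<tid>". Field 2 (the command name) is in parentheses and may
# contain spaces, so drop everything up to the last ") " first.
proc_state() {
    stat=$(cat "/proc/$1/stat") || return 1
    rest=${stat##*") "}
    echo "${rest%% *}"
}

# Human-readable meaning of the most relevant state letters.
explain_state() {
    case "$1" in
        R) echo "running or runnable (using CPU, not blocked on I/O)" ;;
        D) echo "uninterruptible sleep (usually waiting on I/O, e.g. NFS)" ;;
        S) echo "interruptible sleep" ;;
        *) echo "other state: $1" ;;
    esac
}

# Only act when a PID is given, so the functions can be sourced on their own.
if [ "$#" -gt 0 ]; then
    pid=$1
    state=$(proc_state "$pid") || exit 1
    echo "pid $pid: $state, $(explain_state "$state")"
    # wchan names the kernel function a sleeping task waits in
    # (typically 0 for a task that is not sleeping).
    if [ -r "/proc/$pid/wchan" ]; then
        echo "wchan: $(cat "/proc/$pid/wchan")"
    fi
    # Check every thread separately, since only some may be blocked.
    for task in /proc/"$pid"/task/*; do
        tid=${task##*/}
        tstate=$(proc_state "$pid/task/$tid") || continue
        echo "  thread $tid: $tstate, $(explain_state "$tstate")"
    done
fi
```

For example, sh proc_state.sh 23 run inside the container during a hang would show whether any thread of extract-from-config is in D state or whether all of them are busy on CPU.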

Additional Notes:

The hang appears to be independent of write activity; extract-from-config only performs light reads.
Multiple restarts eventually succeed.

Request:
Could the team investigate potential NFS-related hangs in extract-from-config, especially for first-time setups or pre-existing volumes? Any recommended workarounds or startup options would be appreciated.

Below is the ClickHouseInstallation (CHI) manifest:

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  annotations:
    prometheus.io/port: '7000'
    prometheus.io/scrape: 'true'
  creationTimestamp: '2025-11-06T01:14:40Z'
  finalizers:
    - finalizer.clickhouseinstallation.altinity.com
  generation: 2
  labels:
    app.kubernetes.io/instance: xdi-dev_clickhouse-ins-test
  name: clickhouse-xdi-dev-ins
  namespace: xdi-dev
  resourceVersion: '4917548229'
  uid: b45a7cc7-3d27-4000-bda0-faae3b28d4d1
spec:
  configuration:
    clusters:
      - layout:
          replicasCount: 1
          shardsCount: 1
        name: ins
        templates:
          dataVolumeClaimTemplate: chi-data-xdi-dev-ins
          podTemplate: chi-pod-xdi-dev-ins
    files:
      kafka_ssl.xml: |
        <clickhouse>
          <kafka>
            <kafka_auto_offset_reset>earliest</kafka_auto_offset_reset>
            <security_protocol>SSL</security_protocol>
            <ssl_ca_location>/etc/clickhouse-client/certs/ca.pem</ssl_ca_location>
            <ssl_certificate_location>/etc/clickhouse-client/certs/client_cert.pem</ssl_certificate_location>
            <ssl_key_location>/etc/clickhouse-client/certs/client_key.key</ssl_key_location>
          </kafka>
        </clickhouse>
    users:
      admin/access_management: 1
      admin/k8s_secret_password: xdi-dev/clickhouse-ins-creds/admin
      admin/networks/ip: 0.0.0.0/0
      user/k8s_secret_password: xdi-dev/clickhouse-ins-creds/user
      user/networks/ip: 0.0.0.0/0
      user/readonly: 1
  defaults:
    templates:
      dataVolumeClaimTemplate: chi-data-xdi-dev-ins
      podTemplate: chi-pod-xdi-dev-ins
  stop: 'no'
  taskID: '1'
  templates:
    podTemplates:
      - name: chi-pod-xdi-dev-ins
        spec:
          containers:
            - env:
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      key: AWS_ACCESS_KEY_ID
                      name: s3-clickhouse-credentials
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      key: AWS_SECRET_ACCESS_KEY
                      name: s3-clickhouse-credentials
                - name: CLICKHOUSE_SKIP_USER_SETUP
                  value: '1'
              image: 'clickhouse/clickhouse-server:25.8'
              imagePullPolicy: IfNotPresent
              name: clickhouse
              resources:
                limits:
                  cpu: 8
                  memory: 16Gi
                requests:
                  cpu: 6
                  memory: 16Gi
              volumeMounts:
                - mountPath: /var/lib/clickhouse
                  name: chi-data-xdi-dev-ins
                - mountPath: /etc/clickhouse-client/certs
                  name: kafka-certs
                  readOnly: true
          volumes:
            - name: kafka-certs
              secret:
                secretName: kafka-certs
    volumeClaimTemplates:
      - name: chi-data-xdi-dev-ins
        reclaimPolicy: Retain
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
          storageClassName: xdi-dev-chi-ins
  troubleshoot: 'no'
