Description:
We are seeing an intermittent issue where ClickHouse pods get stuck during startup, in the extract-from-config phase, when using NFS-mounted storage. The issue resolves itself after multiple restarts, but it blocks pod readiness in the meantime.
Environment:
ClickHouse version: 25.8.9.20 (official build)
Docker image: clickhouse/clickhouse-server:25.8
Kubernetes: v1.28.x
NFS storage: v3, options: rw,noatime,sync,vers=3,rsize=8192,wsize=8192,hard,proto=tcp,timeo=600,retrans=2,sec=sys
CPU / Memory: 6 cores / 16Gi
CI/CD: ArgoCD deployment
Symptoms:
Pod starts, but clickhouse extract-from-config hangs indefinitely.
No logs are produced in /var/log/clickhouse-server.
Issue occurs even with CLICKHOUSE_SKIP_USER_SETUP=1 and local paths configured.
NFS is healthy and writable; file creation as ClickHouse user succeeds.
After several restarts (sometimes spread over hours), the pod starts successfully.
Pod state example:
Containers:
  clickhouse:
    State: Running
    Ready: False
    Restart Count: 1
    Command:
      clickhouse extract-from-config --config-file /etc/clickhouse-server/config.xml --key=path
    Limits:
      cpu: 8
      memory: 16Gi
Relevant logs / outputs:
# ps aux
root 1 99.3 0.0 556000 2448 ? Rsl 01:14 1:10 /usr/bin/clickhouse-server --config-file=/etc/clickhouse-server/config.xml
root 22 0.0 0.0 10500 2836 ? S 00:49 0:00 /bin/bash /entrypoint.sh
root 23 96.0 0.0 556016 18332 ? Rl 00:49 0:21 clickhouse extract-from-config --config-file /etc/clickhouse-server/config.xml --key=path
# df -hT /var/lib/clickhouse
Filesystem Type Size Used Avail Use% Mounted on
10.74.41.50:/k8s_xdi_dev_clickhouse_ins00 nfs 101G 920M 100G 1% /var/lib/clickhouse
# ls -l /var/lib/clickhouse/preprocessed_configs
-rw-r----- 1 clickhouse clickhouse 92021 Nov 5 23:17 config.xml
-rw-r----- 1 clickhouse clickhouse 10072 Nov 5 23:17 users.xml
Additional Notes:
The hang seems independent of write activity; extract-from-config is read-light.
Multiple restarts eventually succeed.
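One way to back up the note above is to read the kernel's state letter for the stuck process. In the ps output, extract-from-config shows state R at about 96% CPU, which points at a busy loop rather than a blocked read; a process waiting on a hard NFS mount would more typically show D (uninterruptible sleep). The following is a minimal sketch of that check. The function names `proc_state` and `classify_state` are hypothetical helpers, not part of ClickHouse or the image, and the interpretations are rough rules of thumb.

```shell
#!/bin/sh
# Print the state letter (R, S, D, ...) from a /proc/<pid>/stat file.
# The comm field is wrapped in parentheses and may itself contain spaces
# or parentheses, so take the first token after the LAST ") ".
proc_state() {
    stat_file=$1
    IFS= read -r line < "$stat_file" || [ -n "$line" ] || return 1
    rest=${line##*") "}
    printf '%s\n' "${rest%% *}"
}

# Map the state letter to a rough interpretation (heuristic only).
classify_state() {
    case $1 in
        R) echo "running: using CPU, not waiting on I/O" ;;
        D) echo "uninterruptible sleep: likely blocked on I/O (e.g. a hard NFS mount)" ;;
        S) echo "interruptible sleep: waiting on an event" ;;
        *) echo "other state: $1" ;;
    esac
}
```

Inside the container this could be run as `classify_state "$(proc_state /proc/23/stat)"`, where 23 is the PID of extract-from-config in the ps output above and will differ between restarts.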
Request:
Could the team investigate potential NFS-related hangs in extract-from-config, especially for first-time setups or pre-existing volumes? Any recommended workarounds or startup options would be appreciated.
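Until a root cause is known, the same command can be rerun by hand under a deadline to collect evidence without waiting indefinitely. This is a diagnostic sketch, not a fix: it assumes coreutils `timeout` is available in the image, and `run_with_deadline` is a hypothetical helper name. `timeout` sends SIGTERM by default, so a process that cannot be terminated this way may still need manual cleanup.

```shell
#!/bin/sh
# Run a command with a deadline and report whether it was cut short.
# coreutils `timeout` exits with status 124 when the deadline expires.
run_with_deadline() {
    seconds=$1
    shift
    timeout "$seconds" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${seconds}s" >&2
    fi
    return "$status"
}

# Usage inside the container (command copied from the pod description):
#   run_with_deadline 30 clickhouse extract-from-config \
#       --config-file /etc/clickhouse-server/config.xml --key=path
```

An exit status of 124 reproduces the hang in isolation from the entrypoint script; any other status is the command's own result.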
Below is the CHI (ClickHouseInstallation) manifest:
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
{"apiVersion":"clickhouse.altinity.com/v1","kind":"ClickHouseInstallation","metadata":{"annotations":{"prometheus.io/port":"7000","prometheus.io/scrape":"true"},"labels":{"app.kubernetes.io/instance":"xdi-dev_clickhouse-ins-test"},"name":"clickhouse-xdi-dev-ins","namespace":"xdi-dev"},"spec":{"configuration":{"clusters":[{"layout":{"replicasCount":1,"shardsCount":1},"name":"ins","templates":{"dataVolumeClaimTemplate":"chi-data-xdi-dev-ins","podTemplate":"chi-pod-xdi-dev-ins"}}],"files":{"kafka_ssl.xml":"\u003cclickhouse\u003e\n
\u003ckafka\u003e\n
\u003ckafka_auto_offset_reset\u003eearliest\u003c/kafka_auto_offset_reset\u003e\n
\u003csecurity_protocol\u003eSSL\u003c/security_protocol\u003e\n
\u003cssl_ca_location\u003e/etc/clickhouse-client/certs/ca.pem\u003c/ssl_ca_location\u003e\n
\u003cssl_certificate_location\u003e/etc/clickhouse-client/certs/client_cert.pem\u003c/ssl_certificate_location\u003e\n
\u003cssl_key_location\u003e/etc/clickhouse-client/certs/client_key.key\u003c/ssl_key_location\u003e\n
\u003c/kafka\u003e\n\u003c/clickhouse\u003e\n"},"users":{"admin/access_management":1,"admin/k8s_secret_password":"xdi-dev/clickhouse-ins-creds/admin","admin/networks/ip":"0.0.0.0/0","user/k8s_secret_password":"xdi-dev/clickhouse-ins-creds/user","user/networks/ip":"0.0.0.0/0","user/readonly":1}},"defaults":{"templates":{"dataVolumeClaimTemplate":"chi-data-xdi-dev-ins","podTemplate":"chi-pod-xdi-dev-ins"}},"stop":"no","taskID":"1","templates":{"podTemplates":[{"name":"chi-pod-xdi-dev-ins","spec":{"containers":[{"env":[{"name":"AWS_ACCESS_KEY_ID","valueFrom":{"secretKeyRef":{"key":"AWS_ACCESS_KEY_ID","name":"s3-clickhouse-credentials"}}},{"name":"AWS_SECRET_ACCESS_KEY","valueFrom":{"secretKeyRef":{"key":"AWS_SECRET_ACCESS_KEY","name":"s3-clickhouse-credentials"}}},{"name":"CLICKHOUSE_SKIP_USER_SETUP","value":"1"}],"image":"clickhouse/clickhouse-server:25.8","imagePullPolicy":"IfNotPresent","name":"clickhouse","resources":{"limits":{"cpu":8,"memory":"16Gi"},"requests":{"cpu":6,"memory":"16Gi"}},"volumeMounts":[{"mountPath":"/var/lib/clickhouse","name":"chi-data-xdi-dev-ins"},{"mountPath":"/etc/clickhouse-client/certs","name":"kafka-certs","readOnly":true}]}],"volumes":[{"name":"kafka-certs","secret":{"secretName":"kafka-certs"}}]}}],"volumeClaimTemplates":[{"name":"chi-data-xdi-dev-ins","reclaimPolicy":"Retain","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"100Gi"}},"storageClassName":"xdi-dev-chi-ins"}}]},"troubleshoot":"no"}}
    prometheus.io/port: '7000'
    prometheus.io/scrape: 'true'
  creationTimestamp: '2025-11-06T01:14:40Z'
  finalizers:
    - finalizer.clickhouseinstallation.altinity.com
  generation: 2
  labels:
    app.kubernetes.io/instance: xdi-dev_clickhouse-ins-test
  name: clickhouse-xdi-dev-ins
  namespace: xdi-dev
  resourceVersion: '4917548229'
  uid: b45a7cc7-3d27-4000-bda0-faae3b28d4d1
spec:
  configuration:
    clusters:
      - layout:
          replicasCount: 1
          shardsCount: 1
        name: ins
        templates:
          dataVolumeClaimTemplate: chi-data-xdi-dev-ins
          podTemplate: chi-pod-xdi-dev-ins
    files:
      kafka_ssl.xml: |
        <clickhouse>
          <kafka>
            <kafka_auto_offset_reset>earliest</kafka_auto_offset_reset>
            <security_protocol>SSL</security_protocol>
            <ssl_ca_location>/etc/clickhouse-client/certs/ca.pem</ssl_ca_location>
            <ssl_certificate_location>/etc/clickhouse-client/certs/client_cert.pem</ssl_certificate_location>
            <ssl_key_location>/etc/clickhouse-client/certs/client_key.key</ssl_key_location>
          </kafka>
        </clickhouse>
    users:
      admin/access_management: 1
      admin/k8s_secret_password: xdi-dev/clickhouse-ins-creds/admin
      admin/networks/ip: 0.0.0.0/0
      user/k8s_secret_password: xdi-dev/clickhouse-ins-creds/user
      user/networks/ip: 0.0.0.0/0
      user/readonly: 1
  defaults:
    templates:
      dataVolumeClaimTemplate: chi-data-xdi-dev-ins
      podTemplate: chi-pod-xdi-dev-ins
  stop: 'no'
  taskID: '1'
  templates:
    podTemplates:
      - name: chi-pod-xdi-dev-ins
        spec:
          containers:
            - env:
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      key: AWS_ACCESS_KEY_ID
                      name: s3-clickhouse-credentials
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      key: AWS_SECRET_ACCESS_KEY
                      name: s3-clickhouse-credentials
                - name: CLICKHOUSE_SKIP_USER_SETUP
                  value: '1'
              image: 'clickhouse/clickhouse-server:25.8'
              imagePullPolicy: IfNotPresent
              name: clickhouse
              resources:
                limits:
                  cpu: 8
                  memory: 16Gi
                requests:
                  cpu: 6
                  memory: 16Gi
              volumeMounts:
                - mountPath: /var/lib/clickhouse
                  name: chi-data-xdi-dev-ins
                - mountPath: /etc/clickhouse-client/certs
                  name: kafka-certs
                  readOnly: true
          volumes:
            - name: kafka-certs
              secret:
                secretName: kafka-certs
    volumeClaimTemplates:
      - name: chi-data-xdi-dev-ins
        reclaimPolicy: Retain
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 100Gi
          storageClassName: xdi-dev-chi-ins
  troubleshoot: 'no'