helm chart does not flush wal on scaling down singleBinary #17087

Open · taoyouh opened this issue Apr 9, 2025 · 0 comments
Labels: area/helm, type/bug (Something is not working as expected)

taoyouh commented Apr 9, 2025

Describe the bug
The helm chart does not define a preStop lifecycle hook to flush the WAL on shutdown for singleBinary, so singleBinary pods do not flush their data to the object store when the number of replicas is reduced. Unlike the microservice pods, the singleBinary pods have no lifecycle defined in the helm templates or in values.yaml.

Given that enableStatefulSetAutoDeletePVC is set to true, it is quite dangerous to have neither a preStop flush hook nor flush-on-shutdown enabled: if one manually scales down singleBinary using helm, the unflushed WAL is deleted together with its PVC. This is confusing and error-prone.
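
As long as the flush hook is missing, a partial mitigation is to keep the PVCs around so the WAL at least survives the scale-down. A sketch of the values.yaml override; the exact key location under singleBinary.persistence is my assumption and may differ between chart versions:

singleBinary:
  persistence:
    # assumed key; keeps the PVCs (and the unflushed WAL) after scaling down
    enableStatefulSetAutoDeletePVC: false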

The scalable and microservice pods only enable the lifecycle hook by default when autoscaling is enabled, so that configuration can also lead to data loss when one manually scales down the deployment.
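
For the write pods this can at least be worked around by setting write.lifecycle explicitly in values.yaml, since the template quoted below renders it whenever it is set. A minimal sketch, reusing the same flush endpoint the chart configures for autoscaling:

write:
  lifecycle:
    preStop:
      httpGet:
        # same flush endpoint the chart uses when autoscaling is enabled
        path: "/ingester/shutdown?terminate=false"
        port: http-metrics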

The following block is in write-statefulset.yaml but not in single-binary/statefulset.yaml:

          {{- if .Values.write.lifecycle }}
          lifecycle:
            {{- toYaml .Values.write.lifecycle | nindent 12 }}
          {{- else if .Values.write.autoscaling.enabled }}
          lifecycle:
            preStop:
              httpGet:
                path: "/ingester/shutdown?terminate=false"
                port: http-metrics
          {{- end }}
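
For comparison, a minimal sketch of what an analogous block in single-binary/statefulset.yaml could look like. Note that .Values.singleBinary.lifecycle is hypothetical; the chart does not currently expose such a value:

          {{- if .Values.singleBinary.lifecycle }}
          lifecycle:
            {{- toYaml .Values.singleBinary.lifecycle | nindent 12 }}
          {{- else }}
          {{/* assumed default: always flush the WAL via the ingester shutdown endpoint */}}
          lifecycle:
            preStop:
              httpGet:
                path: "/ingester/shutdown?terminate=false"
                port: http-metrics
          {{- end }}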

To Reproduce
Steps to reproduce the behavior:

  1. Install helm chart 6.29.0 with singleBinary replicas set to 3
  2. Push some logs to Loki
  3. Update the helm release to reduce singleBinary replicas to 1 (see the snippet after this list)
  4. Query for the logs; some of them will be gone
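
The scale-down in step 3 is only a values change against the same release (everything else stays as in the values.yaml below):

singleBinary:
  replicas: 1   # reduced from 3; the PVCs of the removed pods are auto-deleted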

Expected behavior
All logs should remain persisted and queryable after the scale-down.

Environment:

  • Infrastructure: kubernetes (k3s v1.31.6+k3s1 (6ab750f9))
  • Deployment tool: helm

Screenshots, Promtail config, or terminal output
My values.yaml:

deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  extraArgs:
  - -config.expand-env
  extraEnvFrom:
  - secretRef:
      name: loki
loki:
  commonConfig:
    replication_factor: 1
  storage:
    bucketNames:
      ...
    use_thanos_objstore: true
    object_store:
      type: s3
      s3:
        ...
  compactor:
    retention_enabled: true
    delete_request_store: s3
  ingester:
    wal:
      flush_on_shutdown: true
  limits_config:
    retention_period: 200d
  schemaConfig:
    configs:
    - from: 2022-01-11
      store: boltdb-shipper
      object_store: s3
      schema: v12
      index:
        prefix: loki_index_
        period: 24h
    - from: 2024-06-17
      store: boltdb-shipper
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
    - from: 2024-06-18
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
  storage_config:
    use_thanos_objstore: true
lokiCanary:
  extraArgs:
  - -interval
  - 10s
  - -spot-check-query-rate
  - 10m
test:
  enabled: false
gateway:
  affinity:
    podAntiAffinity: null
read:
  replicas: 0
write:
  replicas: 0
backend:
  replicas: 0
chunksCache:
  enabled: false
resultsCache:
  enabled: false
sidecar:
  rules:
    enabled: false
JStickler added the area/helm and type/bug (Something is not working as expected) labels on Apr 14, 2025