Error syncing tables to S3 after upgrading to 2.5.0 from 2.4.2 - "invalid key: " #5909

@nc-tamu

Description

After upgrading from 2.4.2 to 2.5.0, the read instance of Loki can no longer sync index tables with S3 storage. The problem occurs when searching for anything via the Grafana interface: the query fails with an error.
In the logs of the read instance, the following can be seen:

level=info ts=2022-04-13T15:04:46.623064275Z caller=table_manager.go:176 msg="downloading all files for table loki_index_19095"
level=error ts=2022-04-13T15:04:46.672656292Z caller=cached_client.go:64 msg="failed to build cache" err="invalid key: "
level=error ts=2022-04-13T15:04:46.672702196Z caller=index_set.go:99 msg="failed to initialize table loki_index_19095, cleaning it up" err="invalid key: "
level=error ts=2022-04-13T15:04:46.67271967Z caller=table.go:304 table-name=loki_index_19095 msg="failed to init user index set fake" err="invalid key: "
level=info ts=2022-04-13T15:04:46.672730426Z caller=table.go:298 table-name=loki_index_19095 user-id=fake msg="downloaded index set at query time" duration=0.049333251
level=error ts=2022-04-13T15:04:46.672762948Z caller=chunk_store.go:526 org_id=fake msg="error querying storage" err="invalid key: "
level=error ts=2022-04-13T15:04:46.672792374Z caller=chunk_store.go:526 org_id=fake msg="error querying storage" err="invalid key: "
level=info ts=2022-04-13T15:04:46.67286043Z caller=metrics.go:122 component=querier org_id=fake latency=fast query="{env=\"tfe\"}" query_type=limited range_type=range length=5m0s step=200ms duration=52.296479ms status=500 limit=1000 returned_lines=0 throughput=0B total_bytes=0B queue_time=59.751µs subqueries=1
level=error ts=2022-04-13T15:04:46.673922719Z caller=retry.go:73 org_id=fake msg="error processing request" try=0 err="rpc error: code = Code(500) desc = invalid key: \n"
level=warn ts=2022-04-13T15:04:46.673932671Z caller=logging.go:72 traceID=56527f04f4314591 orgID=fake msg="GET /loki/api/v1/query_range?direction=BACKWARD&limit=1000&query=%7Benv%3D%22tfe%22%7D&start=1649861986466000000&end=1649862286466000000&step=0.2 (500) 54.564886ms Response: \"invalid key: \\n\" ws: false; Accept: application/json, text/plain, */*; Accept-Encoding: gzip, deflate, br; Accept-Language: en-US,en;q=0.9; Sec-Ch-Ua: \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"; Sec-Ch-Ua-Mobile: ?0; Sec-Ch-Ua-Platform: \"Windows\"; Sec-Fetch-Dest: empty; Sec-Fetch-Mode: cors; Sec-Fetch-Site: same-origin; User-Agent: Grafana/8.3.3; X-Forwarded-For: 10.128.191.128, 10.128.191.128; X-Forwarded-Ssl: on; X-Grafana-Org-Id: 1; X-Original-Forwarded-For:; X-Real-Ip: ; X-Request-Id: 4ac2c36eb721afb205d0b5b96d217ecb; X-Scheme: https; "
level=error ts=2022-04-13T15:04:46.676009813Z caller=table.go:167 table-name=loki_index_19095 org_id=fake msg="index set fake has some problem, cleaning it up" err="invalid key: "
level=error ts=2022-04-13T15:04:46.676173924Z caller=chunk_store.go:526 org_id=fake msg="error querying storage" err="invalid key: "
level=error ts=2022-04-13T15:04:46.676614796Z caller=retry.go:73 org_id=fake msg="error processing request" try=1 err="rpc error: code = Code(500) desc = invalid key: \n"
level=error ts=2022-04-13T15:04:46.699635889Z caller=cached_client.go:64 msg="failed to build cache" err="invalid key: "
level=error ts=2022-04-13T15:04:46.699686752Z caller=index_set.go:99 msg="failed to initialize table loki_index_19095, cleaning it up" err="invalid key: "
level=error ts=2022-04-13T15:04:46.699704536Z caller=table.go:304 table-name=loki_index_19095 msg="failed to init user index set fake" err="invalid key: "
level=info ts=2022-04-13T15:04:46.699717285Z caller=table.go:298 table-name=loki_index_19095 user-id=fake msg="downloaded index set at query time" duration=0.020270309
level=error ts=2022-04-13T15:04:46.699746892Z caller=chunk_store.go:526 org_id=fake msg="error querying storage" err="invalid key: "
level=error ts=2022-04-13T15:04:46.700191724Z caller=retry.go:73 org_id=fake msg="error processing request" try=2 err="rpc error: code = Code(500) desc = invalid key: \n"
level=error ts=2022-04-13T15:04:46.70248434Z caller=table.go:167 table-name=loki_index_19095 org_id=fake msg="index set fake has some problem, cleaning it up" err="invalid key: "
level=error ts=2022-04-13T15:04:46.702644394Z caller=chunk_store.go:526 org_id=fake msg="error querying storage" err="invalid key: "
level=error ts=2022-04-13T15:04:46.734802176Z caller=retry.go:73 org_id=fake msg="error processing request" try=3 err="rpc error: code = Code(500) desc = invalid key: \n"
level=error ts=2022-04-13T15:04:46.759067415Z caller=cached_client.go:64 msg="failed to build cache" err="invalid key: "
level=error ts=2022-04-13T15:04:46.759110922Z caller=index_set.go:99 msg="failed to initialize table loki_index_19095, cleaning it up" err="invalid key: "
level=error ts=2022-04-13T15:04:46.759132366Z caller=table.go:304 table-name=loki_index_19095 msg="failed to init user index set fake" err="invalid key: "
level=info ts=2022-04-13T15:04:46.75914639Z caller=table.go:298 table-name=loki_index_19095 user-id=fake msg="downloaded index set at query time" duration=0.019889037
level=error ts=2022-04-13T15:04:46.759180956Z caller=chunk_store.go:526 org_id=fake msg="error querying storage" err="invalid key: "
level=error ts=2022-04-13T15:04:46.759700945Z caller=retry.go:73 org_id=fake msg="error processing request" try=4 err="rpc error: code = Code(500) desc = invalid key: \n"
level=warn ts=2022-04-13T15:04:46.759806671Z caller=logging.go:72 traceID=102585fd79655ea0 orgID=fake msg="GET /loki/api/v1/label?start=1649861986518000000&end=1649862286518000000 (500) 102.569608ms Response: \"invalid key: \\n\" ws: false; Accept: application/json, text/plain, */*; Accept-Encoding: gzip, deflate, br; Accept-Language: en-US,en;q=0.9; Sec-Ch-Ua: \" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"; Sec-Ch-Ua-Mobile: ?0; Sec-Ch-Ua-Platform: \"Windows\"; Sec-Fetch-Dest: empty; Sec-Fetch-Mode: cors; Sec-Fetch-Site: same-origin; User-Agent: Grafana/8.3.3; X-Forwarded-For: , 10.128.213.0, 10.128.213.0; X-Forwarded-Ssl: on; X-Grafana-Org-Id: 1; X-Original-Forwarded-For: ; X-Real-Ip: ; X-Request-Id: d47fe27867462d323ca5830a75dd3eda; X-Scheme: https; "
level=info ts=2022-04-13T15:04:51.434721583Z caller=table_manager.go:209 msg="syncing tables"
level=error ts=2022-04-13T15:04:51.471117263Z caller=cached_client.go:64 msg="failed to build cache" err="invalid key: "
level=error ts=2022-04-13T15:04:51.471177807Z caller=table_manager.go:107 msg="error syncing local boltdb files with storage" err="failed to sync index set fake for table loki_index_19095: invalid key: "

The following stack trace can be observed after restarting the container:

level=error ts=2022-04-13T15:19:31.18143427Z caller=log.go:100 msg="error running loki" err="invalid key: 
error creating index client
github.com/grafana/loki/pkg/storage/chunk/storage.NewStore
	/src/loki/pkg/storage/chunk/storage/factory.go:219
github.com/grafana/loki/pkg/loki.(*Loki).initStore
	/src/loki/pkg/loki/modules.go:394
github.com/grafana/dskit/modules.(*Manager).initModule
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:106
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:78
github.com/grafana/loki/pkg/loki.(*Loki).Run
	/src/loki/pkg/loki/loki.go:339
main.main
	/src/loki/cmd/loki/main.go:108
runtime.main
	/usr/local/go/src/runtime/proc.go:255
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1581
error initialising module: store
github.com/grafana/dskit/modules.(*Manager).initModule
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:108
github.com/grafana/dskit/modules.(*Manager).InitModuleServices
	/src/loki/vendor/github.com/grafana/dskit/modules/modules.go:78
github.com/grafana/loki/pkg/loki.(*Loki).Run
	/src/loki/pkg/loki/loki.go:339
main.main
	/src/loki/cmd/loki/main.go:108
runtime.main
	/usr/local/go/src/runtime/proc.go:255
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1581

Downgrading back to 2.4.2 resolves the issue, and objects can again be seen being downloaded from S3 storage.
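
One detail that stands out in the logs above is that the key in err="invalid key: " is empty, i.e. the object listing for the table seems to return an entry with an empty name. Listing the table's prefix in the bucket directly can confirm what the store actually returns. A minimal sketch with the AWS CLI, assuming the default index/ prefix that boltdb-shipper uses with shared_store: s3 (endpoint and bucket values come from the config below):

aws s3api list-objects-v2 \
  --endpoint-url "https://${S3_HOST}" \
  --bucket "${S3_BUCKET}" \
  --prefix "index/loki_index_19095/" \
  --query 'Contents[].Key'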

Config used for Loki:

# Config file for loki

auth_enabled: false

server:
  log_level: info
  http_listen_port: 3100
  http_listen_address: 0.0.0.0
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857608
  grpc_server_max_concurrent_streams: 1000

distributor:
  ring:
    kvstore:
      store: memberlist

memberlist:
  join_members:
    - loki-memberlist

common:
  #path_prefix: /data/loki
  replication_factor: 1
  storage:
    filesystem:
      chunks_directory: /data/loki/chunks
      rules_directory: /data/loki/rules

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  wal:
    enabled: true
    dir: /data/loki/wal
    flush_on_shutdown: true

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_index_
        period: 24h

storage_config:
  aws:
    s3: https://${S3_ACCESS_KEY}:${S3_SECRET_KEY}@${S3_HOST}:443/${S3_BUCKET}
    s3forcepathstyle: true
    insecure: false
    http_config:
      insecure_skip_verify: true
  boltdb_shipper:
    active_index_directory: /data/loki/index
    cache_location: /data/loki/index_caching
    cache_ttl: 480h
    resync_interval: 5s
    shared_store: s3

compactor:
  working_directory: /data/loki/boltdb-shipper-compactor
  shared_store: s3
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  compactor_ring:
    kvstore:
      store: memberlist

table_manager:
  retention_deletes_enabled: true
  retention_period: 720h
  chunk_tables_provisioning:
    inactive_read_throughput: 10
    inactive_write_throughput: 10
    provisioned_read_throughput: 50
    provisioned_write_throughput: 20
  index_tables_provisioning:
    inactive_read_throughput: 10
    inactive_write_throughput: 10
    provisioned_read_throughput: 50
    provisioned_write_throughput: 20
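
For reference, the inline URL form in the aws section above can also be written with explicit fields, which avoids embedding the credentials in a URL. A sketch of an equivalent section, assuming the field names from the storage_config documentation and the same environment variables:

storage_config:
  aws:
    endpoint: ${S3_HOST}:443
    bucketnames: ${S3_BUCKET}
    access_key_id: ${S3_ACCESS_KEY}
    secret_access_key: ${S3_SECRET_KEY}
    s3forcepathstyle: true
    insecure: false
    http_config:
      insecure_skip_verify: true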

Container args:

- -config.file=/etc/loki/config/config.yaml
- -target=read
- -config.expand-env=true
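
One caveat with -config.expand-env=true and the inline URL form above: the userinfo part of a URL must be percent-encoded, so if ${S3_SECRET_KEY} contains characters such as /, +, or @, the expanded URL will not parse cleanly. A hypothetical variant using a pre-encoded variable (S3_SECRET_KEY_URLENCODED is a made-up name for a copy of the secret with reserved characters percent-encoded):

    s3: https://${S3_ACCESS_KEY}:${S3_SECRET_KEY_URLENCODED}@${S3_HOST}:443/${S3_BUCKET}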

Any idea what is going on here?
