Commit bb28fb5

Add bucket index support to store gateway (cortexproject#3625)
* Integrated bucket index in store-gateway
  Signed-off-by: Marco Pracucci <[email protected]>

* Unit tested BucketIndexMetadataFetcher
  Signed-off-by: Marco Pracucci <[email protected]>

* Fixed bucketindex unit test
  Signed-off-by: Marco Pracucci <[email protected]>

* Improved store-gateway unit tests
  Signed-off-by: Marco Pracucci <[email protected]>

* Updated doc
  Signed-off-by: Marco Pracucci <[email protected]>

* Upated CHANGELOG
  Signed-off-by: Marco Pracucci <[email protected]>

* Fixed doc and comments
  Signed-off-by: Marco Pracucci <[email protected]>

* Log even the case the bucket index does not exist
  Signed-off-by: Marco Pracucci <[email protected]>

* Do not track failure if bucket index does not exist when reading it
  Signed-off-by: Marco Pracucci <[email protected]>

* Added cortex_bucket_blocks_partials_count metric exported by compactor
  Signed-off-by: Marco Pracucci <[email protected]>

* Improved error handling
  Signed-off-by: Marco Pracucci <[email protected]>

* Added missing doc image
  Signed-off-by: Marco Pracucci <[email protected]>

* Updated code comment
  Signed-off-by: Marco Pracucci <[email protected]>
1 parent 739d3f0 commit bb28fb5

35 files changed, +1145 -281 lines changed

CHANGELOG.md

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -6,15 +6,16 @@
66
* [CHANGE] Blocks storage: compactor is now required when running a Cortex cluster with the blocks storage, because it also keeps the bucket index updated. #3583
77
* [CHANGE] Blocks storage: block deletion marks are now stored in a per-tenant global markers/ location too, other than within the block location. The compactor, at startup, will copy deletion marks from the block location to the global location. This migration is required only once, so you can safely disable it via `-compactor.block-deletion-marks-migration-enabled=false` once new compactor has successfully started once in your cluster. #3583
88
* [FEATURE] Querier: Queries can be federated across multiple tenants. The tenants IDs involved need to be specified separated by a `|` character in the `X-Scope-OrgID` request header. This is an experimental feature, which can be enabled by setting `-tenant-federation.enabled=true` on all Cortex services. #3250
9-
* [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers and store-gateways. The bucket index is updated by the compactor during blocks cleanup, on every `-compactor.cleanup-interval`. #3553 #3555 #3561 #3583
10-
* [ENHANCEMENT] Blocks storage: introduced an option `-blocks-storage.bucket-store.bucket-index.enabled` to enable the usage of the bucket index in the querier. When enabled, the querier will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics have been added: #3614
9+
* [ENHANCEMENT] Blocks storage: introduced a per-tenant bucket index, periodically updated by the compactor, used to avoid full bucket scanning done by queriers and store-gateways. The bucket index is updated by the compactor during blocks cleanup, on every `-compactor.cleanup-interval`. #3553 #3555 #3561 #3583 #3625
10+
* [ENHANCEMENT] Blocks storage: introduced an option `-blocks-storage.bucket-store.bucket-index.enabled` to enable the usage of the bucket index in the querier and store-gateway. When enabled, the querier and store-gateway will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier: #3614 #3625
1111
* `cortex_bucket_index_loads_total`
1212
* `cortex_bucket_index_load_failures_total`
1313
* `cortex_bucket_index_load_duration_seconds`
1414
* `cortex_bucket_index_loaded`
15-
* [ENHANCEMENT] Compactor: exported the following metrics. #3583
16-
* `cortex_bucket_blocks_count`: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion.
15+
* [ENHANCEMENT] Compactor: exported the following metrics. #3583 #3625
16+
* `cortex_bucket_blocks_count`: Total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
1717
* `cortex_bucket_blocks_marked_for_deletion_count`: Total number of blocks per tenant marked for deletion in the bucket.
18+
* `cortex_bucket_blocks_partials_count`: Total number of partial blocks.
1819
* `cortex_bucket_index_last_successful_update_timestamp_seconds`: Timestamp of the last successful update of a tenant's bucket index.
1920
* [ENHANCEMENT] Ruler: Add `cortex_prometheus_last_evaluation_samples` to expose the number of samples generated by a rule group per tenant. #3582
2021
* [ENHANCEMENT] Memberlist: add status page (/memberlist) with available details about memberlist-based KV store and memberlist cluster. It's also possible to view KV values in Go struct or JSON format, or download for inspection. #3575

docs/blocks-storage/bucket-index.md

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -5,18 +5,18 @@ weight: 5
55
slug: bucket-index
66
---
77

8-
The bucket index is a **per-tenant file containing the list of blocks and block deletion marks** in the storage. The bucket index itself is stored in the backend object storage, is periodically updated by the compactor and used by queriers to discover blocks in the storage.
8+
The bucket index is a **per-tenant file containing the list of blocks and block deletion marks** in the storage. The bucket index itself is stored in the backend object storage, is periodically updated by the compactor, and used by queriers and store-gateways to discover blocks in the storage.
99

1010
The bucket index usage is **optional** and can be enabled via `-blocks-storage.bucket-store.bucket-index.enabled=true` (or its respective YAML config option).
1111

1212
## Benefits
1313

14-
The [querier](./querier.md) needs to have an almost up-to-date view over the entire storage bucket, in order to find the right blocks to lookup at query time. Because of this, querier needs to periodically scan the bucket to look for new blocks uploaded by ingester or compactor, and blocks deleted (or marked for deletion) by compactor.
14+
The [querier](./querier.md) and [store-gateway](./store-gateway.md) need to have an almost up-to-date view over the entire storage bucket, in order to find the right blocks to lookup at query time (querier) and load block's [index-header](./binary-index-header.md) (store-gateway). Because of this, they need to periodically scan the bucket to look for new blocks uploaded by ingester or compactor, and blocks deleted (or marked for deletion) by compactor.
1515

16-
When this bucket index is enabled, the querier periodically look up the per-tenant bucket index instead of scanning the bucket via "list objects" operations. This brings few benefits:
16+
When the bucket index is enabled, the querier and store-gateway periodically look up the per-tenant bucket index instead of scanning the bucket via "list objects" operations. This brings few benefits:
1717

18-
1. Reduced number of API calls to the object storage by querier
19-
2. No "list objects" storage API calls done by querier
18+
1. Reduced number of API calls to the object storage by querier and store-gateway
19+
2. No "list objects" storage API calls done by querier and store-gateway
2020
3. The [querier](./querier.md) is up and running immediately after the startup (no need to run an initial bucket scan)
2121

2222
## Structure of the index
@@ -42,7 +42,7 @@ The [querier](./querier.md), at query time, checks whether the bucket index for
4242

4343
_Given it's a small file, lazy downloading it doesn't significantly impact on first query performances, but allows to get a querier up and running without pre-downloading every tenant's bucket index. Moreover, if the [metadata cache](./querier.md#metadata-cache) is enabled, the bucket index will be cached for a short time in a shared cache, reducing the actual latency and number of API calls to the object storage in case multiple queriers will fetch the same tenant's bucket index in a short time._
4444

45-
![Querier - Bucket index](/images/blocks-storage/bucket-index-querier-logic.png)
45+
![Querier - Bucket index](/images/blocks-storage/bucket-index-querier-workflow.png)
4646
<!-- Diagram source at https://docs.google.com/presentation/d/1bHp8_zcoWCYoNU2AhO2lSagQyuIrghkCncViSqn14cU/edit -->
4747

4848
While in-memory, a background process will keep it **updated at periodic intervals**, so that subsequent queries from the same tenant to the same querier instance will use the cached (and periodically updated) bucket index. There are two config options involved:
@@ -55,3 +55,7 @@ While in-memory, a background process will keep it **updated at periodic interva
5555
If a bucket index is unused for a long time (configurable via `-blocks-storage.bucket-store.bucket-index.idle-timeout`), e.g. because that querier instance is not receiving any query from the tenant, the querier will offload it, stopping to keep it updated at regular intervals. This is particularly for tenants which are resharded to different queriers when [shuffle sharding](../guides/shuffle-sharding.md) is enabled.
5656

5757
Finally, the querier, at query time, checks how old is a bucket index (based on its `updated_at`) and fail a query if its age is older than `-blocks-storage.bucket-store.bucket-index.max-stale-period`. This circuit breaker is used to ensure queriers will not return any partial query results due to a stale view over the long-term storage.
58+
59+
## How it's used by the store-gateway
60+
61+
The [store-gateway](./store-gateway.md), at startup and periodically, fetches the bucket index for each tenant belonging to their shard and uses it as the source of truth for the blocks (and deletion marks) in the storage. This removes the need to periodically scan the bucket to discover blocks belonging to their shard.

docs/blocks-storage/compactor.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ slug: compactor
1010
The **compactor** is an service which is responsible to:
1111

1212
- Compact multiple blocks of a given tenant into a single optimized larger block. This helps to reduce storage costs (deduplication, index size reduction), and increase query speed (querying fewer blocks is faster).
13-
- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) to discover new blocks in the storage.
13+
- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) and [store-gateways](./store-gateway.md) to discover new blocks in the storage.
1414

1515
The compactor is **stateless**.
1616

docs/blocks-storage/compactor.template

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ slug: compactor
1010
The **compactor** is an service which is responsible to:
1111

1212
- Compact multiple blocks of a given tenant into a single optimized larger block. This helps to reduce storage costs (deduplication, index size reduction), and increase query speed (querying fewer blocks is faster).
13-
- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) to discover new blocks in the storage.
13+
- Keep the per-tenant bucket index updated. The [bucket index](./bucket-index.md) is used by [queriers](./querier.md) and [store-gateways](./store-gateway.md) to discover new blocks in the storage.
1414

1515
The compactor is **stateless**.
1616

docs/blocks-storage/querier.md

Lines changed: 9 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -365,8 +365,9 @@ blocks_storage:
365365
# CLI flag: -blocks-storage.bucket-store.sync-dir
366366
[sync_dir: <string> | default = "tsdb-sync"]
367367
368-
# How frequently scan the bucket to look for changes (new blocks shipped by
369-
# ingesters and blocks removed by retention or compaction). 0 disables it.
368+
# How frequently scan the bucket - or fetch the bucket index (if enabled) -
369+
# to look for changes (new blocks shipped by ingesters and blocks removed by
370+
# retention or compaction). 0 disables it.
370371
# CLI flag: -blocks-storage.bucket-store.sync-interval
371372
[sync_interval: <duration> | default = 5m]
372373
@@ -634,22 +635,24 @@ blocks_storage:
634635
[ignore_deletion_mark_delay: <duration> | default = 6h]
635636
636637
bucket_index:
637-
# True to enable querier to discover blocks in the storage via bucket
638-
# index instead of bucket scanning.
638+
# True to enable querier and store-gateway to discover blocks in the
639+
# storage via bucket index instead of bucket scanning.
639640
# CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
640641
[enabled: <boolean> | default = false]
641642
642-
# How frequently a cached bucket index should be refreshed.
643+
# How frequently a cached bucket index should be refreshed. This option is
644+
# used only by querier.
643645
# CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-stale-interval
644646
[update_on_stale_interval: <duration> | default = 15m]
645647
646648
# How frequently a bucket index, which previously failed to load, should
647-
# be tried to load again.
649+
# be tried to load again. This option is used only by querier.
648650
# CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-error-interval
649651
[update_on_error_interval: <duration> | default = 1m]
650652
651653
# How long a unused bucket index should be cached. Once this timeout
652654
# expires, the unused bucket index is removed from the in-memory cache.
655+
# This option is used only by querier.
653656
# CLI flag: -blocks-storage.bucket-store.bucket-index.idle-timeout
654657
[idle_timeout: <duration> | default = 1h]
655658

docs/blocks-storage/store-gateway.md

Lines changed: 22 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,13 @@ The store-gateway is **semi-stateful**.
1313

1414
## How it works
1515

16+
The store-gateway needs to have an almost up-to-date view over the storage bucket, in order to discover blocks belonging to their shard. The store-gateway can keep the bucket view updated in to two different ways:
17+
18+
1. Periodically scanning the bucket (default)
19+
2. Periodically downloading the [bucket index](./bucket-index.md)
20+
21+
### Bucket index disabled (default)
22+
1623
At startup **store-gateways** iterate over the entire storage bucket to discover blocks for all tenants and download the `meta.json` and index-header for each block. During this initial bucket synchronization phase, the store-gateway `/ready` readiness probe endpoint will fail.
1724

1825
While running, store-gateways periodically rescan the storage bucket to discover new blocks (uploaded by the ingesters and [compactor](./compactor.md)) and blocks marked for deletion or fully deleted since the last scan (as a result of compaction). The frequency at which this occurs is configured via `-blocks-storage.bucket-store.sync-interval`.
@@ -21,6 +28,12 @@ The blocks chunks and the entire index are never fully downloaded by the store-g
2128

2229
_For more information about the index-header, please refer to [Binary index-header documentation](./binary-index-header.md)._
2330

31+
### Bucket index enabled
32+
33+
When bucket index is enabled, the overall workflow is the same but, instead of iterating over the bucket objects, the store-gateway fetch the [bucket index](./bucket-index.md) for each tenant belonging to their shard in order to discover each tenant's blocks and block deletion marks.
34+
35+
_For more information about the bucket index, please refer to [bucket index documentation](./bucket-index.md)._
36+
2437
## Blocks sharding and replication
2538

2639
The store-gateway optionally supports blocks sharding. Sharding can be used to horizontally scale blocks in a large cluster without hitting any vertical scalability limit.
@@ -399,8 +412,9 @@ blocks_storage:
399412
# CLI flag: -blocks-storage.bucket-store.sync-dir
400413
[sync_dir: <string> | default = "tsdb-sync"]
401414
402-
# How frequently scan the bucket to look for changes (new blocks shipped by
403-
# ingesters and blocks removed by retention or compaction). 0 disables it.
415+
# How frequently scan the bucket - or fetch the bucket index (if enabled) -
416+
# to look for changes (new blocks shipped by ingesters and blocks removed by
417+
# retention or compaction). 0 disables it.
404418
# CLI flag: -blocks-storage.bucket-store.sync-interval
405419
[sync_interval: <duration> | default = 5m]
406420
@@ -668,22 +682,24 @@ blocks_storage:
668682
[ignore_deletion_mark_delay: <duration> | default = 6h]
669683
670684
bucket_index:
671-
# True to enable querier to discover blocks in the storage via bucket
672-
# index instead of bucket scanning.
685+
# True to enable querier and store-gateway to discover blocks in the
686+
# storage via bucket index instead of bucket scanning.
673687
# CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
674688
[enabled: <boolean> | default = false]
675689
676-
# How frequently a cached bucket index should be refreshed.
690+
# How frequently a cached bucket index should be refreshed. This option is
691+
# used only by querier.
677692
# CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-stale-interval
678693
[update_on_stale_interval: <duration> | default = 15m]
679694
680695
# How frequently a bucket index, which previously failed to load, should
681-
# be tried to load again.
696+
# be tried to load again. This option is used only by querier.
682697
# CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-error-interval
683698
[update_on_error_interval: <duration> | default = 1m]
684699
685700
# How long a unused bucket index should be cached. Once this timeout
686701
# expires, the unused bucket index is removed from the in-memory cache.
702+
# This option is used only by querier.
687703
# CLI flag: -blocks-storage.bucket-store.bucket-index.idle-timeout
688704
[idle_timeout: <duration> | default = 1h]
689705

docs/blocks-storage/store-gateway.template

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,13 @@ The store-gateway is **semi-stateful**.
1313

1414
## How it works
1515

16+
The store-gateway needs to have an almost up-to-date view over the storage bucket, in order to discover blocks belonging to their shard. The store-gateway can keep the bucket view updated in to two different ways:
17+
18+
1. Periodically scanning the bucket (default)
19+
2. Periodically downloading the [bucket index](./bucket-index.md)
20+
21+
### Bucket index disabled (default)
22+
1623
At startup **store-gateways** iterate over the entire storage bucket to discover blocks for all tenants and download the `meta.json` and index-header for each block. During this initial bucket synchronization phase, the store-gateway `/ready` readiness probe endpoint will fail.
1724

1825
While running, store-gateways periodically rescan the storage bucket to discover new blocks (uploaded by the ingesters and [compactor](./compactor.md)) and blocks marked for deletion or fully deleted since the last scan (as a result of compaction). The frequency at which this occurs is configured via `-blocks-storage.bucket-store.sync-interval`.
@@ -21,6 +28,12 @@ The blocks chunks and the entire index are never fully downloaded by the store-g
2128

2229
_For more information about the index-header, please refer to [Binary index-header documentation](./binary-index-header.md)._
2330

31+
### Bucket index enabled
32+
33+
When bucket index is enabled, the overall workflow is the same but, instead of iterating over the bucket objects, the store-gateway fetch the [bucket index](./bucket-index.md) for each tenant belonging to their shard in order to discover each tenant's blocks and block deletion marks.
34+
35+
_For more information about the bucket index, please refer to [bucket index documentation](./bucket-index.md)._
36+
2437
## Blocks sharding and replication
2538

2639
The store-gateway optionally supports blocks sharding. Sharding can be used to horizontally scale blocks in a large cluster without hitting any vertical scalability limit.
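As a rough illustration of the "Bucket index enabled" workflow added to the store-gateway docs above, the following Go sketch reads a tenant's bucket index and splits the listed blocks into those with and without a deletion mark. It is a sketch under assumptions, not the store-gateway code: the JSON field names `blocks`, `block_deletion_marks`, and `block_id`, the struct types, and the `discoverBlocks` function are all hypothetical (only `updated_at` is named in the docs). What the store-gateway then does with marked blocks, for example honouring `ignore_deletion_mark_delay`, is out of scope here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The JSON field names below are illustrative assumptions, except updated_at.
type indexBlock struct {
	ID string `json:"block_id"`
}

type deletionMark struct {
	ID string `json:"block_id"`
}

type bucketIndex struct {
	Blocks        []indexBlock   `json:"blocks"`
	DeletionMarks []deletionMark `json:"block_deletion_marks"`
	UpdatedAt     int64          `json:"updated_at"`
}

// discoverBlocks parses a tenant's bucket index and returns the IDs of the
// blocks without a deletion mark (active) and with one (marked), using only
// the index content rather than any "list objects" call.
func discoverBlocks(raw []byte) (active, marked []string, err error) {
	var idx bucketIndex
	if err := json.Unmarshal(raw, &idx); err != nil {
		return nil, nil, err
	}

	// Build a lookup set of block IDs that have a deletion mark.
	isMarked := make(map[string]bool, len(idx.DeletionMarks))
	for _, m := range idx.DeletionMarks {
		isMarked[m.ID] = true
	}

	for _, b := range idx.Blocks {
		if isMarked[b.ID] {
			marked = append(marked, b.ID)
		} else {
			active = append(active, b.ID)
		}
	}
	return active, marked, nil
}

func main() {
	raw := []byte(`{"blocks":[{"block_id":"A"},{"block_id":"B"}],` +
		`"block_deletion_marks":[{"block_id":"B"}],"updated_at":1700000000}`)

	active, marked, err := discoverBlocks(raw)
	fmt.Println(active, marked, err)
}
```

The point of the sketch is that a single object read per tenant yields both the block list and the deletion marks, which is what lets the store-gateway skip the periodic bucket scan.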

docs/configuration/config-file-reference.md

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -3682,8 +3682,9 @@ bucket_store:
36823682
# CLI flag: -blocks-storage.bucket-store.sync-dir
36833683
[sync_dir: <string> | default = "tsdb-sync"]
36843684
3685-
# How frequently scan the bucket to look for changes (new blocks shipped by
3686-
# ingesters and blocks removed by retention or compaction). 0 disables it.
3685+
# How frequently scan the bucket - or fetch the bucket index (if enabled) - to
3686+
# look for changes (new blocks shipped by ingesters and blocks removed by
3687+
# retention or compaction). 0 disables it.
36873688
# CLI flag: -blocks-storage.bucket-store.sync-interval
36883689
[sync_interval: <duration> | default = 5m]
36893690
@@ -3950,22 +3951,24 @@ bucket_store:
39503951
[ignore_deletion_mark_delay: <duration> | default = 6h]
39513952
39523953
bucket_index:
3953-
# True to enable querier to discover blocks in the storage via bucket index
3954-
# instead of bucket scanning.
3954+
# True to enable querier and store-gateway to discover blocks in the storage
3955+
# via bucket index instead of bucket scanning.
39553956
# CLI flag: -blocks-storage.bucket-store.bucket-index.enabled
39563957
[enabled: <boolean> | default = false]
39573958
3958-
# How frequently a cached bucket index should be refreshed.
3959+
# How frequently a cached bucket index should be refreshed. This option is
3960+
# used only by querier.
39593961
# CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-stale-interval
39603962
[update_on_stale_interval: <duration> | default = 15m]
39613963
39623964
# How frequently a bucket index, which previously failed to load, should be
3963-
# tried to load again.
3965+
# tried to load again. This option is used only by querier.
39643966
# CLI flag: -blocks-storage.bucket-store.bucket-index.update-on-error-interval
39653967
[update_on_error_interval: <duration> | default = 1m]
39663968
39673969
# How long a unused bucket index should be cached. Once this timeout
3968-
# expires, the unused bucket index is removed from the in-memory cache.
3970+
# expires, the unused bucket index is removed from the in-memory cache. This
3971+
# option is used only by querier.
39693972
# CLI flag: -blocks-storage.bucket-store.bucket-index.idle-timeout
39703973
[idle_timeout: <duration> | default = 1h]
39713974
