Add metrics for remaining planned compactions #3

Open: wants to merge 16 commits into base: shuffle-sharding-compactor
29 changes: 29 additions & 0 deletions CHANGELOG.md
@@ -1,9 +1,30 @@
# Changelog

## master / unreleased
* [FEATURE] Ruler: Add new `-ruler.query-stats-enabled` flag which, when enabled, reports `cortex_ruler_query_seconds_total` as a per-user metric tracking the total wall time, in seconds, spent executing queries in the ruler. #4317

* [CHANGE] Querier / ruler: Change `-querier.max-fetched-chunks-per-query` configuration to limit the maximum number of chunks that can be fetched in a single query. The number of chunks fetched by ingesters AND long-term storage combined should not exceed the value configured on `-querier.max-fetched-chunks-per-query`. #4260
* [CHANGE] Memberlist: the `memberlist_kv_store_value_bytes` metric has been removed due to values no longer being stored in-memory as encoded bytes. #4345
* [ENHANCEMENT] Add timeout for waiting on compactor to become ACTIVE in the ring. #4262
* [ENHANCEMENT] Reduce memory used by streaming queries, particularly in ruler. #4341
* [ENHANCEMENT] Ring: allow heartbeat timeouts to be disabled by setting the relevant configuration value to zero. This is considered experimental. This applies to the following configuration options: #4342
* `-distributor.ring.heartbeat-timeout`
* `-ring.heartbeat-timeout`
* `-ruler.ring.heartbeat-timeout`
* `-alertmanager.sharding-ring.heartbeat-timeout`
* `-compactor.ring.heartbeat-timeout`
* `-store-gateway.sharding-ring.heartbeat-timeout`
* [ENHANCEMENT] Ring: allow heartbeats to be explicitly disabled by setting the interval to zero. This is considered experimental. This applies to the following configuration options: #4344
* `-distributor.ring.heartbeat-period`
* `-ingester.heartbeat-period`
* `-ruler.ring.heartbeat-period`
* `-alertmanager.sharding-ring.heartbeat-period`
* `-compactor.ring.heartbeat-period`
* `-store-gateway.sharding-ring.heartbeat-period`
* [ENHANCEMENT] Memberlist: optimized receive path for processing ring state updates, to help reduce CPU utilization in large clusters. #4345
* [ENHANCEMENT] Memberlist: expose configuration of memberlist packet compression via `-memberlist.compression-enabled`. #4346
* [BUGFIX] HA Tracker: when cleaning up obsolete elected replicas from KV store, tracker didn't update the number of clusters per user correctly. #4336
* [FEATURE] Add shuffle sharding grouper and planner within compactor to allow further work towards parallelizing compaction #4318

## 1.10.0-rc.0 / 2021-06-28

@@ -17,6 +38,14 @@
* [CHANGE] Change default value of `-server.grpc.keepalive.min-time-between-pings` from `5m` to `10s` and `-server.grpc.keepalive.ping-without-stream-allowed` to `true`. #4168
* [CHANGE] Ingester: Change default value of `-ingester.active-series-metrics-enabled` to `true`. This incurs a small increase in memory usage, between 1.2% and 1.6% as measured on ingesters with 1.3M active series. #4257
* [CHANGE] Dependency: update go-redis from v8.2.3 to v8.9.0. #4236
* [CHANGE] Memberlist: Expose default configuration values to the command line options. Note that setting these explicitly to zero will no longer cause the default to be used. If the default is desired, then do not set the option. The following are affected: #4276
- `-memberlist.stream-timeout`
- `-memberlist.retransmit-factor`
- `-memberlist.pullpush-interval`
- `-memberlist.gossip-interval`
- `-memberlist.gossip-nodes`
- `-memberlist.gossip-to-dead-nodes-time`
- `-memberlist.dead-node-reclaim-time`
* [FEATURE] Querier: Added new `-querier.max-fetched-series-per-query` flag. When Cortex is running with blocks storage, the max series per query limit is enforced in the querier and applies to unique series received from ingesters and store-gateway (long-term storage). #4179
* [FEATURE] Querier/Ruler: Added new `-querier.max-fetched-chunk-bytes-per-query` flag. When Cortex is running with blocks storage, the max chunk bytes limit is enforced in the querier and ruler and limits the size of all aggregated chunks returned from ingesters and storage as bytes for a query. #4216
* [FEATURE] Alertmanager: support negative matchers, time-based muting - [upstream release notes](https://github.com/prometheus/alertmanager/releases/tag/v0.22.0). #4237
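
For the Memberlist change (#4276) listed above: because the defaults are now exposed on the options themselves, a zero value is presumably taken literally and no longer falls back to the memberlist LAN defaults. The sketch below pins the affected options to the defaults documented in the `config-file-reference.md` hunk of this diff. The top-level `memberlist:` key and the nesting are assumptions, since the diff view does not show them.

```yaml
# Hypothetical excerpt of a Cortex config file.
# Values are the defaults listed in the config reference in this diff.
memberlist:
  stream_timeout: 10s
  retransmit_factor: 4
  pull_push_interval: 30s
  gossip_interval: 200ms
  gossip_nodes: 3
  gossip_to_dead_nodes_time: 30s
  # 0s remains the documented default here and means "disabled".
  dead_node_reclaim_time: 0s
```

Omitting these keys entirely should give the same result, which is what the changelog entry recommends when the defaults are wanted.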
3 changes: 3 additions & 0 deletions docs/_index.md
@@ -36,6 +36,9 @@ should read:
1. [Getting started with Cortex](getting-started/_index.md)
1. [Information regarding configuring Cortex](configuration/_index.md)

There are also individual [guides](guides/_index.md) to many tasks.
Please review the important [security advice](guides/security.md) before deploying.

For a guide to contributing to Cortex, see the [contributor guidelines](contributing/).

## Further reading
13 changes: 11 additions & 2 deletions docs/blocks-storage/compactor.md
@@ -173,6 +173,11 @@ compactor:
# CLI flag: -compactor.sharding-enabled
[sharding_enabled: <boolean> | default = false]

# The sharding strategy to use. Supported values are: default,
# shuffle-sharding.
# CLI flag: -compactor.sharding-strategy
[sharding_strategy: <string> | default = "default"]

sharding_ring:
kvstore:
# Backend storage to use for the ring. Supported values are: consul, etcd,
@@ -209,12 +214,12 @@ compactor:
# CLI flag: -compactor.ring.multi.mirror-timeout
[mirror_timeout: <duration> | default = 2s]

# Period at which to heartbeat to the ring.
# Period at which to heartbeat to the ring. 0 = disabled.
# CLI flag: -compactor.ring.heartbeat-period
[heartbeat_period: <duration> | default = 5s]

# The heartbeat timeout after which compactors are considered unhealthy
# within the ring.
# within the ring. 0 = never (timeout disabled).
# CLI flag: -compactor.ring.heartbeat-timeout
[heartbeat_timeout: <duration> | default = 1m]

@@ -230,4 +235,8 @@ compactor:
# Name of network interface to read address from.
# CLI flag: -compactor.ring.instance-interface-names
[instance_interface_names: <list of string> | default = [eth0 en0]]

# Timeout for waiting on compactor to become ACTIVE in the ring.
# CLI flag: -compactor.ring.wait-active-instance-timeout
[wait_active_instance_timeout: <duration> | default = 10m]
```
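
A minimal sketch of how the options touched in this file might be combined, assuming the nesting shown (the diff view lost the original indentation) and assuming that `sharding_strategy` only takes effect when `sharding_enabled` is `true`, which the hunks above do not state:

```yaml
# Hypothetical compactor excerpt; key names are taken from the hunks above.
compactor:
  sharding_enabled: true
  # Supported values per the reference: default, shuffle-sharding.
  sharding_strategy: shuffle-sharding
  sharding_ring:
    heartbeat_period: 5s                # documented default; 0 = disabled
    heartbeat_timeout: 1m               # documented default; 0 = never
    wait_active_instance_timeout: 10m   # documented default
```

The ring values are simply the documented defaults, spelled out to show where the new `wait_active_instance_timeout` option sits relative to the heartbeat settings.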
6 changes: 3 additions & 3 deletions docs/blocks-storage/store-gateway.md
@@ -232,13 +232,13 @@ store_gateway:
# CLI flag: -store-gateway.sharding-ring.multi.mirror-timeout
[mirror_timeout: <duration> | default = 2s]

# Period at which to heartbeat to the ring.
# Period at which to heartbeat to the ring. 0 = disabled.
# CLI flag: -store-gateway.sharding-ring.heartbeat-period
[heartbeat_period: <duration> | default = 15s]

# The heartbeat timeout after which store gateways are considered unhealthy
# within the ring. This option needs to be set both on the store-gateway and
# querier when running in microservices mode.
# within the ring. 0 = never (timeout disabled). This option needs to be set
# both on the store-gateway and querier when running in microservices mode.
# CLI flag: -store-gateway.sharding-ring.heartbeat-timeout
[heartbeat_timeout: <duration> | default = 1m]

68 changes: 43 additions & 25 deletions docs/configuration/config-file-reference.md
@@ -563,12 +563,12 @@ ring:
# CLI flag: -distributor.ring.multi.mirror-timeout
[mirror_timeout: <duration> | default = 2s]

# Period at which to heartbeat to the ring.
# Period at which to heartbeat to the ring. 0 = disabled.
# CLI flag: -distributor.ring.heartbeat-period
[heartbeat_period: <duration> | default = 5s]

# The heartbeat timeout after which distributors are considered unhealthy
# within the ring.
# within the ring. 0 = never (timeout disabled).
# CLI flag: -distributor.ring.heartbeat-timeout
[heartbeat_timeout: <duration> | default = 1m]

@@ -662,6 +662,7 @@ lifecycler:
[mirror_timeout: <duration> | default = 2s]

# The heartbeat timeout after which ingesters are skipped for reads/writes.
# 0 = never (timeout disabled).
# CLI flag: -ring.heartbeat-timeout
[heartbeat_timeout: <duration> | default = 1m]

@@ -678,7 +679,7 @@ lifecycler:
# CLI flag: -ingester.num-tokens
[num_tokens: <int> | default = 128]

# Period at which to heartbeat to consul.
# Period at which to heartbeat to consul. 0 = disabled.
# CLI flag: -ingester.heartbeat-period
[heartbeat_period: <duration> | default = 5s]

@@ -1580,12 +1581,12 @@ ring:
# CLI flag: -ruler.ring.multi.mirror-timeout
[mirror_timeout: <duration> | default = 2s]

# Period at which to heartbeat to the ring.
# Period at which to heartbeat to the ring. 0 = disabled.
# CLI flag: -ruler.ring.heartbeat-period
[heartbeat_period: <duration> | default = 5s]

# The heartbeat timeout after which rulers are considered unhealthy within the
# ring.
# ring. 0 = never (timeout disabled).
# CLI flag: -ruler.ring.heartbeat-timeout
[heartbeat_timeout: <duration> | default = 1m]

@@ -1616,6 +1617,11 @@ ring:
# processing will ignore them instead. Subject to sharding.
# CLI flag: -ruler.disabled-tenants
[disabled_tenants: <string> | default = ""]

# Report the wall time for ruler queries to complete as a per user metric and as
# an info level log message.
# CLI flag: -ruler.query-stats-enabled
[query_stats_enabled: <boolean> | default = false]
```
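
A minimal sketch of enabling the new option from a config file, assuming the block above lives under a top-level `ruler:` key (the hunk does not show its parent key):

```yaml
# Hypothetical excerpt; equivalent to passing -ruler.query-stats-enabled=true.
ruler:
  # Off by default. When on, the CHANGELOG entry in this PR says the ruler
  # reports cortex_ruler_query_seconds_total as a per-user metric.
  query_stats_enabled: true
```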

### `ruler_storage_config`
@@ -1901,12 +1907,12 @@ sharding_ring:
# CLI flag: -alertmanager.sharding-ring.multi.mirror-timeout
[mirror_timeout: <duration> | default = 2s]

# Period at which to heartbeat to the ring.
# Period at which to heartbeat to the ring. 0 = disabled.
# CLI flag: -alertmanager.sharding-ring.heartbeat-period
[heartbeat_period: <duration> | default = 15s]

# The heartbeat timeout after which alertmanagers are considered unhealthy
# within the ring.
# within the ring. 0 = never (timeout disabled).
# CLI flag: -alertmanager.sharding-ring.heartbeat-timeout
[heartbeat_timeout: <duration> | default = 1m]

@@ -3761,36 +3767,40 @@ The `memberlist_config` configures the Gossip memberlist.
[randomize_node_name: <boolean> | default = true]

# The timeout for establishing a connection with a remote node, and for
# read/write operations. Uses memberlist LAN defaults if 0.
# read/write operations.
# CLI flag: -memberlist.stream-timeout
[stream_timeout: <duration> | default = 0s]
[stream_timeout: <duration> | default = 10s]

# Multiplication factor used when sending out messages (factor * log(N+1)).
# CLI flag: -memberlist.retransmit-factor
[retransmit_factor: <int> | default = 0]
[retransmit_factor: <int> | default = 4]

# How often to use pull/push sync. Uses memberlist LAN defaults if 0.
# How often to use pull/push sync.
# CLI flag: -memberlist.pullpush-interval
[pull_push_interval: <duration> | default = 0s]
[pull_push_interval: <duration> | default = 30s]

# How often to gossip. Uses memberlist LAN defaults if 0.
# How often to gossip.
# CLI flag: -memberlist.gossip-interval
[gossip_interval: <duration> | default = 0s]
[gossip_interval: <duration> | default = 200ms]

# How many nodes to gossip to. Uses memberlist LAN defaults if 0.
# How many nodes to gossip to.
# CLI flag: -memberlist.gossip-nodes
[gossip_nodes: <int> | default = 0]
[gossip_nodes: <int> | default = 3]

# How long to keep gossiping to dead nodes, to give them chance to refute their
# death. Uses memberlist LAN defaults if 0.
# death.
# CLI flag: -memberlist.gossip-to-dead-nodes-time
[gossip_to_dead_nodes_time: <duration> | default = 0s]
[gossip_to_dead_nodes_time: <duration> | default = 30s]

# How soon can dead node's name be reclaimed with new address. Defaults to 0,
# which is disabled.
# How soon can dead node's name be reclaimed with new address. 0 to disable.
# CLI flag: -memberlist.dead-node-reclaim-time
[dead_node_reclaim_time: <duration> | default = 0s]

# Enable message compression. This can be used to reduce bandwidth usage at the
# cost of slightly more CPU utilization.
# CLI flag: -memberlist.compression-enabled
[compression_enabled: <boolean> | default = true]

# Other cluster members to join. Can be specified multiple times. It can be an
# IP, hostname or an entry specified in the DNS Service Discovery format (see
# https://cortexmetrics.io/docs/configuration/arguments/#dns-service-discovery
@@ -5138,6 +5148,10 @@ The `compactor_config` configures the compactor for the blocks storage.
# CLI flag: -compactor.sharding-enabled
[sharding_enabled: <boolean> | default = false]

# The sharding strategy to use. Supported values are: default, shuffle-sharding.
# CLI flag: -compactor.sharding-strategy
[sharding_strategy: <string> | default = "default"]

sharding_ring:
kvstore:
# Backend storage to use for the ring. Supported values are: consul, etcd,
@@ -5174,12 +5188,12 @@ sharding_ring:
# CLI flag: -compactor.ring.multi.mirror-timeout
[mirror_timeout: <duration> | default = 2s]

# Period at which to heartbeat to the ring.
# Period at which to heartbeat to the ring. 0 = disabled.
# CLI flag: -compactor.ring.heartbeat-period
[heartbeat_period: <duration> | default = 5s]

# The heartbeat timeout after which compactors are considered unhealthy within
# the ring.
# the ring. 0 = never (timeout disabled).
# CLI flag: -compactor.ring.heartbeat-timeout
[heartbeat_timeout: <duration> | default = 1m]

@@ -5195,6 +5209,10 @@ sharding_ring:
# Name of network interface to read address from.
# CLI flag: -compactor.ring.instance-interface-names
[instance_interface_names: <list of string> | default = [eth0 en0]]

# Timeout for waiting on compactor to become ACTIVE in the ring.
# CLI flag: -compactor.ring.wait-active-instance-timeout
[wait_active_instance_timeout: <duration> | default = 10m]
```

### `store_gateway_config`
@@ -5248,13 +5266,13 @@ sharding_ring:
# CLI flag: -store-gateway.sharding-ring.multi.mirror-timeout
[mirror_timeout: <duration> | default = 2s]

# Period at which to heartbeat to the ring.
# Period at which to heartbeat to the ring. 0 = disabled.
# CLI flag: -store-gateway.sharding-ring.heartbeat-period
[heartbeat_period: <duration> | default = 15s]

# The heartbeat timeout after which store gateways are considered unhealthy
# within the ring. This option needs to be set both on the store-gateway and
# querier when running in microservices mode.
# within the ring. 0 = never (timeout disabled). This option needs to be set both
# on the store-gateway and querier when running in microservices mode.
# CLI flag: -store-gateway.sharding-ring.heartbeat-timeout
[heartbeat_timeout: <duration> | default = 1m]

14 changes: 14 additions & 0 deletions docs/configuration/v1-guarantees.md
@@ -81,3 +81,17 @@ Currently experimental features are:
- user config size (`-alertmanager.max-config-size-bytes`)
- templates count in user config (`-alertmanager.max-templates-count`)
- max template size (`-alertmanager.max-template-size-bytes`)
- Disabling ring heartbeat timeouts
- `-distributor.ring.heartbeat-timeout=0`
- `-ring.heartbeat-timeout=0`
- `-ruler.ring.heartbeat-timeout=0`
- `-alertmanager.sharding-ring.heartbeat-timeout=0`
- `-compactor.ring.heartbeat-timeout=0`
- `-store-gateway.sharding-ring.heartbeat-timeout=0`
- Disabling ring heartbeats
- `-distributor.ring.heartbeat-period=0`
- `-ingester.heartbeat-period=0`
- `-ruler.ring.heartbeat-period=0`
- `-alertmanager.sharding-ring.heartbeat-period=0`
- `-compactor.ring.heartbeat-period=0`
- `-store-gateway.sharding-ring.heartbeat-period=0`
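
As a rough illustration of the two experimental features listed above, the sketch below shows the config-file equivalent of `-store-gateway.sharding-ring.heartbeat-period=0` and `-store-gateway.sharding-ring.heartbeat-timeout=0`. The `store_gateway` / `sharding_ring` nesting is assumed from the config reference hunks in this diff, whose indentation was lost. Disabling both together is also an assumption: an instance that stops heartbeating would otherwise be expected to exceed a non-zero timeout eventually.

```yaml
# Hypothetical excerpt; both settings are experimental per this document.
store_gateway:
  sharding_ring:
    heartbeat_period: 0s    # 0 = heartbeats disabled
    heartbeat_timeout: 0s   # 0 = never (timeout disabled)
```

The store-gateway reference in this diff notes that the heartbeat timeout needs to be set on both the store-gateway and the querier when running in microservices mode.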
12 changes: 12 additions & 0 deletions docs/guides/security.md
@@ -0,0 +1,12 @@
---
title: "Security"
linkTitle: "Security"
weight: 10
slug: security
---

Cortex must be deployed with due care over system configuration, using principles such as "least privilege" to limit any exposure due to flaws in the source code.

You must configure authorisation and authentication externally to Cortex; see [this guide](./authentication-and-authorisation.md).

Information about security disclosures and mailing lists is [in the main repo](https://github.com/cortexproject/cortex/blob/master/SECURITY.md).