ac1214 · ac1214 · Jul 5, 2021 · Jul 6, 2021 · Jul 7, 2021 · Jul 9, 2021
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,9 +1,30 @@
 # Changelog
 
 ## master / unreleased
+* [FEATURE] Ruler: Add new `-ruler.query-stats-enabled` flag which, when enabled, reports `cortex_ruler_query_seconds_total` as a per-user metric tracking the total wall time, in seconds, spent executing queries in the ruler. #4317
 
 * [CHANGE] Querier / ruler: Change `-querier.max-fetched-chunks-per-query` configuration to limit the maximum number of chunks that can be fetched in a single query. The number of chunks fetched by ingesters AND long-term storage combined should not exceed the value configured on `-querier.max-fetched-chunks-per-query`. #4260
+* [CHANGE] Memberlist: the `memberlist_kv_store_value_bytes` metric has been removed because values are no longer stored in memory as encoded bytes. #4345
+* [ENHANCEMENT] Add timeout for waiting on compactor to become ACTIVE in the ring. #4262
+* [ENHANCEMENT] Reduce memory used by streaming queries, particularly in ruler. #4341
+* [ENHANCEMENT] Ring: allow heartbeat timeouts to be disabled by setting the relevant configuration value to zero. This is considered experimental. This applies to the following configuration options: #4342
+  * `-distributor.ring.heartbeat-timeout`
+  * `-ring.heartbeat-timeout`
+  * `-ruler.ring.heartbeat-timeout`
+  * `-alertmanager.sharding-ring.heartbeat-timeout`
+  * `-compactor.ring.heartbeat-timeout`
+  * `-store-gateway.sharding-ring.heartbeat-timeout`
+* [ENHANCEMENT] Ring: allow heartbeats to be explicitly disabled by setting the interval to zero. This is considered experimental. This applies to the following configuration options: #4344
+  * `-distributor.ring.heartbeat-period`
+  * `-ingester.heartbeat-period`
+  * `-ruler.ring.heartbeat-period`
+  * `-alertmanager.sharding-ring.heartbeat-period`
+  * `-compactor.ring.heartbeat-period`
+  * `-store-gateway.sharding-ring.heartbeat-period`
+* [ENHANCEMENT] Memberlist: optimized receive path for processing ring state updates, to help reduce CPU utilization in large clusters. #4345
+* [ENHANCEMENT] Memberlist: expose configuration of memberlist packet compression via `-memberlist.compression-enabled`. #4346
 * [BUGFIX] HA Tracker: when cleaning up obsolete elected replicas from KV store, tracker didn't update the number of clusters per user correctly. #4336
+* [FEATURE] Compactor: Add shuffle sharding grouper and planner to allow further work towards parallelizing compaction. #4318
 
 ## 1.10.0-rc.0 / 2021-06-28
 
@@ -17,6 +38,14 @@
 * [CHANGE] Change default value of `-server.grpc.keepalive.min-time-between-pings` from `5m` to `10s` and `-server.grpc.keepalive.ping-without-stream-allowed` to `true`. #4168
 * [CHANGE] Ingester: Change default value of `-ingester.active-series-metrics-enabled` to `true`. This incurs a small increase in memory usage, between 1.2% and 1.6% as measured on ingesters with 1.3M active series. #4257
 * [CHANGE] Dependency: update go-redis from v8.2.3 to v8.9.0. #4236
+* [CHANGE] Memberlist: Expose default configuration values to the command line options. Note that setting these explicitly to zero will no longer cause the default to be used. If the default is desired, then do not set the option. The following are affected: #4276
+  - `-memberlist.stream-timeout`
+  - `-memberlist.retransmit-factor`
+  - `-memberlist.pullpush-interval`
+  - `-memberlist.gossip-interval`
+  - `-memberlist.gossip-nodes`
+  - `-memberlist.gossip-to-dead-nodes-time`
+  - `-memberlist.dead-node-reclaim-time`
 * [FEATURE] Querier: Added new `-querier.max-fetched-series-per-query` flag. When Cortex is running with blocks storage, the max series per query limit is enforced in the querier and applies to unique series received from ingesters and store-gateway (long-term storage). #4179
 * [FEATURE] Querier/Ruler: Added new `-querier.max-fetched-chunk-bytes-per-query` flag. When Cortex is running with blocks storage, the max chunk bytes limit is enforced in the querier and ruler and limits the size of all aggregated chunks returned from ingesters and storage as bytes for a query. #4216
 * [FEATURE] Alertmanager: support negative matchers, time-based muting - [upstream release notes](https://github.com/prometheus/alertmanager/releases/tag/v0.22.0). #4237
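For readers mapping the memberlist and ruler entries above onto a config file, here is a minimal sketch. The field names and default values are taken from the configuration reference changes in this same diff; the top-level `memberlist` and `ruler` keys are assumptions about where those blocks sit, and turning on `query_stats_enabled` is only an example choice.

```yaml
# Assumed top-level key for the `memberlist_config` block.
memberlist:
  # Previously 0 meant "use the memberlist LAN default". The defaults are now
  # exposed directly, so an explicit 0 no longer falls back to them.
  stream_timeout: 10s
  retransmit_factor: 4
  pull_push_interval: 30s
  gossip_interval: 200ms
  gossip_nodes: 3
  gossip_to_dead_nodes_time: 30s
  # 0 still disables reclaiming a dead node's name.
  dead_node_reclaim_time: 0s
  # New option; trades a little CPU for lower bandwidth.
  compression_enabled: true

# Assumed top-level key for the ruler block.
ruler:
  # Reports cortex_ruler_query_seconds_total per user (default is false).
  query_stats_enabled: true
```

Leaving a field out entirely keeps its default, which is likely the simpler choice unless a value genuinely needs to differ.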

diff --git a/docs/_index.md b/docs/_index.md
@@ -36,6 +36,9 @@ should read:
 1. [Getting started with Cortex](getting-started/_index.md)
 1. [Information regarding configuring Cortex](configuration/_index.md)
 
+There are also individual [guides](guides/_index.md) to many tasks.
+Please review the important [security advice](guides/security.md) before deploying.
+
 For a guide to contributing to Cortex, see the [contributor guidelines](contributing/).
 
 ## Further reading

diff --git a/docs/blocks-storage/compactor.md b/docs/blocks-storage/compactor.md
@@ -173,6 +173,11 @@ compactor:
   # CLI flag: -compactor.sharding-enabled
   [sharding_enabled: <boolean> | default = false]
 
+  # The sharding strategy to use. Supported values are: default,
+  # shuffle-sharding.
+  # CLI flag: -compactor.sharding-strategy
+  [sharding_strategy: <string> | default = "default"]
+
   sharding_ring:
     kvstore:
       # Backend storage to use for the ring. Supported values are: consul, etcd,
@@ -209,12 +214,12 @@ compactor:
         # CLI flag: -compactor.ring.multi.mirror-timeout
         [mirror_timeout: <duration> | default = 2s]
 
-    # Period at which to heartbeat to the ring.
+    # Period at which to heartbeat to the ring. 0 = disabled.
     # CLI flag: -compactor.ring.heartbeat-period
     [heartbeat_period: <duration> | default = 5s]
 
     # The heartbeat timeout after which compactors are considered unhealthy
-    # within the ring.
+    # within the ring. 0 = never (timeout disabled).
     # CLI flag: -compactor.ring.heartbeat-timeout
     [heartbeat_timeout: <duration> | default = 1m]
 
@@ -230,4 +235,8 @@ compactor:
     # Name of network interface to read address from.
     # CLI flag: -compactor.ring.instance-interface-names
     [instance_interface_names: <list of string> | default = [eth0 en0]]
+
+    # Timeout for waiting on compactor to become ACTIVE in the ring.
+    # CLI flag: -compactor.ring.wait-active-instance-timeout
+    [wait_active_instance_timeout: <duration> | default = 10m]
 ```
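As a rough illustration of how the compactor options touched in this file fit together, a configuration might look like the sketch below. The values are the documented defaults or arbitrary examples, not recommendations, and pairing `shuffle-sharding` with `sharding_enabled: true` is an assumption rather than something this diff states.

```yaml
compactor:
  # Assumption: a sharding strategy only matters when sharding is enabled.
  sharding_enabled: true
  # Supported values: default, shuffle-sharding.
  sharding_strategy: shuffle-sharding

  sharding_ring:
    # Documented defaults, written out explicitly.
    # Setting either to 0 disables it (experimental).
    heartbeat_period: 5s
    heartbeat_timeout: 1m
    # Stop waiting for the compactor to become ACTIVE in the ring after this long.
    wait_active_instance_timeout: 10m
```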
diff --git a/docs/blocks-storage/store-gateway.md b/docs/blocks-storage/store-gateway.md
@@ -232,13 +232,13 @@ store_gateway:
         # CLI flag: -store-gateway.sharding-ring.multi.mirror-timeout
         [mirror_timeout: <duration> | default = 2s]
 
-    # Period at which to heartbeat to the ring.
+    # Period at which to heartbeat to the ring. 0 = disabled.
     # CLI flag: -store-gateway.sharding-ring.heartbeat-period
     [heartbeat_period: <duration> | default = 15s]
 
     # The heartbeat timeout after which store gateways are considered unhealthy
-    # within the ring. This option needs be set both on the store-gateway and
-    # querier when running in microservices mode.
+    # within the ring. 0 = never (timeout disabled). This option needs to be set
+    # both on the store-gateway and querier when running in microservices mode.
     # CLI flag: -store-gateway.sharding-ring.heartbeat-timeout
     [heartbeat_timeout: <duration> | default = 1m]
 

diff --git a/docs/configuration/config-file-reference.md b/docs/configuration/config-file-reference.md
@@ -563,12 +563,12 @@ ring:
       # CLI flag: -distributor.ring.multi.mirror-timeout
       [mirror_timeout: <duration> | default = 2s]
 
-  # Period at which to heartbeat to the ring.
+  # Period at which to heartbeat to the ring. 0 = disabled.
   # CLI flag: -distributor.ring.heartbeat-period
   [heartbeat_period: <duration> | default = 5s]
 
   # The heartbeat timeout after which distributors are considered unhealthy
-  # within the ring.
+  # within the ring. 0 = never (timeout disabled).
   # CLI flag: -distributor.ring.heartbeat-timeout
   [heartbeat_timeout: <duration> | default = 1m]
 
@@ -662,6 +662,7 @@ lifecycler:
         [mirror_timeout: <duration> | default = 2s]
 
     # The heartbeat timeout after which ingesters are skipped for reads/writes.
+    # 0 = never (timeout disabled).
     # CLI flag: -ring.heartbeat-timeout
     [heartbeat_timeout: <duration> | default = 1m]
 
@@ -678,7 +679,7 @@ lifecycler:
   # CLI flag: -ingester.num-tokens
   [num_tokens: <int> | default = 128]
 
-  # Period at which to heartbeat to consul.
+  # Period at which to heartbeat to consul. 0 = disabled.
   # CLI flag: -ingester.heartbeat-period
   [heartbeat_period: <duration> | default = 5s]
 
@@ -1580,12 +1581,12 @@ ring:
       # CLI flag: -ruler.ring.multi.mirror-timeout
       [mirror_timeout: <duration> | default = 2s]
 
-  # Period at which to heartbeat to the ring.
+  # Period at which to heartbeat to the ring. 0 = disabled.
   # CLI flag: -ruler.ring.heartbeat-period
   [heartbeat_period: <duration> | default = 5s]
 
   # The heartbeat timeout after which rulers are considered unhealthy within the
-  # ring.
+  # ring. 0 = never (timeout disabled).
   # CLI flag: -ruler.ring.heartbeat-timeout
   [heartbeat_timeout: <duration> | default = 1m]
 
@@ -1616,6 +1617,11 @@ ring:
 # processing will ignore them instead. Subject to sharding.
 # CLI flag: -ruler.disabled-tenants
 [disabled_tenants: <string> | default = ""]
+
+# Report the wall time for ruler queries to complete as a per user metric and as
+# an info level log message.
+# CLI flag: -ruler.query-stats-enabled
+[query_stats_enabled: <boolean> | default = false]
 ```
 
 ### `ruler_storage_config`
@@ -1901,12 +1907,12 @@ sharding_ring:
       # CLI flag: -alertmanager.sharding-ring.multi.mirror-timeout
       [mirror_timeout: <duration> | default = 2s]
 
-  # Period at which to heartbeat to the ring.
+  # Period at which to heartbeat to the ring. 0 = disabled.
   # CLI flag: -alertmanager.sharding-ring.heartbeat-period
   [heartbeat_period: <duration> | default = 15s]
 
   # The heartbeat timeout after which alertmanagers are considered unhealthy
-  # within the ring.
+  # within the ring. 0 = never (timeout disabled).
   # CLI flag: -alertmanager.sharding-ring.heartbeat-timeout
   [heartbeat_timeout: <duration> | default = 1m]
 
@@ -3761,36 +3767,40 @@ The `memberlist_config` configures the Gossip memberlist.
 [randomize_node_name: <boolean> | default = true]
 
 # The timeout for establishing a connection with a remote node, and for
-# read/write operations. Uses memberlist LAN defaults if 0.
+# read/write operations.
 # CLI flag: -memberlist.stream-timeout
-[stream_timeout: <duration> | default = 0s]
+[stream_timeout: <duration> | default = 10s]
 
 # Multiplication factor used when sending out messages (factor * log(N+1)).
 # CLI flag: -memberlist.retransmit-factor
-[retransmit_factor: <int> | default = 0]
+[retransmit_factor: <int> | default = 4]
 
-# How often to use pull/push sync. Uses memberlist LAN defaults if 0.
+# How often to use pull/push sync.
 # CLI flag: -memberlist.pullpush-interval
-[pull_push_interval: <duration> | default = 0s]
+[pull_push_interval: <duration> | default = 30s]
 
-# How often to gossip. Uses memberlist LAN defaults if 0.
+# How often to gossip.
 # CLI flag: -memberlist.gossip-interval
-[gossip_interval: <duration> | default = 0s]
+[gossip_interval: <duration> | default = 200ms]
 
-# How many nodes to gossip to. Uses memberlist LAN defaults if 0.
+# How many nodes to gossip to.
 # CLI flag: -memberlist.gossip-nodes
-[gossip_nodes: <int> | default = 0]
+[gossip_nodes: <int> | default = 3]
 
 # How long to keep gossiping to dead nodes, to give them chance to refute their
-# death. Uses memberlist LAN defaults if 0.
+# death.
 # CLI flag: -memberlist.gossip-to-dead-nodes-time
-[gossip_to_dead_nodes_time: <duration> | default = 0s]
+[gossip_to_dead_nodes_time: <duration> | default = 30s]
 
-# How soon can dead node's name be reclaimed with new address. Defaults to 0,
-# which is disabled.
+# How soon a dead node's name can be reclaimed with a new address. 0 to disable.
 # CLI flag: -memberlist.dead-node-reclaim-time
 [dead_node_reclaim_time: <duration> | default = 0s]
 
+# Enable message compression. This can be used to reduce bandwidth usage at the
+# cost of slightly more CPU utilization.
+# CLI flag: -memberlist.compression-enabled
+[compression_enabled: <boolean> | default = true]
+
 # Other cluster members to join. Can be specified multiple times. It can be an
 # IP, hostname or an entry specified in the DNS Service Discovery format (see
 # https://cortexmetrics.io/docs/configuration/arguments/#dns-service-discovery
@@ -5138,6 +5148,10 @@ The `compactor_config` configures the compactor for the blocks storage.
 # CLI flag: -compactor.sharding-enabled
 [sharding_enabled: <boolean> | default = false]
 
+# The sharding strategy to use. Supported values are: default, shuffle-sharding.
+# CLI flag: -compactor.sharding-strategy
+[sharding_strategy: <string> | default = "default"]
+
 sharding_ring:
   kvstore:
     # Backend storage to use for the ring. Supported values are: consul, etcd,
@@ -5174,12 +5188,12 @@ sharding_ring:
       # CLI flag: -compactor.ring.multi.mirror-timeout
       [mirror_timeout: <duration> | default = 2s]
 
-  # Period at which to heartbeat to the ring.
+  # Period at which to heartbeat to the ring. 0 = disabled.
   # CLI flag: -compactor.ring.heartbeat-period
   [heartbeat_period: <duration> | default = 5s]
 
   # The heartbeat timeout after which compactors are considered unhealthy within
-  # the ring.
+  # the ring. 0 = never (timeout disabled).
   # CLI flag: -compactor.ring.heartbeat-timeout
   [heartbeat_timeout: <duration> | default = 1m]
 
@@ -5195,6 +5209,10 @@ sharding_ring:
   # Name of network interface to read address from.
   # CLI flag: -compactor.ring.instance-interface-names
   [instance_interface_names: <list of string> | default = [eth0 en0]]
+
+  # Timeout for waiting on compactor to become ACTIVE in the ring.
+  # CLI flag: -compactor.ring.wait-active-instance-timeout
+  [wait_active_instance_timeout: <duration> | default = 10m]
 ```
 
 ### `store_gateway_config`
@@ -5248,13 +5266,13 @@ sharding_ring:
       # CLI flag: -store-gateway.sharding-ring.multi.mirror-timeout
       [mirror_timeout: <duration> | default = 2s]
 
-  # Period at which to heartbeat to the ring.
+  # Period at which to heartbeat to the ring. 0 = disabled.
   # CLI flag: -store-gateway.sharding-ring.heartbeat-period
   [heartbeat_period: <duration> | default = 15s]
 
   # The heartbeat timeout after which store gateways are considered unhealthy
-  # within the ring. This option needs be set both on the store-gateway and
-  # querier when running in microservices mode.
+  # within the ring. 0 = never (timeout disabled). This option needs to be set both
+  # on the store-gateway and querier when running in microservices mode.
   # CLI flag: -store-gateway.sharding-ring.heartbeat-timeout
   [heartbeat_timeout: <duration> | default = 1m]
 

diff --git a/docs/configuration/v1-guarantees.md b/docs/configuration/v1-guarantees.md
@@ -81,3 +81,17 @@ Currently experimental features are:
   - user config size (`-alertmanager.max-config-size-bytes`)
   - templates count in user config (`-alertmanager.max-templates-count`)
   - max template size (`-alertmanager.max-template-size-bytes`)
+- Disabling ring heartbeat timeouts
+  - `-distributor.ring.heartbeat-timeout=0`
+  - `-ring.heartbeat-timeout=0`
+  - `-ruler.ring.heartbeat-timeout=0`
+  - `-alertmanager.sharding-ring.heartbeat-timeout=0`
+  - `-compactor.ring.heartbeat-timeout=0`
+  - `-store-gateway.sharding-ring.heartbeat-timeout=0`
+- Disabling ring heartbeats
+  - `-distributor.ring.heartbeat-period=0`
+  - `-ingester.heartbeat-period=0`
+  - `-ruler.ring.heartbeat-period=0`
+  - `-alertmanager.sharding-ring.heartbeat-period=0`
+  - `-compactor.ring.heartbeat-period=0`
+  - `-store-gateway.sharding-ring.heartbeat-period=0`
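A sketch of what the experimental settings listed above could look like in YAML form for a single component, using the store-gateway ring fields shown elsewhere in this change. Treat it as illustrative only: the choice of component and the `0s` spelling of a zero duration are assumptions. The reference text also notes that the store-gateway heartbeat timeout needs to be set on both the store-gateway and the querier when running in microservices mode.

```yaml
store_gateway:
  sharding_ring:
    # 0 = heartbeats disabled (experimental).
    heartbeat_period: 0s
    # 0 = instances are never considered unhealthy (experimental).
    heartbeat_timeout: 0s
```

Both values are zeroed together here because disabling heartbeats while keeping a non-zero timeout would presumably leave instances looking unhealthy once the timeout elapses.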
diff --git a/docs/guides/security.md b/docs/guides/security.md
@@ -0,0 +1,12 @@
+---
+title: "Security"
+linkTitle: "Security"
+weight: 10
+slug: security
+---
+
+Cortex must be deployed with due care over system configuration, using principles such as "least privilege" to limit any exposure due to flaws in the source code.
+
+You must configure authorisation and authentication externally to Cortex; see [this guide](./authentication-and-authorisation.md).
+
+Information about security disclosures and mailing lists is [in the main repo](https://github.com/cortexproject/cortex/blob/master/SECURITY.md).