Merge pull request #1553 from cortexproject/prod-guide

bboreham · web-flow · commit 1344e615fdd6 · 2019-08-22T16:26:35.000+01:00
A start at a guide to running Cortex in production
diff --git a/docs/auth.md b/docs/auth.md
@@ -0,0 +1,15 @@
+# Authentication and Authorisation
+
+All Cortex components take the tenant ID from a header `X-Scope-OrgID`
+on each request. They trust this value completely: if you need to
+protect your Cortex installation from accidental or malicious calls
+then you must add an additional layer of protection.
+
+Typically this means you run Cortex behind a reverse proxy, and ensure
+that all callers, both machines sending data over the remote_write
+interface and humans sending queries from GUIs, supply credentials
+which identify them and confirm they are authorised.
+
+When configuring the remote_write API in Prometheus there is no way to
+add extra headers. The user and password fields of HTTP Basic auth, or
+a Bearer token, can be used to convey the tenant ID and/or credentials.
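One way to picture the reverse-proxy layer described above is the following minimal Python sketch. It assumes the Basic auth user name is used directly as the tenant ID and that passwords live in an in-memory table. `TENANT_PASSWORDS` and `tenant_headers` are hypothetical names for illustration, not part of Cortex or Prometheus. A real proxy would also need to discard any `X-Scope-OrgID` header supplied by the caller before adding its own.

```python
import base64
import hmac

# Hypothetical credential table: Basic auth user name -> password.
# Using the user name as the tenant ID is an assumption of this sketch.
TENANT_PASSWORDS = {"tenant-a": "s3cret-a", "tenant-b": "s3cret-b"}


def tenant_headers(authorization, passwords=TENANT_PASSWORDS):
    """Return the headers to forward to Cortex, or None to reject the call."""
    scheme, _, encoded = (authorization or "").partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return None
    user, sep, password = decoded.partition(":")
    expected = passwords.get(user)
    if not sep or expected is None:
        return None
    # Constant-time comparison avoids leaking the password through timing.
    if not hmac.compare_digest(password.encode(), expected.encode()):
        return None
    return {"X-Scope-OrgID": user}
```

A proxy built along these lines would reject the request (for example with a 401 response) when the function returns `None`, and otherwise forward it to Cortex with the returned header set.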
diff --git a/docs/running.md b/docs/running.md
@@ -0,0 +1,209 @@
+# Running Cortex in Production
+
+This document assumes you have read the
+[architecture](architecture.md) document.
+
+## Planning
+
+### Tenants
+
+If you will run Cortex as a multi-tenant system, you need to give each
+tenant a unique ID - this can be any string. Managing tenants and
+allocating IDs must be done outside of Cortex. You must also configure
+[Authentication and Authorisation](auth.md).
+
+### Storage
+
+Cortex requires a scalable storage back-end. Commercial cloud options
+are DynamoDB and Bigtable: the advantage is that you don't have to
+manage them yourself, but the downside is that you pay the provider's
+charges for them. Alternatively you can choose Cassandra, which you
+will have to install and manage yourself.
+
+### Components
+
+Every Cortex installation will need Distributor, Ingester and Querier.
+Alertmanager, Ruler and Query-frontend are optional.
+
+### Other dependencies
+
+Cortex needs a KV store to track sharding of data between
+processes. This can be either etcd or Consul.
+
+If you want to configure recording and alerting rules (i.e. if you
+will run the Ruler and Alertmanager components) then a Postgres
+database is required to store configs.
+
+Memcached is not essential but is highly recommended.
+
+### Ingester replication factor
+
+The standard replication factor is three, so that we can drop one
+replica and be unconcerned, as we still have two copies of the data
+left for redundancy. This is configurable: you can run with more
+redundancy or less, depending on your risk appetite.
+
+### Index schema
+
+Choose schema version 9 in most cases; version 10 if you expect
+hundreds of thousands of timeseries under a single name.  Anything
+older than v9 is much less efficient.
+
+### Chunk encoding
+
+The standard choice is Bigchunk, which is the most flexible chunk
+encoding. You may get better compression from Varbit, if many of your
+timeseries do not change value from one day to the next.
+
+### Sizing
+
+You will want to estimate how many nodes are required, how many of
+each component to run, and how much storage space will be required.
+In practice, these will vary greatly depending on the metrics being
+sent to Cortex.
+
+Some key parameters are:
+
+ 1. The number of active series. If you have Prometheus already you
+ can query `prometheus_tsdb_head_series` to see this number.
+ 2. Sampling rate, e.g. a new sample for each series every 15
+ seconds. Multiply this by the number of active series to get the
+ total rate at which samples will arrive at Cortex.
+ 3. The rate at which series are added and removed. This can be very
+ high if you monitor objects that come and go - for example if you run
+ thousands of batch jobs lasting a minute or so and capture metrics
+ with a unique ID for each one. [Read how to analyse this on
+ Prometheus](https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality).
+ 4. How compressible the time-series data are. If a metric stays at
+ the same value constantly, then Cortex can compress it very well, so
+ 12 hours of data sampled every 15 seconds would be around 2KB.  On
+ the other hand if the value jumps around a lot it might take 10KB.
+ There are not currently any tools available to analyse this.
+ 5. How long you want to retain data for, e.g. 1 month or 2 years.
+
+Other parameters which can become important if you have particularly
+high values:
+
+ 6. Number of different series under one metric name.
+ 7. Number of labels per series.
+ 8. Rate and complexity of queries.
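To put the compressibility figures from item 4 in per-sample terms, a small calculation can help. This is only a sketch: `bytes_per_sample` is a hypothetical helper, and it assumes 1KB means 1000 bytes.

```python
def bytes_per_sample(chunk_bytes, hours, sample_interval_s):
    """Average stored bytes per sample for one series over the given window."""
    samples = hours * 3600 / sample_interval_s
    return chunk_bytes / samples


# 12 hours sampled every 15 seconds is 2880 samples per series.
well = bytes_per_sample(2_000, hours=12, sample_interval_s=15)
poor = bytes_per_sample(10_000, hours=12, sample_interval_s=15)
print(f"constant series: {well:.2f} bytes/sample")  # 0.69
print(f"noisy series:    {poor:.2f} bytes/sample")  # 3.47
```

So the two cases quoted above correspond to roughly 0.7 and 3.5 bytes per sample.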
+
+Now, some rules of thumb:
+
+ 1. Each million series in an ingester takes 15GB of RAM. Total number
+ of series in ingesters is number of active series times the
+ replication factor. This is with the default of 12-hour chunks - RAM
+ required will reduce if you set `-ingester.max-chunk-age` lower, at
+ the cost of more back-end database IO.
+ 2. Each million series (including churn) consumes 15GB of chunk
+ storage and 4GB of index, per day (so multiply by the retention
+ period).
+ 3. Each 100,000 samples/sec arriving takes 1 CPU in distributors.
+ Distributors don't need much RAM.
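The rules of thumb above can be turned into a rough calculator. The sketch below is illustrative only: `estimate` and its parameter names are hypothetical, it assumes storage is not multiplied by the replication factor (the rules above do not say either way), and the results are starting points rather than guarantees.

```python
import math

# Constants taken from the rules of thumb above.
RAM_GB_PER_MILLION_SERIES = 15
CHUNK_GB_PER_MILLION_SERIES_PER_DAY = 15
INDEX_GB_PER_MILLION_SERIES_PER_DAY = 4
SAMPLES_PER_SEC_PER_DISTRIBUTOR_CPU = 100_000


def estimate(active_series, sample_interval_s, replication_factor=3,
             retention_days=30, series_per_day=None):
    """Rough sizing figures; series_per_day should include churned series."""
    if series_per_day is None:
        series_per_day = active_series  # assumes no churn at all
    samples_per_sec = active_series / sample_interval_s
    series_in_ingesters = active_series * replication_factor / 1_000_000
    millions_per_day = series_per_day / 1_000_000
    return {
        "samples_per_sec": samples_per_sec,
        "ingester_ram_gb": series_in_ingesters * RAM_GB_PER_MILLION_SERIES,
        "chunk_storage_gb": millions_per_day
        * CHUNK_GB_PER_MILLION_SERIES_PER_DAY * retention_days,
        "index_storage_gb": millions_per_day
        * INDEX_GB_PER_MILLION_SERIES_PER_DAY * retention_days,
        "distributor_cpus": math.ceil(
            samples_per_sec / SAMPLES_PER_SEC_PER_DISTRIBUTOR_CPU),
    }


result = estimate(active_series=2_000_000, sample_interval_s=15)
```

For 2 million active series sampled every 15 seconds, with replication factor 3, 30 days of retention and no churn, this gives 90GB of ingester RAM in total, 900GB of chunk storage, 240GB of index and 2 distributor CPUs.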
+
+If you turn on compression between distributors and ingesters (for
+example to save on inter-zone bandwidth charges at AWS) both will use
+significantly more CPU (approx 100% more for distributors and 50% more
+for ingesters).
+
+### Caching
+
+Cortex can retain data in-process or in Memcached to speed up various
+queries by caching:
+
+ * individual chunks
+ * index lookups for one label on one day
+ * the results of a whole query
+
+You should always include Memcached in your Cortex install so results
+from one process can be re-used by another. In-process caching can cut
+fetch times slightly and reduce the load on Memcached.
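The interplay between the two cache tiers can be sketched in a few lines of Python. This is not Cortex's implementation: `TieredCache` and `DictCache` are hypothetical names, `DictCache` stands in for Memcached, and eviction and expiry are left out entirely.

```python
class DictCache:
    """Stand-in for a shared cache such as Memcached (assumption of this sketch)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class TieredCache:
    """Look in the in-process cache, then the shared cache, then the store."""

    def __init__(self, shared, fetch_from_store):
        self.local = {}          # private to this process, no eviction here
        self.shared = shared     # visible to every process
        self.fetch_from_store = fetch_from_store

    def get(self, key):
        if key in self.local:
            return self.local[key]          # cheapest: no network round trip
        value = self.shared.get(key)
        if value is None:
            value = self.fetch_from_store(key)
            self.shared.set(key, value)     # let other processes re-use it
        self.local[key] = value
        return value
```

With two `TieredCache` instances sharing one `DictCache`, the second instance finds a value the first one fetched without going back to the store, and repeated lookups in either instance never reach the shared cache at all.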
+
+Ingesters can also be configured to use Memcached to avoid re-writing
+index and chunk data which has already been stored in the back-end
+database. Again, highly recommended.
+
+### Orchestration
+
+Because Cortex is designed to run multiple instances of each component
+(ingester, querier, etc.), you probably want to automate the placement
+and shepherding of these instances. Most users choose Kubernetes to do
+this, but this is not mandatory.
+
+## Configuration
+
+### Resource requests
+
+If using Kubernetes, each container should specify resource requests
+so that the scheduler can place them on a node with sufficient capacity.
+
+For example an ingester might request:
+
+```
+        resources:
+          requests:
+            cpu: 4
+            memory: 10Gi
+```
+
+The specific values here should be adjusted based on your own
+experiences running Cortex - they are very dependent on rate of data
+arriving and other factors such as series churn.
+
+### Take extra care with ingesters
+
+Ingesters hold hours of timeseries data in memory; you can configure
+Cortex to replicate the data but you should take steps to avoid losing
+all replicas at once:
+
+ - Don't run multiple ingesters on the same node.
+ - Don't run ingesters on preemptible/spot nodes.
+ - Spread out ingesters across racks / availability zones / whatever
+   applies in your datacenters.
+
+You can ask Kubernetes to avoid running on the same node like this:
+
+```
+      affinity:
+        podAntiAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+          - weight: 100
+            podAffinityTerm:
+              labelSelector:
+                matchExpressions:
+                - key: name
+                  operator: In
+                  values:
+                  - ingester
+              topologyKey: "kubernetes.io/hostname"
+```
+
+Give an ingester plenty of time to hand over or flush data to the
+store when shutting down; for Kubernetes this looks like:
+
+```
+      terminationGracePeriodSeconds: 2400
+```
+
+Ask Kubernetes to limit rolling updates to one ingester at a time, and
+signal the old one to stop before the new one is ready:
+
+```
+  strategy:
+    rollingUpdate:
+      maxSurge: 0
+      maxUnavailable: 1
+```
+
+Ingesters provide an HTTP hook to signal readiness when all is well;
+this is valuable because it stops a rolling update at the first
+problem:
+
+```
+        readinessProbe:
+          httpGet:
+            path: /ready
+            port: 80
+```
+
+We do not recommend configuring a liveness probe on ingesters -
+killing them is a last resort and should not be left to a machine.