Commit 1344e61 (parents fa2f141 + c182aef)

Merge pull request #1553 from cortexproject/prod-guide
A start at a guide to running Cortex in production

File tree: 2 files changed, +224 -0 lines

docs/auth.md (15 additions, 0 deletions)
# Authentication and Authorisation

All Cortex components take the tenant ID from a header `X-Scope-OrgID`
on each request. They trust this value completely: if you need to
protect your Cortex installation from accidental or malicious calls
then you must add an additional layer of protection.

Typically this means you run Cortex behind a reverse proxy, and ensure
that all callers, both machines sending data over the remote_write
interface and humans sending queries from GUIs, supply credentials
which identify them and confirm they are authorised.

When configuring the remote_write API in Prometheus there is no way to
add extra headers. The user and password fields of HTTP Basic auth, or
a Bearer token, can be used to convey the tenant ID and/or credentials.
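For example, a minimal remote_write block for this setup might look
like the sketch below; the URL and tenant name are placeholders, and it
assumes your reverse proxy validates the credentials and sets the
`X-Scope-OrgID` header before forwarding the request to Cortex:

```
remote_write:
  - url: https://cortex.example.com/api/prom/push   # your proxy's endpoint (placeholder)
    basic_auth:
      username: tenant-1      # e.g. carry the tenant ID in the username
      password: supersecret   # credential checked by the proxy (placeholder)
```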

docs/running.md (209 additions, 0 deletions)
# Running Cortex in Production
This document assumes you have read the
[architecture](architecture.md) document.

## Planning

### Tenants

If you will run Cortex as a multi-tenant system, you need to give each
tenant a unique ID - this can be any string. Managing tenants and
allocating IDs must be done outside of Cortex. You must also configure
[Authentication and Authorisation](auth.md).

### Storage

Cortex requires a scalable storage back-end. The commercial cloud
options are DynamoDB and Bigtable: the advantage is that you don't have
to know how to manage them, but the downside is that they come with
their own costs. Alternatively you can choose Cassandra, which you will
have to install and manage yourself.

### Components

Every Cortex installation will need the Distributor, Ingester and
Querier. The Alertmanager, Ruler and Query-frontend are optional.

### Other dependencies

Cortex needs a KV store to track the sharding of data between
processes. This can be either Etcd or Consul.

If you want to configure recording and alerting rules (i.e. if you
will run the Ruler and Alertmanager components) then a Postgres
database is required to store the configs.

Memcached is not essential but highly recommended.

### Ingester replication factor

The standard replication factor is three, so that you can lose one
replica without concern, since two copies of the data remain for
redundancy. This is configurable: you can run with more redundancy or
less, depending on your risk appetite.

### Index schema

Choose schema version 9 in most cases; use version 10 if you expect
hundreds of thousands of timeseries under a single metric name.
Anything older than v9 is much less efficient.
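For illustration, a period in the chunk-storage schema config selecting
v9 might look like the sketch below; the date, store names and table
prefix are placeholders, and the exact field names should be checked
against the schema configuration documentation for your Cortex version
and storage back-end:

```
configs:
  - from: 2019-07-29          # date this schema takes effect (placeholder)
    store: cassandra          # index store; e.g. aws-dynamo or bigtable instead
    object_store: cassandra   # where chunk data is written
    schema: v9                # use v10 for very high-cardinality metric names
    index:
      prefix: cortex_index_   # table name prefix (placeholder)
      period: 168h            # weekly tables
```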
### Chunk encoding

The standard choice is Bigchunk, which is the most flexible chunk
encoding. You may get better compression from Varbit if many of your
timeseries do not change value from one day to the next.

### Sizing

You will want to estimate how many nodes are required, how many of
each component to run, and how much storage space will be required.
In practice, these will vary greatly depending on the metrics being
sent to Cortex.

Some key parameters are:

1. The number of active series. If you already have Prometheus you
can query `prometheus_tsdb_head_series` to see this number.
2. Sampling rate, e.g. a new sample for each series every 15
seconds. Multiply this by the number of active series to get the
total rate at which samples will arrive at Cortex.
3. The rate at which series are added and removed. This can be very
high if you monitor objects that come and go - for example if you run
thousands of batch jobs lasting a minute or so and capture metrics
with a unique ID for each one. [Read how to analyse this on
Prometheus](https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality).
4. How compressible the time-series data are. If a metric stays at
the same value constantly, then Cortex can compress it very well, so
12 hours of data sampled every 15 seconds would be around 2KB. On
the other hand, if the value jumps around a lot it might take 10KB.
There are not currently any tools available to analyse this.
5. How long you want to retain data for, e.g. 1 month or 2 years.

Other parameters which can become important if you have particularly
high values:

6. Number of different series under one metric name.
7. Number of labels per series.
8. Rate and complexity of queries.

Now, some rules of thumb (a worked example follows below):

1. Each million series in an ingester takes 15GB of RAM. The total
number of series in ingesters is the number of active series times the
replication factor. This is with the default of 12-hour chunks - the
RAM required will be lower if you set `-ingester.max-chunk-age` lower
(trading off more back-end database IO).
2. Each million series (including churn) consumes 15GB of chunk
storage and 4GB of index per day (so multiply by the retention
period).
3. Each 100,000 samples/sec arriving takes 1 CPU in distributors.
Distributors don't need much RAM.

If you turn on compression between distributors and ingesters (for
example to save on inter-zone bandwidth charges at AWS) they will use
significantly more CPU: approximately 100% more for distributors and
50% more for ingesters.
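As a worked example with made-up numbers: suppose you have 2 million
active series scraped every 15 seconds, a replication factor of 3, and
30 days of retention. The ingesters then hold 6 million series between
them, needing roughly 90GB of RAM in total; samples arrive at roughly
133,000 per second, needing one to two CPUs' worth of distributors; and
the store grows by roughly 30GB of chunks and 8GB of index per day,
i.e. around 900GB of chunks and 240GB of index over the retention
period (more if series churn is high).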
### Caching

Cortex can retain data in-process or in Memcached to speed up various
queries by caching:

* individual chunks
* index lookups for one label on one day
* the results of a whole query

You should always include Memcached in your Cortex install so results
from one process can be re-used by another. In-process caching can cut
fetch times slightly and reduce the load on Memcached.

Ingesters can also be configured to use Memcached to avoid re-writing
index and chunk data which has already been stored in the back-end
database. Again, highly recommended.

### Orchestration

Because Cortex is designed to run multiple instances of each component
(ingester, querier, etc.), you probably want to automate the placement
and shepherding of these instances. Most users choose Kubernetes to do
this, but this is not mandatory.
## Configuration
### Resource requests

If using Kubernetes, each container should specify resource requests
so that the scheduler can place them on a node with sufficient capacity.

For example, an ingester might request:

```
resources:
  requests:
    cpu: 4
    memory: 10Gi
```

The specific values here should be adjusted based on your own
experiences running Cortex - they are very dependent on the rate of
data arriving and other factors such as series churn.

### Take extra care with ingesters

Ingesters hold hours of timeseries data in memory; you can configure
Cortex to replicate the data, but you should take steps to avoid losing
all replicas at once:
- Don't run multiple ingesters on the same node.
- Don't run ingesters on preemptible/spot nodes.
- Spread out ingesters across racks / availability zones / whatever
  applies in your datacenters.

You can ask Kubernetes to avoid running on the same node like this:

```
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - ingester
        topologyKey: "kubernetes.io/hostname"
```
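Keeping ingesters off preemptible or spot nodes can be expressed in a
similar way. The sketch below assumes GKE's
`cloud.google.com/gke-preemptible` node label; other platforms label
their spot nodes differently. Spreading replicas across availability
zones works like the anti-affinity rule above, but with a zone
topologyKey such as `failure-domain.beta.kubernetes.io/zone`.

```
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: cloud.google.com/gke-preemptible   # GKE-specific label; adjust for your platform
          operator: DoesNotExist
```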
Give plenty of time for an ingester to hand over or flush data to the
store when shutting down; for Kubernetes this looks like:

```
terminationGracePeriodSeconds: 2400
```

Ask Kubernetes to limit rolling updates to one ingester at a time, and
signal the old one to stop before the new one is ready:

```
strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
```

Ingesters provide an HTTP hook to signal readiness when all is well;
this is valuable because it stops a rolling update at the first
problem:

```
readinessProbe:
  httpGet:
    path: /ready
    port: 80
```

We do not recommend configuring a liveness probe on ingesters -
killing them is a last resort and should not be left to a machine.
