# Running Cortex in Production

This document assumes you have read the
[architecture](architecture.md) document.

## Planning

### Tenants

If you will run Cortex as a multi-tenant system, you need to give each
tenant a unique ID - this can be any string. Managing tenants and
allocating IDs must be done outside of Cortex. You must also configure
[Authentication and Authorisation](auth.md).
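In practice the tenant ID is passed to Cortex on each request,
typically via the `X-Scope-OrgID` HTTP header.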

### Storage

Cortex requires a scalable storage back-end. Commercial cloud options
are DynamoDB and Bigtable: the advantage is you don't have to know how
to manage them, but the downside is they have specific costs.
Alternatively you can choose Cassandra, which you will have to install
and manage.

### Components

Every Cortex installation will need Distributor, Ingester and Querier.
Alertmanager, Ruler and Query-frontend are optional.

### Other dependencies

Cortex needs a KV store to track sharding of data between
processes. This can be either etcd or Consul.

If you want to configure recording and alerting rules (i.e. if you
will run the Ruler and Alertmanager components) then a Postgres
database is required to store configs.

Memcached is not essential but highly recommended.

### Ingester replication factor

The standard replication factor is three, so that we can drop one
replica and be unconcerned, as we still have two copies of the data
left for redundancy. This is configurable: you can run with more
redundancy or less, depending on your risk appetite.

### Index schema

Choose schema version 9 in most cases; version 10 if you expect
hundreds of thousands of timeseries under a single name. Anything
older than v9 is much less efficient.
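
To make this concrete, the schema version is selected per time period
in Cortex's schema config. The sketch below is illustrative only - the
dates, table prefixes and store names are placeholders, and the exact
file layout and the flag used to load it depend on your Cortex
version, so check the schema config documentation:

```
configs:
  - from: "2019-07-01"   # date from which this entry applies
    store: aws-dynamo    # index store, e.g. aws-dynamo, bigtable or cassandra
    object_store: s3     # back-end used for chunk data
    schema: v9           # v9 in most cases, v10 for very high-cardinality metric names
    index:
      prefix: index_     # prefix for the periodic index tables
      period: 168h       # start a new table every 7 days
```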

### Chunk encoding

The standard choice is Bigchunk, which is the most flexible chunk
encoding. You may get better compression from Varbit if many of your
timeseries do not change value from one sample to the next.

### Sizing

You will want to estimate how many nodes are required, how many of
each component to run, and how much storage space will be required.
In practice, these will vary greatly depending on the metrics being
sent to Cortex.

Some key parameters are:

1. The number of active series. If you have Prometheus already you
can query `prometheus_tsdb_head_series` to see this number.
2. Sampling rate, e.g. a new sample for each series every 15
seconds. Multiply this by the number of active series to get the
total rate at which samples will arrive at Cortex.
3. The rate at which series are added and removed. This can be very
high if you monitor objects that come and go - for example if you run
thousands of batch jobs lasting a minute or so and capture metrics
with a unique ID for each one. [Read how to analyse this on
Prometheus](https://www.robustperception.io/using-tsdb-analyze-to-investigate-churn-and-cardinality).
4. How compressible the time-series data are. If a metric stays at
the same value constantly, then Cortex can compress it very well, so
12 hours of data sampled every 15 seconds would be around 2KB. On
the other hand if the value jumps around a lot it might take 10KB.
There are not currently any tools available to analyse this.
5. How long you want to retain data for, e.g. 1 month or 2 years.

Other parameters which can become important if you have particularly
high values:

6. Number of different series under one metric name.
7. Number of labels per series.
8. Rate and complexity of queries.

Now, some rules of thumb:

1. Each million series in an ingester takes 15GB of RAM. Total number
of series in ingesters is number of active series times the
replication factor. This is with the default of 12-hour chunks - the
RAM required goes down if you set `-ingester.max-chunk-age` lower
(trading off more back-end database IO).
2. Each million series (including churn) consumes 15GB of chunk
storage and 4GB of index, per day (so multiply by the retention
period).
3. Each 100,000 samples/sec arriving takes 1 CPU in distributors.
Distributors don't need much RAM.

If you turn on compression between distributors and ingesters (for
example to save on inter-zone bandwidth charges at AWS) they will use
significantly more CPU (approx 100% more for distributor and 50% more
for ingester).
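
For example, suppose you have one million active series scraped every
15 seconds, with the default replication factor of three, 12-hour
chunks and one month of retention. Applying the rules of thumb:
ingesters hold three million series between them, so roughly 45GB of
RAM in total; samples arrive at about 66,000 per second, so a single
distributor CPU is comfortably enough; and storage grows by about
15GB of chunks and 4GB of index per day, or roughly 450GB of chunks
and 120GB of index over the month.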

### Caching

Cortex can retain data in-process or in Memcached to speed up various
queries by caching:

* individual chunks
* index lookups for one label on one day
* the results of a whole query

You should always include Memcached in your Cortex install so results
from one process can be re-used by another. In-process caching can cut
fetch times slightly and reduce the load on Memcached.

Ingesters can also be configured to use Memcached to avoid re-writing
index and chunk data which has already been stored in the back-end
database. Again, highly recommended.

### Orchestration

Because Cortex is designed to run multiple instances of each component
(ingester, querier, etc.), you probably want to automate the placement
and shepherding of these instances. Most users choose Kubernetes to do
this, but this is not mandatory.

## Configuration

### Resource requests

If using Kubernetes, each container should specify resource requests
so that the scheduler can place it on a node with sufficient capacity.

For example an ingester might request:

```
resources:
  requests:
    cpu: 4
    memory: 10Gi
```

The specific values here should be adjusted based on your own
experience running Cortex - they depend heavily on the rate at which
data arrives and on other factors such as series churn.
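As a starting point, note that by the sizing rules of thumb above a
10Gi memory request corresponds to roughly 700,000 series on that
ingester.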

### Spread out ingesters

Don't run multiple ingesters on the same node, as that raises the risk
that you will lose multiple replicas of data at the same time.
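
Bear in mind that ingesters are semi-stateful: each one holds several
hours of recent data in memory, so you want to avoid losing or
unnecessarily moving them. This also affects how you spread ingesters
across zones and nodes and what machines you run them on - for
example, preemptible or spot instances are a poor fit.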

In Kubernetes this can be expressed as:

```
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: name
            operator: In
            values:
            - ingester
        topologyKey: "kubernetes.io/hostname"
```
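
If you want a hard guarantee rather than a preference, Kubernetes also
supports `requiredDuringSchedulingIgnoredDuringExecution`, at the cost
of pods remaining unscheduled when no suitable node is available.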