
Commit b2dacf4

docs: Add docs on how HA sample handling works.
Signed-off-by: Goutham Veeramachaneni <[email protected]>
1 parent 5f39048 commit b2dacf4


1 file changed (+41 -0 lines)


docs/ha-pair-handling.md

# Config for sending HA pair data to Cortex

## Context

With Prometheus, you can run more than one Prometheus server that scrapes and ingests the same metrics, for redundancy. Cortex already replicates the data it ingests, so it doesn't make sense to ingest the same data twice. So in Cortex, we deduplicate the data we receive from HA pairs of Prometheus servers. This works as follows.

Assume that there are two teams, each running their own Prometheus server to monitor different services. Let's call these servers T1 and T2. If each team runs an HA pair, let's call the individual replicas T1.a and T1.b, and T2.a and T2.b.

Cortex only ingests samples from one of T1.a and T1.b, and only from one of T2.a and T2.b. It does this by electing a leader replica for each Prometheus cluster. In the case of T1, say the leader is T1.a. As long as T1.a is the leader, Cortex drops the samples sent by T1.b. If Cortex sees no new samples from T1.a for a short period (30s by default), it switches the leader to T1.b.
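The election described above can be sketched as a small, single-process Python model. The class and parameter names (`HATrackerSketch`, `accept`, `failover_timeout`) are hypothetical and timestamps are plain seconds. It illustrates the idea only and keeps all state in a local dictionary, which says nothing about how Cortex actually stores or shares this state.

```python
from typing import Dict, Tuple


class HATrackerSketch:
    """Toy per-cluster leader election. Not Cortex's actual implementation."""

    def __init__(self, failover_timeout: float = 30.0) -> None:
        # Seconds the leader may go without sending samples before failover.
        self.failover_timeout = failover_timeout
        # cluster -> (elected replica, time of its last accepted sample)
        self.leaders: Dict[str, Tuple[str, float]] = {}

    def accept(self, cluster: str, replica: str, now: float) -> bool:
        """Return True if a sample from (cluster, replica) at `now` is ingested."""
        current = self.leaders.get(cluster)
        if current is None:
            # The first replica seen for this cluster becomes its leader.
            self.leaders[cluster] = (replica, now)
            return True
        leader, last_seen = current
        if replica == leader:
            # Samples from the leader are accepted and refresh its timestamp.
            self.leaders[cluster] = (replica, now)
            return True
        if now - last_seen > self.failover_timeout:
            # The leader has been silent for too long: fail over to this replica.
            self.leaders[cluster] = (replica, now)
            return True
        # A non-leader replica while the leader is healthy: drop the sample.
        return False


tracker = HATrackerSketch()
results = [
    tracker.accept("T1", "a", now=0),    # T1.a becomes the leader of T1
    tracker.accept("T1", "b", now=5),    # dropped: T1.a is the leader
    tracker.accept("T2", "b", now=5),    # T2 elects its own leader
    tracker.accept("T1", "b", now=45),   # T1.a silent for 45s > 30s: fail over
    tracker.accept("T1", "b", now=590),  # T1.b keeps sending as the leader
    tracker.accept("T1", "a", now=600),  # T1.a is back but is now dropped
]
print(results)  # [True, False, True, True, True, False]
```

The last two calls show what happens when a former leader returns while the new leader is still sending samples: the returning replica stays a follower.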

This means that if T1.a goes down for 10 minutes and then comes back, Cortex will no longer accept its samples: it keeps accepting samples from T1.b and drops those from T1.a. This way we preserve the redundancy of the HA pair, accept samples from only a single replica at a time, and avoid dropping too much data when something goes wrong. With a typical scrape interval of 15s, a leader switch should ideally lose the samples from only a single scrape. Your `rate()` windows should be at least 4x the scrape interval so that they can tolerate this potentially rare occurrence.
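The 4x rule of thumb is simple arithmetic. The sketch below assumes a 15s scrape interval and that a leader switch costs exactly one scrape, as described above; the function names are illustrative only. It relies on the fact that PromQL's `rate()` needs at least two samples in its window to return a value.

```python
def min_rate_window(scrape_interval_s: float, factor: int = 4) -> float:
    """Smallest recommended rate() window: `factor` times the scrape interval."""
    return factor * scrape_interval_s


def samples_in_window(window_s: float, scrape_interval_s: float) -> int:
    """Approximate number of scrapes in a window (exact count depends on alignment)."""
    return int(window_s // scrape_interval_s)


scrape = 15.0
window = min_rate_window(scrape)           # 60.0 seconds
total = samples_in_window(window, scrape)  # 4 samples
after_switch = total - 1                   # one scrape lost during a leader switch
# rate() still has at least two samples to work with after the loss.
print(window, total, after_switch, after_switch >= 2)  # 60.0 4 3 True
```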

We run the same leader election process for T2.

## Config

### Client Side

For Cortex to achieve this, it needs two identifiers for each Prometheus server: one for the cluster (T1, T2, etc.) and one for the replica within the cluster (a or b). We do this by setting external labels, ideally named `cluster` and `replica`. For example:

```yaml
global:
  external_labels:
    cluster: prom-team1
    replica: replica1 # or the pod name
```

and

```yaml
global:
  external_labels:
    cluster: prom-team1
    replica: replica2
```
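The client-side requirements in this section boil down to a simple check: each server in the pair sets both labels, all of them report the same cluster value, and their replica values are distinct. The helper below, `check_ha_pair`, is a hypothetical sketch that is not part of Prometheus or Cortex. It takes each server's external labels as a plain dict and defaults to the label names `cluster` and `replica` used in the examples.

```python
from typing import Dict, List


def check_ha_pair(
    external_labels: List[Dict[str, str]],
    cluster_label: str = "cluster",
    replica_label: str = "replica",
) -> List[str]:
    """Return a list of problems with the external labels of one HA pair."""
    problems: List[str] = []
    for i, labels in enumerate(external_labels):
        for name in (cluster_label, replica_label):
            if name not in labels:
                problems.append(f"server {i}: missing external label {name!r}")
    # Every server in the pair must report the same cluster value...
    clusters = {labels[cluster_label] for labels in external_labels if cluster_label in labels}
    if len(clusters) > 1:
        problems.append(f"servers disagree on {cluster_label!r}: {sorted(clusters)}")
    # ...and a replica value that no other server in the pair uses.
    replicas = [labels[replica_label] for labels in external_labels if replica_label in labels]
    if len(set(replicas)) != len(replicas):
        problems.append(f"{replica_label!r} values are not unique: {replicas}")
    return problems


pair = [
    {"cluster": "prom-team1", "replica": "replica1"},
    {"cluster": "prom-team1", "replica": "replica2"},
]
print(check_ha_pair(pair))  # []

copy_paste_error = [
    {"cluster": "prom-team1", "replica": "replica1"},
    {"cluster": "prom-team1", "replica": "replica1"},
]
print(check_ha_pair(copy_paste_error))  # reports that replica values are not unique
```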
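
Note: these are Prometheus external labels, not part of the `remote_write` config. Prometheus attaches its external labels to the samples it sends via remote write, so they need no extra `remote_write` configuration.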

Both label names are configurable on the Cortex side and should be set to something sensible. For example, some workloads already use a `cluster` label; in that case, pick a different label name that still uniquely identifies the Prometheus cluster. Good examples for this label name are `team`, `cluster`, `prometheus`, etc.

The replica label name is configurable too, and its value must be unique for each Prometheus server within a cluster. Note: Cortex drops the replica label when ingesting data but preserves the cluster label. This way, your time series don't change when the leader switches from one replica to another.
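The label handling on ingestion can be sketched as a pure function. `split_ha_labels` and its default label names are assumptions for illustration (in Cortex, both names come from configuration), and the sketch covers only the labels, not the leader check.

```python
from typing import Dict, Optional, Tuple


def split_ha_labels(
    labels: Dict[str, str],
    cluster_label: str = "cluster",
    replica_label: str = "replica",
) -> Tuple[Optional[str], Optional[str], Dict[str, str]]:
    """Return (cluster, replica, labels to store) for one incoming series."""
    cluster = labels.get(cluster_label)
    replica = labels.get(replica_label)
    # The replica label is dropped before storage while the cluster label is
    # kept, so the stored series is identical whichever replica is the leader.
    stored = {k: v for k, v in labels.items() if k != replica_label}
    return cluster, replica, stored


from_a = {"__name__": "up", "job": "node", "cluster": "prom-team1", "replica": "replica1"}
from_b = dict(from_a, replica="replica2")

cluster_a, replica_a, stored_a = split_ha_labels(from_a)
cluster_b, replica_b, stored_b = split_ha_labels(from_b)
print(cluster_a, replica_a, replica_b)  # prom-team1 replica1 replica2
print(stored_a == stored_b)             # True
```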

### Server Side

To enable handling of HA samples, see the [distributor flags](./arguments.md#ha-tracker) that have `ha-tracker` in their names.
