Commit c59cf80

Merge remote-tracking branch 'upstream/master' into tokens-file
Signed-off-by: Ganesh Vernekar <[email protected]>
2 parents 7ab02ff + 2e21f82 commit c59cf80

372 files changed: +70335 -2341 lines changed

.github/pull_request_template.md

Lines changed: 15 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,15 @@
1+
<!-- Thanks for sending a pull request! Before submitting:
2+
3+
1. Read our CONTRIBUTING.md guide
4+
2. Rebase your PR if it gets out of sync with master
5+
-->
6+
7+
**What this PR does**:
8+
9+
**Which issue(s) this PR fixes**:
10+
Fixes #<issue number>
11+
12+
**Checklist**
13+
- [ ] Tests updated
14+
- [ ] Documentation added
15+
- [ ] `CHANGELOG.md` updated

.lintignore

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@
22
./tools*
33
./vendor*
44
./pkg/configs/legacy_promql*
5+
./.pkg*

ADOPTERS.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
33
This is the list of organisations that are using Cortex in **production environments** to power their metrics and monitoring systems. Please send PRs to add or remove organisations.
44

55
* [Aspen Mesh](https://aspenmesh.io/)
6+
* [DigitalOcean](https://www.digitalocean.com/)
67
* [Electronic Arts](https://www.ea.com/)
78
* [GoJek](https://www.gojek.io/)
89
* [GrafanaLabs](https://grafana.com/)

CHANGELOG.md

Lines changed: 9 additions & 0 deletions
@@ -1,6 +1,15 @@
1+
# Changelog
2+
13
## master / unreleased
24

5+
* [CHANGE] The frontend component has been refactored to be easier to re-use. When upgrading the frontend, cache entries will be discarded and re-created with the new protobuf schema. #1734
6+
* [CHANGE] Remove direct DB/API access from the ruler
37
* [CHANGE] Removed `Delta` encoding. Any old chunks with `Delta` encoding cannot be read anymore. If `ingester.chunk-encoding` is set to `Delta` the ingester will fail to start. #1706
8+
* [FEATURE] Global limit on the max series per user and metric #1760
9+
* `-ingester.max-global-series-per-user`
10+
* `-ingester.max-global-series-per-metric`
11+
* [FEATURE] Flush chunks with stale markers early with `ingester.max-stale-chunk-idle`. #1759
12+
* [FEATURE] EXPERIMENTAL: Added new KV Store backend based on memberlist library. Components can gossip about tokens and ingester states, instead of using Consul or Etcd. #1721
413
* [ENHANCEMENT] Allocation improvements in adding samples to Chunk. #1706
514
* [ENHANCEMENT] Consul client now follows recommended practices for blocking queries wrt returned Index value. #1708
615
* [ENHANCEMENT] Consul client can optionally rate-limit itself during Watch (used e.g. by ring watchers) and WatchPrefix (used by HA feature) operations. Rate limiting is disabled by default. New flags added: `--consul.watch-rate-limit`, and `--consul.watch-burst-size`. #1708

CONTRIBUTING.md

Lines changed: 9 additions & 0 deletions
@@ -6,6 +6,15 @@ Welcome! We're excited that you're interested in contributing. Below are some ba
66

77
Cortex follows a standard GitHub pull request workflow. If you're unfamiliar with this workflow, read the very helpful [Understanding the GitHub flow](https://guides.github.com/introduction/flow/) guide from GitHub.
88

9+
You are welcome to create draft PRs at any stage of readiness - this
10+
can be helpful to ask for assistance or to develop an idea. But before
11+
a piece of work is finished it should:
12+
13+
* Be organised into one or more commits, each of which has a commit message that describes all changes made in that commit ('why' more than 'what' - we can read the diffs to see the code that changed).
14+
* Each commit should build towards the whole - don't leave in back-tracks and mistakes that you later corrected.
15+
* Have tests for new functionality or tests that would have caught the bug being fixed.
16+
* Include a CHANGELOG message if users of Cortex need to hear about what you did.
17+
918
## Developer Certificates of Origin (DCOs)
1019

1120
Before submitting your work in a pull request, make sure that *all* commits are signed off with a **Developer Certificate of Origin** (DCO). Here's an example:

Makefile

Lines changed: 1 addition & 10 deletions
@@ -74,14 +74,6 @@ RM := --rm
7474
# in any custom cloudbuild.yaml files
7575
TTY := --tty
7676
GO_FLAGS := -ldflags "-extldflags \"-static\" -s -w" -tags netgo
77-
NETGO_CHECK = @strings $@ | grep cgo_stub\\\.go >/dev/null || { \
78-
rm $@; \
79-
echo "\nYour go standard library was built without the 'netgo' build tag."; \
80-
echo "To fix that, run"; \
81-
echo " sudo go clean -i net"; \
82-
echo " sudo go install -tags netgo std"; \
83-
false; \
84-
}
8577

8678
ifeq ($(BUILD_IN_CONTAINER),true)
8779

@@ -121,12 +113,11 @@ exes: $(EXES)
121113

122114
$(EXES):
123115
CGO_ENABLED=0 go build $(GO_FLAGS) -o $@ ./$(@D)
124-
$(NETGO_CHECK)
125116

126117
protos: $(PROTO_GOS)
127118

128119
%.pb.go:
129-
protoc -I $(GOPATH)/src:./vendor:./$(@D) --gogoslick_out=plugins=grpc:./$(@D) ./$(patsubst %.pb.go,%.proto,$@)
120+
protoc -I $(GOPATH)/src:./vendor:./$(@D) --gogoslick_out=plugins=grpc,Mgoogle/protobuf/any.proto=github.com/gogo/protobuf/types,:./$(@D) ./$(patsubst %.pb.go,%.proto,$@)
130121

131122
lint:
132123
./tools/lint -notestpackage -novet -ignorespelling queriers -ignorespelling Queriers .

docs/architecture.md

Lines changed: 1 addition & 1 deletion
@@ -145,4 +145,4 @@ The interface works somewhat differently across the supported databases:
145145

146146
A set of schemas are used to map the matchers and label sets used on reads and writes to the chunk store into appropriate operations on the index. Schemas have been added as Cortex has evolved, mainly in an attempt to better load balance writes and improve query performance.
147147

148-
> The current schema recommendation is the **v10 schema**.
148+
> The current schema recommendation is the **v10 schema**. The v11 schema is experimental.

docs/arguments.md

Lines changed: 41 additions & 0 deletions
@@ -142,6 +142,39 @@ prefix these flags with `distributor.ha-tracker.`
142142
- `etcd.max-retries`
143143
The maximum number of retries to do for failed ops.
144144

145+
#### memberlist (EXPERIMENTAL)
146+
147+
Flags for configuring the KV store based on the memberlist library. This feature is experimental; please don't use it yet.
148+
149+
- `memberlist.nodename`
150+
Name of the node in memberlist cluster. Defaults to hostname.
151+
- `memberlist.retransmit-factor`
152+
Multiplication factor used when sending out messages (factor * log(N+1)). If not set, default value is used.
153+
- `memberlist.join`
154+
Other cluster members to join. Can be specified multiple times.
155+
- `memberlist.abort-if-join-fails`
156+
If this node fails to join memberlist cluster, abort.
157+
- `memberlist.left-ingesters-timeout`
158+
How long to keep LEFT ingesters in the ring. Note: this is only used for gossiping, LEFT ingesters are otherwise invisible.
159+
- `memberlist.leave-timeout`
160+
Timeout for leaving memberlist cluster.
161+
- `memberlist.gossip-interval`
162+
How often to gossip with other cluster members. Uses memberlist LAN defaults if 0.
163+
- `memberlist.gossip-nodes`
164+
How many nodes to gossip with in each gossip interval. Uses memberlist LAN defaults if 0.
165+
- `memberlist.pullpush-interval`
166+
How often to use pull/push sync. Uses memberlist LAN defaults if 0.
167+
- `memberlist.bind-addr`
168+
IP address to listen on for gossip messages. Multiple addresses may be specified. Defaults to 0.0.0.0.
169+
- `memberlist.bind-port`
170+
Port to listen on for gossip messages. Defaults to 7946.
171+
- `memberlist.packet-dial-timeout`
172+
Timeout used when connecting to other nodes to send packet.
173+
- `memberlist.packet-write-timeout`
174+
Timeout for writing 'packet' data.
175+
- `memberlist.transport-debug`
176+
Log debug transport messages. Note: global log.level must be at debug level as well.
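
The `memberlist.retransmit-factor` description above gives the scaling as factor * log(N+1). The arithmetic can be sketched as below, assuming a base-10 logarithm rounded up to a whole number. Both the base and the rounding are assumptions, since the flag description does not state them, and `retransmitLimit` is a hypothetical helper name rather than a Cortex function.

```go
package main

import (
	"fmt"
	"math"
)

// retransmitLimit estimates how many times a message is retransmitted in a
// cluster of n nodes: factor * ceil(log10(n+1)).
// Assumption: base-10 logarithm with ceiling; the flag text only says
// "factor * log(N+1)".
func retransmitLimit(factor, n int) int {
	nodeScale := math.Ceil(math.Log10(float64(n + 1)))
	return factor * int(nodeScale)
}

func main() {
	// The factor of 4 is an illustrative value, not a documented default.
	for _, n := range []int{3, 50, 500} {
		fmt.Printf("nodes=%d limit=%d\n", n, retransmitLimit(4, n))
	}
}
```

Under these assumptions the limit grows slowly with cluster size: going from 3 to 500 nodes only triples the number of retransmissions.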
177+
145178
### HA Tracker
146179

147180
HA tracking has two of its own flags:
@@ -175,6 +208,14 @@ It also talks to a KVStore and has it's own copies of the same flags used by the
175208

176209
The maximum duration of a timeseries chunk in memory. If a timeseries runs for longer than this the current chunk will be flushed to the store and a new chunk created. (default 12h)
177210

211+
- `-ingester.max-chunk-idle`
212+
213+
If a series doesn't receive a sample for this duration, it is flushed and removed from memory.
214+
215+
- `-ingester.max-stale-chunk-idle`
216+
217+
If a series receives a [staleness marker](https://www.robustperception.io/staleness-and-promql), then we wait for this duration to get another sample before we close and flush this series, removing it from memory. You want it to be at least 2x the scrape interval as you don't want a single failed scrape to cause a chunk flush.
218+
178219
- `-ingester.chunk-age-jitter`
179220

180221
To reduce load on the database exactly 12 hours after starting, the age limit is reduced by a varying amount up to this. (default 20m)

docs/ha-pair-handling.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
# Config for sending HA Pairs data to Cortex
2+
3+
## Context
4+
5+
You can have more than a single Prometheus monitoring and ingesting the same metrics for redundancy. Cortex already does replication for redundancy and it doesn't make sense to ingest the same data twice. So in Cortex, we made sure we can dedupe the data we receive from HA Pairs of Prometheus. We do this via the following:
6+
7+
Assume that there are two teams, each running their own Prometheus, monitoring different services. Let's call the Prometheis T1 and T2. Now, if the teams are running HA pairs, let's call the individual Prometheis, T1.a, T1.b and T2.a and T2.b.
8+
9+
In Cortex we make sure we only ingest from one of T1.a and T1.b, and only from one of T2.a and T2.b. We do this by electing a leader replica for each cluster of Prometheus. For example, in the case of T1, let it be T1.a. As long as T1.a is the leader, we drop the samples sent by T1.b. And if Cortex sees no new samples from T1.a for a short period (30s by default), it'll switch the leader to be T1.b.
10+
11+
This means that if T1.a goes down for a few minutes, Cortex's HA sample handling will have switched and elected T1.b as the leader. This failover timeout is what enables us to accept samples from only a single replica at a time while ensuring we don't drop too much data in case of issues. Note that with the default scrape period of 15s and the default timeouts in Cortex, in most cases you'll only lose a single scrape of data during a leader election failover. For any rate queries, the rate window should be at least 4x the scrape period to account for these failover scenarios. For example, with the default scrape period of 15s, you should calculate rates over periods of at least 1m.
12+
13+
Now we do the same leader election process for T2.
14+
15+
## Config
16+
17+
### Client Side
18+
19+
For Cortex to achieve this, we need two identifiers for each process: one to identify the cluster (T1 or T2, etc.) and one to identify the replica within the cluster (a or b). The easiest way to do this is by setting external labels, ideally `cluster` and `replica` (note the default is `__replica__`). For example:
20+
21+
```
22+
cluster: prom-team1
23+
replica: replica1 (or pod-name)
24+
```
25+
26+
and
27+
28+
```
29+
cluster: prom-team1
30+
replica: replica2
31+
```
32+
33+
Note: These are external labels and have nothing to do with remote_write config.
34+
35+
These two label names are configurable per-tenant within Cortex, and should be set to something sensible. For example, if the `cluster` label is already used by some workloads, set the label name to something else that still uniquely identifies the cluster. Good examples for this label name would be `team`, `cluster`, `prometheus`, etc.
36+
37+
The replica label should be set so that the value for each prometheus is unique in that cluster. Note: Cortex drops this label when ingesting data, but preserves the cluster label. This way, your timeseries won't change when replicas change.
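
Why dropping the replica label keeps series stable can be shown with a small sketch. The `label` type and `stripReplica` function are hypothetical and only illustrate the idea; they are not the distributor's implementation.

```go
package main

import "fmt"

// label is a simplified name/value pair (hypothetical type).
type label struct {
	Name, Value string
}

// stripReplica returns a copy of labels without the replica label, keeping
// every other label (including the cluster label) in order.
func stripReplica(labels []label, replicaLabel string) []label {
	out := make([]label, 0, len(labels))
	for _, l := range labels {
		if l.Name != replicaLabel {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	fromA := []label{{"__name__", "up"}, {"cluster", "prom-team1"}, {"replica", "replica1"}}
	fromB := []label{{"__name__", "up"}, {"cluster", "prom-team1"}, {"replica", "replica2"}}

	// After stripping, both replicas produce the same label set, so a
	// failover does not create a new timeseries.
	fmt.Println(stripReplica(fromA, "replica"))
	fmt.Println(stripReplica(fromB, "replica"))
}
```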
38+
39+
### Server Side
40+
41+
To enable handling of HA samples, see the [distributor flags](./arguments.md#ha-tracker) having `ha-tracker` in them.

docs/single-process-config.yaml

Lines changed: 1 addition & 2 deletions
@@ -51,7 +51,7 @@ ingester:
5151
# for the chunks.
5252
schema:
5353
configs:
54-
- from: 2019-03-25
54+
- from: 2019-07-29
5555
store: boltdb
5656
object_store: filesystem
5757
schema: v10
@@ -65,4 +65,3 @@ storage:
6565

6666
filesystem:
6767
directory: /tmp/cortex/chunks
68-

go.mod

Lines changed: 6 additions & 5 deletions
@@ -8,19 +8,18 @@ require (
88
github.com/Azure/go-autorest v11.5.1+incompatible // indirect
99
github.com/Masterminds/squirrel v0.0.0-20161115235646-20f192218cf5
1010
github.com/NYTimes/gziphandler v1.1.1
11-
github.com/aws/aws-sdk-go v1.23.12
11+
github.com/alecthomas/units v0.0.0-20190717042225-c3de453c63f4
12+
github.com/aws/aws-sdk-go v1.25.22
1213
github.com/bitly/go-hostpool v0.0.0-20171023180738-a3a6125de932 // indirect
1314
github.com/blang/semver v3.5.0+incompatible
1415
github.com/bmizerany/assert v0.0.0-20160611221934-b7ed37b82869 // indirect
1516
github.com/bradfitz/gomemcache v0.0.0-20190329173943-551aad21a668
1617
github.com/cenkalti/backoff v1.0.0 // indirect
1718
github.com/cespare/xxhash v1.1.0
18-
github.com/codahale/hdrhistogram v0.0.0-20161010025455-3a0bb77429bd // indirect
1919
github.com/coreos/go-semver v0.3.0 // indirect
2020
github.com/coreos/go-systemd v0.0.0-20181012123002-c6f51f82210d // indirect
2121
github.com/coreos/pkg v0.0.0-20180928190104-399ea9e2e55f // indirect
2222
github.com/cznic/ql v1.2.0 // indirect
23-
github.com/dustin/go-humanize v1.0.0 // indirect
2423
github.com/facette/natsort v0.0.0-20181210072756-2cd4dd1e2dcb
2524
github.com/fluent/fluent-logger-golang v1.2.1 // indirect
2625
github.com/fsouza/fake-gcs-server v1.3.0
@@ -39,6 +38,8 @@ require (
3938
github.com/hailocab/go-hostpool v0.0.0-20160125115350-e80d13ce29ed // indirect
4039
github.com/hashicorp/consul/api v1.1.0
4140
github.com/hashicorp/go-cleanhttp v0.5.1
41+
github.com/hashicorp/go-sockaddr v1.0.2
42+
github.com/hashicorp/memberlist v0.1.4
4243
github.com/jonboulle/clockwork v0.1.0
4344
github.com/json-iterator/go v1.1.7
4445
github.com/konsorten/go-windows-terminal-sequences v1.0.2 // indirect
@@ -48,6 +49,7 @@ require (
4849
github.com/lib/pq v1.0.0
4950
github.com/mattes/migrate v1.3.1
5051
github.com/mattn/go-sqlite3 v1.10.0 // indirect
52+
github.com/oklog/ulid v1.3.1
5153
github.com/opentracing-contrib/go-grpc v0.0.0-20180928155321-4b5a12d3ff02
5254
github.com/opentracing-contrib/go-stdlib v0.0.0-20190519235532-cf7a6c988dc9
5355
github.com/opentracing/opentracing-go v1.1.0
@@ -62,11 +64,10 @@ require (
6264
github.com/segmentio/fasthash v0.0.0-20180216231524-a72b379d632e
6365
github.com/sercand/kuberesolver v2.1.0+incompatible // indirect
6466
github.com/stretchr/testify v1.4.0
67+
github.com/thanos-io/thanos v0.7.0
6568
github.com/tinylib/msgp v0.0.0-20161221055906-38a6f61a768d // indirect
6669
github.com/tmc/grpc-websocket-proxy v0.0.0-20190109142713-0ad062ec5ee5 // indirect
67-
github.com/uber-go/atomic v1.3.2 // indirect
6870
github.com/uber/jaeger-client-go v2.16.0+incompatible
69-
github.com/uber/jaeger-lib v2.0.0+incompatible // indirect
7071
github.com/weaveworks/billing-client v0.0.0-20171006123215-be0d55e547b1
7172
github.com/weaveworks/common v0.0.0-20190822150010-afb9996716e4
7273
github.com/weaveworks/promrus v1.2.0 // indirect
