19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,12 @@

## master / unreleased

## 0.5.0 / 2020-01-15

Note that the ruler flags need to be changed in this upgrade. You're moving from a single-node ruler to one that can be sharded.
Further, if you're using the configs service, we've upgraded the migration library, and this requires some manual intervention. See the full
instructions below to upgrade your Postgres.

* [CHANGE] Flags changed with transition to upstream Prometheus rules manager:
* `ruler.client-timeout` is now `ruler.configs.client-timeout` in order to match `ruler.configs.url`
* `ruler.group-timeout` has been removed
@@ -13,6 +19,7 @@
* [CHANGE] Deprecated `-distributor.limiter-reload-period` flag. #1766
* [CHANGE] Ingesters now write only normalised tokens to the ring, although they can still read denormalised tokens used by other ingesters. `-ingester.normalise-tokens` is now deprecated, and ignored. If you want to switch back to using denormalised tokens, you need to downgrade to Cortex 0.4.0. Previous versions don't handle claiming tokens from normalised ingesters correctly. #1809
* [CHANGE] Overrides mechanism has been renamed to "runtime config", and is now separate from limits. Runtime config is simply a file that is reloaded by Cortex every couple of seconds. Limits and now also multi KV use this mechanism.<br />New arguments were introduced: `-runtime-config.file` (defaults to empty) and `-runtime-config.reload-period` (defaults to 10 seconds), which replace previously used `-limits.per-user-override-config` and `-limits.per-user-override-period` options. Old options are still used if `-runtime-config.file` is not specified. This change is also reflected in YAML configuration, where old `limits.per_tenant_override_config` and `limits.per_tenant_override_period` fields are replaced with `runtime_config.file` and `runtime_config.period` respectively. #1749
* [CHANGE] Cortex now rejects data with duplicate labels. Previously, such data was accepted, and duplicate labels were removed so that only one value was kept. #1964
* [FEATURE] The distributor can now drop labels from samples (similar to the removal of the replica label for HA ingestion) per user via the `distributor.drop-label` flag. #1726
* [FEATURE] Added `global` ingestion rate limiter strategy. Deprecated `-distributor.limiter-reload-period` flag. #1766
* [FEATURE] Added support for Microsoft Azure blob storage to be used for storing chunk data. #1913
@@ -22,11 +29,23 @@
* [ENHANCEMENT] Added `password` and `enable_tls` options to redis cache configuration. Enables usage of Microsoft Azure Cache for Redis service.
* [BUGFIX] Fixed unnecessary CAS operations done by the HA tracker when the jitter is enabled. #1861
* [BUGFIX] Fixed #1904 ingesters getting stuck in a LEAVING state after coming up from an ungraceful exit. #1921
* [BUGFIX] Reduce memory usage when ingester Push() errors. #1922
* [BUGFIX] TSDB: Fixed handling of out of order/bound samples in ingesters with the experimental TSDB blocks storage. #1864
* [BUGFIX] TSDB: Fixed querying ingesters in `LEAVING` state with the experimental TSDB blocks storage. #1854
* [BUGFIX] TSDB: Fixed error handling in the series to chunks conversion with the experimental TSDB blocks storage. #1837
* [BUGFIX] TSDB: Fixed TSDB creation conflict with blocks transfer in a `JOINING` ingester with the experimental TSDB blocks storage. #1818

### Upgrading Postgres

Reference: https://github.com/golang-migrate/migrate/tree/master/database/postgres#upgrading-from-v1

1. Install the migrate package cli tool: https://github.com/golang-migrate/migrate/tree/master/cmd/migrate#installation
2. Run the migrate command:

```bash
migrate -path <absolute_path_to_cortex>/cmd/cortex/migrations -database postgres://localhost:5432/database force 2
```

## 0.4.0 / 2019-12-02

* [CHANGE] The frontend component has been refactored to be easier to re-use. When upgrading the frontend, cache entries will be discarded and re-created with the new protobuf schema. #1734
3 changes: 2 additions & 1 deletion RELEASE.md
@@ -12,7 +12,8 @@ Our goal is to provide a new minor release every 4 weeks. This is a new process
| v0.2.0 | 2019-08-28 | Goutham Veeramachaneni (Github: @gouthamve) |
| v0.3.0 | 2019-10-09 | Bryan Boreham (@bboreham) |
| v0.4.0 | 2019-11-13 | Tom Wilkie (@tomwilkie) |
-| v0.5.0 | 2019-12-11 | **searching for volunteer** |
+| v0.5.0 | 2020-01-08 | Goutham Veeramachaneni (@gouthamve) |

Review comment (Contributor): We should update this and mention it has been abandoned.

+| v0.6.0 | 2020-02-19 | **searching for a volunteer** |

## Release shepherd responsibilities

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.4.0
+0.5.0-rc.0
6 changes: 6 additions & 0 deletions cmd/cortex/migrations/002_immutable_configs.up.sql
@@ -2,6 +2,10 @@
-- process (as exemplified in users/db/migrations/006 to 010), but currently
-- there are no data in production and only one row in dev.

-- https://github.com/mattes/migrate/tree/master/database/postgres#upgrading-from-v1
-- Wrap all commands in BEGIN and COMMIT to accommodate upgrade
BEGIN;

-- The existing id, type columns are the id & type of the entity that owns the
-- config.
ALTER TABLE configs RENAME COLUMN id TO owner_id;
@@ -12,3 +16,5 @@ ALTER TABLE configs ADD COLUMN id SERIAL;

ALTER TABLE configs DROP CONSTRAINT configs_pkey;
ALTER TABLE configs ADD PRIMARY KEY (id, owner_id, owner_type, subsystem);

COMMIT;
42 changes: 33 additions & 9 deletions pkg/distributor/distributor.go
@@ -3,7 +3,6 @@ package distributor
 import (
 	"context"
 	"flag"
-	"fmt"
 	"net/http"
 	"sort"
 	"strings"
@@ -227,7 +226,7 @@ func (d *Distributor) Stop() {

 func (d *Distributor) tokenForLabels(userID string, labels []client.LabelAdapter) (uint32, error) {
 	if d.cfg.ShardByAllLabels {
-		return shardByAllLabels(userID, labels)
+		return shardByAllLabels(userID, labels), nil
 	}

metricName, err := extract.MetricNameFromLabelAdapters(labels)
@@ -244,19 +243,15 @@ func shardByMetricName(userID string, metricName string) uint32 {
return h
}

-func shardByAllLabels(userID string, labels []client.LabelAdapter) (uint32, error) {
+// This function generates different values for different order of same labels.
+func shardByAllLabels(userID string, labels []client.LabelAdapter) uint32 {
 	h := client.HashNew32()
 	h = client.HashAdd32(h, userID)
-	var lastLabelName string
 	for _, label := range labels {
-		if strings.Compare(lastLabelName, label.Name) >= 0 {
-			return 0, fmt.Errorf("labels not sorted")
-		}
 		h = client.HashAdd32(h, label.Name)
 		h = client.HashAdd32(h, label.Value)
-		lastLabelName = label.Name
 	}
-	return h, nil
+	return h
}
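
The order sensitivity of `shardByAllLabels` can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical `label` type and swaps in FNV-1a from the standard library's `hash/fnv` for Cortex's `client.HashNew32`/`client.HashAdd32`, so the token values will not match what Cortex computes. Only the dependence on label order is shown.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// label is a hypothetical stand-in for client.LabelAdapter.
type label struct {
	Name, Value string
}

// hashLabels feeds the user ID and every label name and value into one
// running FNV-1a hash, in slice order. Illustration only: this is not the
// hash function Cortex uses.
func hashLabels(userID string, labels []label) uint32 {
	h := fnv.New32a()
	h.Write([]byte(userID))
	for _, l := range labels {
		h.Write([]byte(l.Name))
		h.Write([]byte(l.Value))
	}
	return h.Sum32()
}

func main() {
	sorted := []label{{"__name__", "foo"}, {"bar", "baz"}, {"sample", "1"}}
	swapped := []label{{"__name__", "foo"}, {"sample", "1"}, {"bar", "baz"}}

	// Same label set, different order: the hash input differs, so the two
	// series would generally map to different tokens.
	fmt.Println(hashLabels("test", sorted), hashLabels("test", swapped))
}
```

Because the bytes are hashed in slice order, the caller has to guarantee a canonical order before sharding, which is what the sort added in `Push` is for.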

// Remove the label labelname from a slice of LabelPairs if it exists.
@@ -375,6 +370,13 @@ func (d *Distributor) Push(ctx context.Context, req *client.WriteRequest) (*clie
continue
}

// We rely on sorted labels in different places:
// 1) When computing token for labels, and sharding by all labels. Here different order of labels returns
// different tokens, which is bad.
// 2) In validation code, when checking for duplicate label names. As duplicate label names are rejected
// later in the validation phase, we ignore them here.
sortLabelsIfNeeded(ts.Labels)

// Generate the sharding token based on the series labels without the HA replica
// label and dropped labels (if any)
key, err := d.tokenForLabels(userID, ts.Labels)
@@ -450,6 +452,28 @@ func (d *Distributor) Push(ctx context.Context, req *client.WriteRequest) (*clie
return &client.WriteResponse{}, lastPartialErr
}

func sortLabelsIfNeeded(labels []client.LabelAdapter) {
// no need to run sort.Slice, if labels are already sorted, which is most of the time.
// we can avoid extra memory allocations (mostly interface-related) this way.
sorted := true
last := ""
for _, l := range labels {
if strings.Compare(last, l.Name) > 0 {
sorted = false
break
}
last = l.Name
}

if sorted {
return
}

sort.Slice(labels, func(i, j int) bool {
return strings.Compare(labels[i].Name, labels[j].Name) < 0
})
}
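
The check-then-sort idea can be tried outside Cortex with a small sketch. The `label` type and the `sortByNameIfNeeded` name are hypothetical stand-ins for `client.LabelAdapter` and `sortLabelsIfNeeded`. The linear scan is the cheap common path, and `sort.Slice` runs only when an out-of-order pair is found.

```go
package main

import (
	"fmt"
	"sort"
)

// label is a hypothetical stand-in for client.LabelAdapter.
type label struct {
	Name, Value string
}

// sortByNameIfNeeded sorts labels by name in place, but only after a linear
// scan has found an adjacent pair that is out of order.
func sortByNameIfNeeded(labels []label) {
	for i := 1; i < len(labels); i++ {
		if labels[i-1].Name > labels[i].Name {
			sort.Slice(labels, func(a, b int) bool {
				return labels[a].Name < labels[b].Name
			})
			return
		}
	}
}

func main() {
	ls := []label{{"__name__", "foo"}, {"sample", "1"}, {"cluster", "c"}, {"bar", "baz"}}
	sortByNameIfNeeded(ls)
	for _, l := range ls {
		fmt.Println(l.Name) // prints __name__, bar, cluster, sample (one per line)
	}
}
```

Equal adjacent names are left alone by the scan, so duplicates do not trigger a sort; in the PR they are reported later by validation.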

func (d *Distributor) sendSamples(ctx context.Context, ingester ring.IngesterDesc, timeseries []client.PreallocTimeseries) error {
h, err := d.ingesterPool.GetClientFor(ingester.Addr)
if err != nil {
43 changes: 34 additions & 9 deletions pkg/distributor/distributor_test.go
@@ -8,6 +8,7 @@ import (
"net/http"
"sort"
"strconv"
"strings"
"sync"
"testing"
"time"
@@ -905,7 +906,7 @@ func (i *mockIngester) Push(ctx context.Context, req *client.WriteRequest, opts

 	for j := range req.Timeseries {
 		series := req.Timeseries[j]
-		hash, _ := shardByAllLabels(orgid, series.Labels)
+		hash := shardByAllLabels(orgid, series.Labels)
 		existing, ok := i.timeseries[hash]
 		if !ok {
 			// Make a copy because the request Timeseries are reused
@@ -1183,22 +1184,46 @@ func TestRemoveReplicaLabel(t *testing.T) {
}
}

-func TestShardByAllLabelsChecksForSortedLabelNames(t *testing.T) {
-	val, err := shardByAllLabels("test", []client.LabelAdapter{
+// This is not great, but we deal with unsorted labels when validating labels.
+func TestShardByAllLabelsReturnsWrongResultsForUnsortedLabels(t *testing.T) {
+	val1 := shardByAllLabels("test", []client.LabelAdapter{
 		{Name: "__name__", Value: "foo"},
 		{Name: "bar", Value: "baz"},
 		{Name: "sample", Value: "1"},
 	})

-	assert.NotZero(t, val)
-	assert.NoError(t, err)
-
-	val, err = shardByAllLabels("test", []client.LabelAdapter{
+	val2 := shardByAllLabels("test", []client.LabelAdapter{
 		{Name: "__name__", Value: "foo"},
 		{Name: "sample", Value: "1"},
 		{Name: "bar", Value: "baz"},
 	})

-	assert.Zero(t, val)
-	assert.Error(t, err)
+	assert.NotEqual(t, val1, val2)
 }

func TestSortLabels(t *testing.T) {
sorted := []client.LabelAdapter{
{Name: "__name__", Value: "foo"},
{Name: "bar", Value: "baz"},
{Name: "cluster", Value: "cluster"},
{Name: "sample", Value: "1"},
}

// no allocations if input is already sorted
require.Equal(t, 0.0, testing.AllocsPerRun(100, func() {
sortLabelsIfNeeded(sorted)
}))

unsorted := []client.LabelAdapter{
{Name: "__name__", Value: "foo"},
{Name: "sample", Value: "1"},
{Name: "cluster", Value: "cluster"},
{Name: "bar", Value: "baz"},
}

sortLabelsIfNeeded(unsorted)

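// Assert the result: a bare sort.SliceIsSorted call discards its return value,
// so the test would pass even if sorting failed.
require.True(t, sort.SliceIsSorted(unsorted, func(i, j int) bool {
return strings.Compare(unsorted[i].Name, unsorted[j].Name) < 0
}))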
}
3 changes: 3 additions & 0 deletions pkg/ingester/client/compat.go
@@ -190,6 +190,9 @@ func fromLabelMatchers(matchers []*LabelMatcher) ([]*labels.Matcher, error) {
// FromLabelAdaptersToLabels casts []LabelAdapter to labels.Labels.
// It uses unsafe, but as LabelAdapter == labels.Label this should be safe.
// This allows us to use labels.Labels directly in protos.
//
// Note: while resulting labels.Labels is supposedly sorted, this function
// doesn't enforce that. If input is not sorted, output will be wrong.
func FromLabelAdaptersToLabels(ls []LabelAdapter) labels.Labels {
return *(*labels.Labels)(unsafe.Pointer(&ls))
}
62 changes: 53 additions & 9 deletions pkg/util/validation/validate.go
@@ -1,7 +1,9 @@
package validation

import (
"fmt"
"net/http"
"strings"
"time"

"github.com/cortexproject/cortex/pkg/ingester/client"
@@ -14,14 +16,16 @@ import (
const (
discardReasonLabel = "reason"

-	errMissingMetricName = "sample missing metric name"
-	errInvalidMetricName = "sample invalid metric name: %.200q"
-	errInvalidLabel      = "sample invalid label: %.200q metric %.200q"
-	errLabelNameTooLong  = "label name too long: %.200q metric %.200q"
-	errLabelValueTooLong = "label value too long: %.200q metric %.200q"
-	errTooManyLabels     = "sample for '%s' has %d label names; limit %d"
-	errTooOld            = "sample for '%s' has timestamp too old: %d"
-	errTooNew            = "sample for '%s' has timestamp too new: %d"
+	errMissingMetricName  = "sample missing metric name"
+	errInvalidMetricName  = "sample invalid metric name: %.200q"
+	errInvalidLabel       = "sample invalid label: %.200q metric %.200q"
+	errLabelNameTooLong   = "label name too long: %.200q metric %.200q"
+	errLabelValueTooLong  = "label value too long: %.200q metric %.200q"
+	errTooManyLabels      = "sample for '%s' has %d label names; limit %d"
+	errTooOld             = "sample for '%s' has timestamp too old: %d"
+	errTooNew             = "sample for '%s' has timestamp too new: %d"
+	errDuplicateLabelName = "duplicate label name: %.200q metric %.200q"
+	errLabelsNotSorted    = "labels not sorted: %.200q metric %.200q"

// ErrQueryTooLong is used in chunk store and query frontend.
ErrQueryTooLong = "invalid query, length > limit (%s > %s)"
@@ -31,6 +35,8 @@ const (
tooFarInFuture = "too_far_in_future"
invalidLabel = "label_invalid"
labelNameTooLong = "label_name_too_long"
duplicateLabelNames = "duplicate_label_names"
labelsNotSorted = "labels_not_sorted"
labelValueTooLong = "label_value_too_long"

// RateLimited is one of the values for the reason to discard samples.
@@ -102,6 +108,7 @@ func ValidateLabels(cfg LabelValidationConfig, userID string, ls []client.LabelA

maxLabelNameLength := cfg.MaxLabelNameLength(userID)
maxLabelValueLength := cfg.MaxLabelValueLength(userID)
lastLabelName := ""
for _, l := range ls {
var errTemplate string
var reason string
@@ -118,11 +125,48 @@ func ValidateLabels(cfg LabelValidationConfig, userID string, ls []client.LabelA
reason = labelValueTooLong
errTemplate = errLabelValueTooLong
cause = l.Value
} else if cmp := strings.Compare(lastLabelName, l.Name); cmp >= 0 {
if cmp == 0 {
reason = duplicateLabelNames
errTemplate = errDuplicateLabelName
cause = l.Name
} else {
reason = labelsNotSorted
errTemplate = errLabelsNotSorted
cause = l.Name
}
}
 		if errTemplate != "" {
 			DiscardedSamples.WithLabelValues(reason, userID).Inc()
-			return httpgrpc.Errorf(http.StatusBadRequest, errTemplate, cause, client.FromLabelAdaptersToMetric(ls).String())
+			return httpgrpc.Errorf(http.StatusBadRequest, errTemplate, cause, formatLabelSet(ls))
 		}
+		lastLabelName = l.Name
}
return nil
}

// this function formats label adapters as a metric name with labels, while preserving
// label order, and keeping duplicates. If there are multiple "__name__" labels, only
// first one is used as metric name, other ones will be included as regular labels.
func formatLabelSet(ls []client.LabelAdapter) string {
metricName, hasMetricName := "", false

labelStrings := make([]string, 0, len(ls))
for _, l := range ls {
if l.Name == model.MetricNameLabel && !hasMetricName && l.Value != "" {
metricName = l.Value
hasMetricName = true
} else {
labelStrings = append(labelStrings, fmt.Sprintf("%s=%q", l.Name, l.Value))
}
}

if len(labelStrings) == 0 {
if hasMetricName {
return metricName
}
return "{}"
}

return fmt.Sprintf("%s{%s}", metricName, strings.Join(labelStrings, ", "))
}
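
To experiment with this formatting outside the Cortex tree, here is a standalone sketch of the same logic. The `label` type, the `metricNameLabel` constant, and the `formatLabels` name are hypothetical stand-ins for `client.LabelAdapter`, `model.MetricNameLabel`, and `formatLabelSet`. The first two sample inputs mirror the expectations in the tests added by this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// metricNameLabel stands in for model.MetricNameLabel.
const metricNameLabel = "__name__"

// label is a hypothetical stand-in for client.LabelAdapter.
type label struct {
	Name, Value string
}

// formatLabels renders labels as name{k="v", ...}, preserving input order and
// keeping duplicates. Only the first non-empty __name__ becomes the metric
// name; later ones are printed as regular labels.
func formatLabels(ls []label) string {
	metricName, hasMetricName := "", false
	parts := make([]string, 0, len(ls))
	for _, l := range ls {
		if l.Name == metricNameLabel && !hasMetricName && l.Value != "" {
			metricName, hasMetricName = l.Value, true
		} else {
			parts = append(parts, fmt.Sprintf("%s=%q", l.Name, l.Value))
		}
	}
	if len(parts) == 0 {
		if hasMetricName {
			return metricName
		}
		return "{}"
	}
	return fmt.Sprintf("%s{%s}", metricName, strings.Join(parts, ", "))
}

func main() {
	fmt.Println(formatLabels([]label{{"__name__", "m"}, {"b", "b"}, {"a", "a"}})) // m{b="b", a="a"}
	fmt.Println(formatLabels([]label{{"__name__", "a"}, {"__name__", "b"}}))      // a{__name__="b"}
	fmt.Println(formatLabels(nil))                                                // {}
}
```

Preserving order and duplicates matters here because the error message has to show the offending input as it was received, not a normalised form.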
38 changes: 38 additions & 0 deletions pkg/util/validation/validate_test.go
@@ -80,3 +80,41 @@ func TestValidateLabels(t *testing.T) {
assert.Equal(t, c.err, err, "wrong error")
}
}

func TestValidateLabelOrder(t *testing.T) {
var cfg validateLabelsCfg
cfg.maxLabelNameLength = 10
cfg.maxLabelNamesPerSeries = 10
cfg.maxLabelValueLength = 10

userID := "testUser"

err := ValidateLabels(cfg, userID, []client.LabelAdapter{
{Name: model.MetricNameLabel, Value: "m"},
{Name: "b", Value: "b"},
{Name: "a", Value: "a"},
})
assert.Equal(t, httpgrpc.Errorf(http.StatusBadRequest, errLabelsNotSorted, "a", `m{b="b", a="a"}`), err)
}

func TestValidateLabelDuplication(t *testing.T) {
var cfg validateLabelsCfg
cfg.maxLabelNameLength = 10
cfg.maxLabelNamesPerSeries = 10
cfg.maxLabelValueLength = 10

userID := "testUser"

err := ValidateLabels(cfg, userID, []client.LabelAdapter{
{Name: model.MetricNameLabel, Value: "a"},
{Name: model.MetricNameLabel, Value: "b"},
})
assert.Equal(t, httpgrpc.Errorf(http.StatusBadRequest, errDuplicateLabelName, "__name__", `a{__name__="b"}`), err)

err = ValidateLabels(cfg, userID, []client.LabelAdapter{
{Name: model.MetricNameLabel, Value: "a"},
{Name: "a", Value: "a"},
{Name: "a", Value: "a"},
})
assert.Equal(t, httpgrpc.Errorf(http.StatusBadRequest, errDuplicateLabelName, "a", `a{a="a", a="a"}`), err)
}