Commit 4ad61d2

Add Alertmanager Integration Tests and Static File Backend (#2125)
* refactor alertmanager storage, add static file alert store, and add integration test for alertmanager
  Signed-off-by: Jacob Lisi <[email protected]>
* fix client function
  Signed-off-by: Jacob Lisi <[email protected]>
* update changelog
  Signed-off-by: Jacob Lisi <[email protected]>
1 parent 8808b90 commit 4ad61d2

20 files changed: 1370 additions and 106 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -2,17 +2,20 @@

 ## master / unreleased

+* [CHANGE] Config file changed to remove top level `config_store` field in favor of a nested `configdb` field. #2125
 * [CHANGE] Removed unnecessary `frontend.cache-split-interval` in favor of `querier.split-queries-by-interval` both to reduce configuration complexity and guarantee alignment of these two configs. Starting from now, `-querier.cache-results` may only be enabled in conjunction with `-querier.split-queries-by-interval` (previously the cache interval default was `24h` so if you want to preserve the same behaviour you should set `-querier.split-queries-by-interval=24h`). #2040
 * [CHANGE] Removed remaining support for using denormalised tokens in the ring. If you're still running ingesters with denormalised tokens (Cortex 0.4 or earlier, with `-ingester.normalise-tokens=false`), such ingesters will now be completely invisible to distributors and need to be either switched to Cortex 0.6.0 or later, or be configured to use normalised tokens. #2034
 * [CHANGE] Moved `--store.min-chunk-age` to the Querier config as `--querier.query-store-after`, allowing the store to be skipped during query time if the metrics wouldn't be found. The YAML config option `ingestermaxquerylookback` has been renamed to `query_ingesters_within` to match its CLI flag. #1893
   * `--store.min-chunk-age` has been removed
   * `--querier.query-store-after` has been added in it's place.
 * [CHANGE] Experimental Memberlist KV store can now be used in single-binary Cortex. Attempts to use it previously would fail with panic. This change also breaks existing binary protocol used to exchange gossip messages, so this version will not be able to understand gossiped Ring when used in combination with the previous version of Cortex. Easiest way to upgrade is to shutdown old Cortex installation, and restart it with new version. Incremental rollout works too, but with reduced functionality until all components run the same version. #2016
 * [CHANGE] Renamed the cache configuration setting `defaul_validity` to `default_validity`. #2140
+* [FEATURE] Added a read-only local alertmanager config store using files named corresponding to their tenant id. #2125
 * [FEATURE] Added user sub rings to distribute users to a subset of ingesters. #1947
   * `--experimental.distributor.user-subring-size`
 * [FEATURE] Added flag `-experimental.ruler.enable-api` to enable the ruler api which implements the Prometheus API `/api/v1/rules` and `/api/v1/alerts` endpoints under the configured `-http.prefix`. #1999
 * [FEATURE] Added sharding support to compactor when using the experimental TSDB blocks storage. #2113
+* [ENHANCEMENT] Add `status` label to `cortex_alertmanager_configs` metric to gauge the number of valid and invalid configs. #2125
 * [ENHANCEMENT] Cassandra Authentication: added the `custom_authenticators` config option that allows users to authenticate with cassandra clusters using password authenticators that are not approved by default in [gocql](https://github.com/gocql/gocql/blob/81b8263d9fe526782a588ef94d3fa5c6148e5d67/conn.go#L27) #2093
 * [ENHANCEMENT] Experimental TSDB: Export TSDB Syncer metrics from Compactor component, they are prefixed with `cortex_compactor_`. #2023
 * [ENHANCEMENT] Experimental TSDB: Added dedicated flag `-experimental.tsdb.bucket-store.tenant-sync-concurrency` to configure the maximum number of concurrent tenants for which blocks are synched. #2026

docs/configuration/config-file-reference.md

Lines changed: 16 additions & 5 deletions
@@ -86,11 +86,6 @@ Supported contents and default values of the config file:
 # and used by the 'configs' service to expose APIs to manage them.
 [configdb: <configdb_config>]

-# The configstore_config configures the config database storing rules and
-# alerts, and is used by the Cortex alertmanager.
-# The CLI flags prefix for this block config is: alertmanager
-[config_store: <configstore_config>]
-
 # The alertmanager_config configures the Cortex alertmanager.
 [alertmanager: <alertmanager_config>]

@@ -821,6 +816,22 @@ externalurl:
 # Root of URL to generate if config is http://internal.monitor
 # CLI flag: -alertmanager.configs.auto-webhook-root
 [autowebhookroot: <string> | default = ""]
+
+store:
+  # Type of backend to use to store alertmanager configs. Supported values are:
+  # "configdb", "local".
+  # CLI flag: -alertmanager.storage.type
+  [type: <string> | default = "configdb"]
+
+  # The configstore_config configures the config database storing rules and
+  # alerts, and is used by the Cortex alertmanager.
+  # The CLI flags prefix for this block config is: alertmanager
+  [configdb: <configstore_config>]
+
+  local:
+    # Path at which alertmanager configurations are stored.
+    # CLI flag: -alertmanager.storage.local.path
+    [path: <string> | default = ""]
 ```

 ## `table_manager_config`
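
The `local` store type documented in this hunk reads alertmanager configs from files named after their tenant ID. The following is a minimal sketch of that idea, assuming a flat directory of `<tenant>.yaml` files. The helpers `loadTenantConfigs` and `demoLocalStore` are hypothetical names introduced for illustration and are not the Cortex implementation, which may validate or filter files differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loadTenantConfigs is a hypothetical helper (not the Cortex API). It reads
// every regular file in dir and maps the file name, minus its extension, to
// the raw file contents. The store is read-only: nothing is written back.
func loadTenantConfigs(dir string) (map[string]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}

	configs := make(map[string]string)
	for _, e := range entries {
		if e.IsDir() {
			continue // only plain files are treated as tenant configs
		}
		raw, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		tenant := strings.TrimSuffix(e.Name(), filepath.Ext(e.Name()))
		configs[tenant] = string(raw)
	}
	return configs, nil
}

// demoLocalStore writes one config for tenant "user-1" into a temporary
// directory and loads it back through loadTenantConfigs.
func demoLocalStore() (map[string]string, error) {
	dir, err := os.MkdirTemp("", "alertmanager_configs")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	cfg := "route:\n  receiver: \"example_receiver\"\nreceivers:\n  - name: \"example_receiver\"\n"
	if err := os.WriteFile(filepath.Join(dir, "user-1.yaml"), []byte(cfg), 0o644); err != nil {
		return nil, err
	}
	return loadTenantConfigs(dir)
}

func main() {
	configs, err := demoLocalStore()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for tenant := range configs {
		fmt.Println("loaded config for tenant", tenant)
	}
}
```

Keying the map by file name keeps the tenant lookup trivial, which matches the integration test in this commit: it writes `user-1.yaml` and then queries as `user-1`.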

integration/alertmanager_test.go

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+package main
+
+import (
+	"context"
+	"io/ioutil"
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/stretchr/testify/require"
+
+	"github.com/cortexproject/cortex/integration/e2e"
+	"github.com/cortexproject/cortex/integration/e2ecortex"
+)
+
+func TestAlertmanager(t *testing.T) {
+	s, err := e2e.NewScenario(networkName)
+	require.NoError(t, err)
+	defer s.Close()
+
+	alertmanagerDir := filepath.Join(s.SharedDir(), "alertmanager_configs")
+	require.NoError(t, os.Mkdir(alertmanagerDir, os.ModePerm))
+
+	require.NoError(t, ioutil.WriteFile(
+		filepath.Join(alertmanagerDir, "user-1.yaml"),
+		[]byte(cortexAlertmanagerUserConfigYaml),
+		os.ModePerm),
+	)
+
+	alertmanager := e2ecortex.NewAlertmanager("alertmanager", AlertmanagerConfigs, "")
+	require.NoError(t, s.StartAndWaitReady(alertmanager))
+	require.NoError(t, alertmanager.WaitSumMetric("cortex_alertmanager_configs", 1))
+
+	c, err := e2ecortex.NewClient("", "", alertmanager.Endpoint(80), "user-1")
+	require.NoError(t, err)
+
+	cfg, err := c.GetAlertmanagerConfig(context.Background())
+	require.NoError(t, err)
+
+	// Ensure the returned status config matches alertmanager_test_fixtures/user-1.yaml
+	require.NotNil(t, cfg)
+	require.Equal(t, "example_receiver", cfg.Route.Receiver)
+	require.Len(t, cfg.Route.GroupByStr, 1)
+	require.Equal(t, "example_groupby", cfg.Route.GroupByStr[0])
+	require.Len(t, cfg.Receivers, 1)
+	require.Equal(t, "example_receiver", cfg.Receivers[0].Name)
+}

integration/backward_compatibility_test.go

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ func TestBackwardCompatibilityWithChunksStorage(t *testing.T) {
 	now := time.Now()
 	series, expectedVector := generateSeries("series_1", now)

-	c, err := e2ecortex.NewClient(distributor.Endpoint(80), "", "user-1")
+	c, err := e2ecortex.NewClient(distributor.Endpoint(80), "", "", "user-1")
 	require.NoError(t, err)

 	res, err := c.Push(series)
@@ -74,7 +74,7 @@ func TestBackwardCompatibilityWithChunksStorage(t *testing.T) {
 	require.NoError(t, querier.WaitSumMetric("cortex_ring_tokens_total", 512))

 	// Query the series
-	c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "user-1")
+	c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "", "user-1")
 	require.NoError(t, err)

 	result, err := c.Query("series_1", now)

integration/configs.go

Lines changed: 13 additions & 0 deletions
@@ -22,9 +22,22 @@ const (
 prefix: cortex_chunks_
 period: 168h0m0s
 `
+
+	cortexAlertmanagerUserConfigYaml = `route:
+  receiver: "example_receiver"
+  group_by: ["example_groupby"]
+receivers:
+  - name: "example_receiver"
+`
 )

 var (
+	AlertmanagerConfigs = map[string]string{
+		"-alertmanager.storage.local.path": filepath.Join(e2e.ContainerSharedDir, "alertmanager_configs"),
+		"-alertmanager.storage.type":       "local",
+		"-alertmanager.web.external-url":   "http://localhost/api/prom",
+	}
+
 	BlocksStorage = map[string]string{
 		"-store.engine":              "tsdb",
 		"-experimental.tsdb.backend": "s3",

integration/e2ecortex/client.go

Lines changed: 50 additions & 1 deletion
@@ -3,20 +3,24 @@ package e2ecortex
 import (
 	"bytes"
 	"context"
+	"encoding/json"
 	"fmt"
 	"net/http"
 	"time"

 	"github.com/gogo/protobuf/proto"
 	"github.com/golang/snappy"
+	alertConfig "github.com/prometheus/alertmanager/config"
 	promapi "github.com/prometheus/client_golang/api"
 	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
 	"github.com/prometheus/common/model"
 	"github.com/prometheus/prometheus/prompb"
+	yaml "gopkg.in/yaml.v2"
 )

 // Client is a client used to interact with Cortex in integration tests
 type Client struct {
+	alertmanagerClient promapi.Client
 	distributorAddress string
 	timeout            time.Duration
 	httpClient         *http.Client
@@ -25,7 +29,7 @@ type Client struct {
 }

 // NewClient makes a new Cortex client
-func NewClient(distributorAddress string, querierAddress string, orgID string) (*Client, error) {
+func NewClient(distributorAddress string, querierAddress string, alertmanagerAddress string, orgID string) (*Client, error) {
 	// Create querier API client
 	querierAPIClient, err := promapi.NewClient(promapi.Config{
 		Address: "http://" + querierAddress + "/api/prom",
@@ -43,6 +47,17 @@ func NewClient(distributorAddress string, querierAddress string, orgID string) (
 		orgID: orgID,
 	}

+	if alertmanagerAddress != "" {
+		alertmanagerAPIClient, err := promapi.NewClient(promapi.Config{
+			Address:      "http://" + alertmanagerAddress + "/api/prom",
+			RoundTripper: &addOrgIDRoundTripper{orgID: orgID, next: http.DefaultTransport},
+		})
+		if err != nil {
+			return nil, err
+		}
+		c.alertmanagerClient = alertmanagerAPIClient
+	}
+
 	return c, nil
 }

@@ -95,3 +110,37 @@ func (r *addOrgIDRoundTripper) RoundTrip(req *http.Request) (*http.Response, err

 	return r.next.RoundTrip(req)
 }
+
+// ServerStatus represents a Alertmanager status response
+// TODO: Upgrade to Alertmanager v0.20.0+ and utilize vendored structs
+type ServerStatus struct {
+	Data struct {
+		ConfigYaml string `json:"configYAML"`
+	} `json:"data"`
+}
+
+// GetAlertmanagerConfig gets the status of an alertmanager instance
+func (c *Client) GetAlertmanagerConfig(ctx context.Context) (*alertConfig.Config, error) {
+	u := c.alertmanagerClient.URL("/api/v1/status", nil)
+
+	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
+	if err != nil {
+		return nil, fmt.Errorf("error creating request: %v", err)
+	}
+
+	_, body, _, err := c.alertmanagerClient.Do(ctx, req) // Ignoring warnings.
+	if err != nil {
+		return nil, err
+	}
+
+	var ss *ServerStatus
+	err = json.Unmarshal(body, &ss)
+	if err != nil {
+		return nil, err
+	}
+
+	cfg := &alertConfig.Config{}
+	err = yaml.Unmarshal([]byte(ss.Data.ConfigYaml), cfg)
+
+	return cfg, err
+}
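
`GetAlertmanagerConfig` works in two steps: it decodes the JSON envelope returned by `/api/v1/status`, then parses the YAML string carried in `data.configYAML`. The sketch below illustrates only the first step using the standard library, with an `httptest` server standing in for the alertmanager. The names `serverStatus`, `fetchConfigYAML`, and `demoStatus` are hypothetical, the request path omits the `/api/prom` prefix used in the commit, and the YAML parsing step is left out so the example has no external dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// serverStatus mirrors the shape of the ServerStatus struct in the diff:
// only the embedded YAML config is of interest here.
type serverStatus struct {
	Data struct {
		ConfigYAML string `json:"configYAML"`
	} `json:"data"`
}

// fetchConfigYAML is a hypothetical helper. It requests the status endpoint
// and returns the raw YAML string found in the JSON response.
func fetchConfigYAML(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/api/v1/status")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status code %d", resp.StatusCode)
	}

	var ss serverStatus
	if err := json.NewDecoder(resp.Body).Decode(&ss); err != nil {
		return "", err
	}
	return ss.Data.ConfigYAML, nil
}

// demoStatus starts a fake status endpoint (an assumption for illustration)
// and fetches the config YAML from it.
func demoStatus() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/v1/status" {
			http.NotFound(w, r)
			return
		}
		payload := map[string]interface{}{
			"data": map[string]string{
				"configYAML": "route:\n  receiver: example_receiver\n",
			},
		}
		_ = json.NewEncoder(w).Encode(payload)
	}))
	defer srv.Close()

	return fetchConfigYAML(srv.URL)
}

func main() {
	configYAML, err := demoStatus()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(configYAML)
}
```

Decoding into a value rather than a pointer, as done here, avoids a nil dereference if the response body is the JSON literal `null`.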

integration/e2ecortex/services.go

Lines changed: 18 additions & 0 deletions
@@ -120,3 +120,21 @@ func NewSingleBinary(name string, flags map[string]string, image string, httpPor
 		otherPorts...,
 	)
 }
+
+func NewAlertmanager(name string, flags map[string]string, image string) *e2e.HTTPService {
+	if image == "" {
+		image = GetDefaultImage()
+	}
+
+	return e2e.NewHTTPService(
+		name,
+		image,
+		e2e.NewCommandWithoutEntrypoint("cortex", e2e.BuildArgs(e2e.MergeFlags(map[string]string{
+			"-target":    "alertmanager",
+			"-log.level": "warn",
+		}, flags))...),
+		// The alertmanager doesn't expose a readiness probe, so we just check if the / returns 404
+		e2e.NewReadinessProbe(80, "/", 404),
+		80,
+	)
+}
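
The readiness check in `NewAlertmanager` treats a `404` on `/` as proof that the HTTP server is up, because no dedicated readiness endpoint exists. A rough sketch of that kind of probe follows, assuming a simple poll-and-retry loop. `waitForStatus` and `demoReadiness` are hypothetical names, and the real `e2e.NewReadinessProbe` may be implemented quite differently.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// waitForStatus is a hypothetical helper. It polls url until the response
// status code equals want, giving up after the given number of attempts.
func waitForStatus(url string, want int, attempts int, interval time.Duration) error {
	var last error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err != nil {
			last = err
		} else {
			resp.Body.Close()
			if resp.StatusCode == want {
				return nil
			}
			last = fmt.Errorf("got status %d, want %d", resp.StatusCode, want)
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("not ready after %d attempts: %v", attempts, last)
}

// demoReadiness runs the probe against a stand-in server that answers every
// request with 404, which is the behaviour the diff comment describes for "/".
func demoReadiness(want int) error {
	srv := httptest.NewServer(http.NotFoundHandler())
	defer srv.Close()

	return waitForStatus(srv.URL+"/", want, 3, 10*time.Millisecond)
}

func main() {
	if err := demoReadiness(http.StatusNotFound); err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("service is answering requests")
}
```

Probing for a `404` only shows that the listener is accepting requests. It says nothing about whether tenant configs have loaded, which is why the integration test also waits on the `cortex_alertmanager_configs` metric.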

integration/getting_started_single_process_config_test.go

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ func TestGettingStartedSingleProcessConfig(t *testing.T) {
 	cortex := e2ecortex.NewSingleBinary("cortex-1", flags, "", 9009)
 	require.NoError(t, s.StartAndWaitReady(cortex))

-	c, err := e2ecortex.NewClient(cortex.Endpoint(9009), cortex.Endpoint(9009), "user-1")
+	c, err := e2ecortex.NewClient(cortex.Endpoint(9009), cortex.Endpoint(9009), "", "user-1")
 	require.NoError(t, err)

 	// Push some series to Cortex.
// Push some series to Cortex.

integration/ingester_flush_test.go

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ func TestIngesterFlushWithChunksStorage(t *testing.T) {
 	require.NoError(t, distributor.WaitSumMetric("cortex_ring_tokens_total", 512))
 	require.NoError(t, querier.WaitSumMetric("cortex_ring_tokens_total", 512))

-	c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "user-1")
+	c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "", "user-1")
 	require.NoError(t, err)

 	// Push some series to Cortex.
// Push some series to Cortex.

integration/ingester_hand_over_test.go

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ func runIngesterHandOverTest(t *testing.T, flags map[string]string, setup func(t
 	require.NoError(t, distributor.WaitSumMetric("cortex_ring_tokens_total", 512))
 	require.NoError(t, querier.WaitSumMetric("cortex_ring_tokens_total", 512))

-	c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "user-1")
+	c, err := e2ecortex.NewClient(distributor.Endpoint(80), querier.Endpoint(80), "", "user-1")
 	require.NoError(t, err)

 	// Push some series to Cortex.
// Push some series to Cortex.

pkg/alertmanager/alertmanager.go

Lines changed: 9 additions & 2 deletions
@@ -191,8 +191,15 @@ func (am *Alertmanager) ApplyConfig(userID string, conf *config.Config) error {

 	am.api.Update(conf, func(_ model.LabelSet) {})

-	am.inhibitor.Stop()
-	am.dispatcher.Stop()
+	// Ensure inhibitor is set before being called
+	if am.inhibitor != nil {
+		am.inhibitor.Stop()
+	}
+
+	// Ensure dispatcher is set before being called
+	if am.dispatcher != nil {
+		am.dispatcher.Stop()
+	}

 	am.inhibitor = inhibit.NewInhibitor(am.alerts, conf.InhibitRules, am.marker, log.With(am.logger, "component", "inhibitor"))
