Commit a22e338

Merge branch 'master' into fix/store-gateway-series-stream-ctx-cleanup
Signed-off-by: Friedrich Gonzalez <1517449+friedrichg@users.noreply.github.com>
2 parents a21b994 + aa717c3 commit a22e338

57 files changed

Lines changed: 3856 additions & 1428 deletions

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -12,6 +12,7 @@
1212
* [FEATURE] Querier: Implement Resource Based Throttling in Querier. #7442
1313
* [ENHANCEMENT] Upgrade prometheus alertmanager version to v0.32.1. #7462
1414
* [ENHANCEMENT] Tenant Federation: Avoid purging the regex resolver LRU cache on user-sync ticks when the set of known users has not changed. #7489
15+
* [ENHANCEMENT] Memberlist: Add `-memberlist.packet-read-timeout`, `-memberlist.max-packet-size`, and `-memberlist.max-concurrent-connections` flags to bound inbound gossip TCP connections, preventing slow-read, OOM, and connection-flood attacks on the gossip port. #7518
1516
* [ENHANCEMENT] Parquet Converter: Add a ring status page to expose the ring status. #7455
1617
* [ENHANCEMENT] Ingester: Add WAL record metrics to help evaluate the effectiveness of WAL compression type (e.g. snappy, zstd): `cortex_ingester_tsdb_wal_record_part_writes_total`, `cortex_ingester_tsdb_wal_record_parts_bytes_written_total`, and `cortex_ingester_tsdb_wal_record_bytes_saved_total`. #7420
1718
* [ENHANCEMENT] Distributor: Introduce dynamic `Symbols` slice capacity pooling. #7398 #7401
@@ -26,6 +27,10 @@
2627
* [ENHANCEMENT] Distributor: Add HMAC-SHA256 stream authentication for `PushStream` via `-distributor.sign-write-requests-keys`. #7475
2728
* [ENHANCEMENT] Instrument Ingester CPU profile with source for read APIs. #7494
2829
* [ENHANCEMENT] Ingester: Convert expanded postings cache from FIFO to LRU eviction to retain frequently-queried entries under memory pressure. #7510
30+
* [ENHANCEMENT] Querier: Detach series label and chunk data from gRPC unmarshal buffers in store-gateway streaming path, allowing the Go GC to reclaim receive buffers. #7519
31+
* [ENHANCEMENT] Distributor: Added `cortex_distributor_received_histogram_buckets` metric to track number of buckets in received native histogram samples before validation, per user. #7569
32+
* [ENHANCEMENT] Distributor: Add `WrappedHistogram` with configurable size limit (`-validation.max-native-histogram-size-bytes`) to cap native histogram protobuf size before unmarshalling. #7570
33+
* [ENHANCEMENT] Ingester: Add lazy regex evaluation on head postings cache miss. Defers expensive regex matchers on high-cardinality labels to per-series filtering when a selective equality matcher already narrows the result set. Configured via `-blocks-storage.expanded_postings_cache.head.lazy-matcher-max-cardinality` (disabled by default). #7553
2934
* [BUGFIX] Querier: Fix queryWithRetry and labelsWithRetry returning (nil, nil) on cancelled context by propagating ctx.Err(). #7370
3035
* [BUGFIX] Metrics Helper: Fix non-deterministic bucket order in merged histograms by sorting buckets after map iteration, matching Prometheus client library behavior. #7380
3136
* [BUGFIX] Distributor: Return HTTP 401 Unauthorized when tenant ID resolution fails in the Prometheus Remote Write 2.0 path. #7389
@@ -38,6 +43,12 @@
3843
* [BUGFIX] Security: Fix stored XSS vulnerability in Alertmanager and Store Gateway status pages by replacing `text/template` with `html/template`. #7512
3944
* [BUGFIX] Security: Limit decompressed gzip output in `ParseProtoReader` and OTLP ingestion path. The decompressed body is now capped by `-distributor.otlp-max-recv-msg-size`. #7515
4045
* [BUGFIX] Querier: Release each store-gateway `Series` gRPC stream's resources as soon as its goroutine returns, instead of holding them until the slowest concurrent store-gateway request in the same query finishes. #7576
46+
* [BUGFIX] Ingester: Close TSDB when compaction fails during `createTSDB`, preventing resource leaks (file descriptors, mmap handles) that could lead to ingester instability. #7560
47+
* [BUGFIX] Tenant Federation: Fix regex resolver clearing known users list when user scan fails. #7534
48+
* [BUGFIX] Ingester: Release the TSDB appender on every early-return path in `Push` (e.g. out-of-order label set) by deferring `Rollback`. Previously such requests leaked TSDB head series references, mmap'd chunks and pending state per request, causing the `cortex_ingester_tsdb_head_active_appenders` gauge to grow unbounded. #7528
49+
* [BUGFIX] Ring: Fix ring token conflict resolution being applied only to the updated instance, and check for token conflicts continuously during the instance observe period.
50+
* [BUGFIX] Distributor: Fix a panic (`slice bounds out of range`) in the stream push path when the context deadline expires while the worker goroutine is still marshalling a `WriteRequest`. #7541
51+
* [BUGFIX] Query Frontend: Fix native histogram responses not being handled correctly in `minTime()` sort ordering for split_by_interval merge. #7555
4152

4253
## 1.21.0 2026-04-24
4354

MAINTAINERS.md

Lines changed: 2 additions & 0 deletions
@@ -16,3 +16,5 @@
1616
|-----------------|----------------------------|-----------------|---------------------|
1717
| Anand Rajagopal | anand.rajagopal@icloud.com | @rajagopalanand | Amazon Web Services |
1818
| Daniel Sabsay | danielsabsay.sofware@gmail.com | @dsabsay | Adobe |
19+
| Yuxuan Chen | sandy19890604@gmail.com | @sandy2008 | Morgan Stanley |
20+

Makefile

Lines changed: 1 addition & 1 deletion
@@ -243,7 +243,7 @@ check-protos: clean-protos protos
243243
@git diff --exit-code -- $(PROTO_GOS)
244244

245245
modernize:
246-
GOTOOLCHAIN=auto go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@v0.21.0 -fix ./...
246+
GOTOOLCHAIN=auto go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@v0.22.0 -fix ./...
247247

248248
# Generates the config file documentation.
249249
doc: clean-doc

docs/blocks-storage/querier.md

Lines changed: 19 additions & 0 deletions
@@ -1970,6 +1970,25 @@ blocks_storage:
19701970
# CLI flag: -blocks-storage.expanded_postings_cache.block.fetch-timeout
19711971
[fetch_timeout: <duration> | default = 0s]
19721972

1973+
# [EXPERIMENTAL] Maximum label cardinality for deferring regex matchers on
1974+
# the head block. When a regex matcher targets a label with more unique
1975+
# values than this threshold, it is applied lazily during iteration
1976+
# instead of postings lookup. 0 disables.
1977+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-max-cardinality
1978+
[lazy_matcher_max_cardinality: <int> | default = 0]
1979+
1980+
# [EXPERIMENTAL] Cardinality:postings ratio above which a simple regex
1981+
# (prefix-only, single contains) is deferred to lazy iteration. Lower =
1982+
# more aggressive deferral. Calibrated empirically; defaults to 6.
1983+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-simple-cost-ratio
1984+
[lazy_matcher_simple_cost_ratio: <int> | default = 6]
1985+
1986+
# [EXPERIMENTAL] Cardinality:postings ratio above which a complex regex
1987+
# (multi-substring, capture groups, character classes) is deferred. Lower
1988+
# = more aggressive deferral. Calibrated empirically; defaults to 2.
1989+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-complex-cost-ratio
1990+
[lazy_matcher_complex_cost_ratio: <int> | default = 2]
1991+
19731992
users_scanner:
19741993
# Strategy to use to scan users. Supported values are: list, user_index.
19751994
# CLI flag: -blocks-storage.users-scanner.strategy

docs/blocks-storage/store-gateway.md

Lines changed: 19 additions & 0 deletions
@@ -2028,6 +2028,25 @@ blocks_storage:
20282028
# CLI flag: -blocks-storage.expanded_postings_cache.block.fetch-timeout
20292029
[fetch_timeout: <duration> | default = 0s]
20302030

2031+
# [EXPERIMENTAL] Maximum label cardinality for deferring regex matchers on
2032+
# the head block. When a regex matcher targets a label with more unique
2033+
# values than this threshold, it is applied lazily during iteration
2034+
# instead of postings lookup. 0 disables.
2035+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-max-cardinality
2036+
[lazy_matcher_max_cardinality: <int> | default = 0]
2037+
2038+
# [EXPERIMENTAL] Cardinality:postings ratio above which a simple regex
2039+
# (prefix-only, single contains) is deferred to lazy iteration. Lower =
2040+
# more aggressive deferral. Calibrated empirically; defaults to 6.
2041+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-simple-cost-ratio
2042+
[lazy_matcher_simple_cost_ratio: <int> | default = 6]
2043+
2044+
# [EXPERIMENTAL] Cardinality:postings ratio above which a complex regex
2045+
# (multi-substring, capture groups, character classes) is deferred. Lower
2046+
# = more aggressive deferral. Calibrated empirically; defaults to 2.
2047+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-complex-cost-ratio
2048+
[lazy_matcher_complex_cost_ratio: <int> | default = 2]
2049+
20312050
users_scanner:
20322051
# Strategy to use to scan users. Supported values are: list, user_index.
20332052
# CLI flag: -blocks-storage.users-scanner.strategy

docs/configuration/config-file-reference.md

Lines changed: 31 additions & 0 deletions
@@ -2650,6 +2650,25 @@ tsdb:
26502650
# CLI flag: -blocks-storage.expanded_postings_cache.block.fetch-timeout
26512651
[fetch_timeout: <duration> | default = 0s]
26522652

2653+
# [EXPERIMENTAL] Maximum label cardinality for deferring regex matchers on
2654+
# the head block. When a regex matcher targets a label with more unique
2655+
# values than this threshold, it is applied lazily during iteration instead
2656+
# of postings lookup. 0 disables.
2657+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-max-cardinality
2658+
[lazy_matcher_max_cardinality: <int> | default = 0]
2659+
2660+
# [EXPERIMENTAL] Cardinality:postings ratio above which a simple regex
2661+
# (prefix-only, single contains) is deferred to lazy iteration. Lower = more
2662+
# aggressive deferral. Calibrated empirically; defaults to 6.
2663+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-simple-cost-ratio
2664+
[lazy_matcher_simple_cost_ratio: <int> | default = 6]
2665+
2666+
# [EXPERIMENTAL] Cardinality:postings ratio above which a complex regex
2667+
# (multi-substring, capture groups, character classes) is deferred. Lower =
2668+
# more aggressive deferral. Calibrated empirically; defaults to 2.
2669+
# CLI flag: -blocks-storage.expanded_postings_cache.head.lazy-matcher-complex-cost-ratio
2670+
[lazy_matcher_complex_cost_ratio: <int> | default = 2]
2671+
26532672
users_scanner:
26542673
# Strategy to use to scan users. Supported values are: list, user_index.
26552674
# CLI flag: -blocks-storage.users-scanner.strategy
@@ -4693,6 +4712,18 @@ The `memberlist_config` configures the Gossip memberlist.
46934712
# CLI flag: -memberlist.packet-write-timeout
46944713
[packet_write_timeout: <duration> | default = 5s]
46954714
4715+
# Timeout for reading packet data from inbound connections. 0 = no limit.
4716+
# CLI flag: -memberlist.packet-read-timeout
4717+
[packet_read_timeout: <duration> | default = 5s]
4718+
4719+
# Maximum size in bytes of an inbound gossip packet. 0 = no limit.
4720+
# CLI flag: -memberlist.max-packet-size
4721+
[max_packet_size: <int> | default = 1048576]
4722+
4723+
# Maximum number of concurrent inbound TCP connections. 0 = no limit.
4724+
# CLI flag: -memberlist.max-concurrent-connections
4725+
[max_concurrent_connections: <int> | default = 100]
4726+
46964727
# Enable TLS on the memberlist transport layer.
46974728
# CLI flag: -memberlist.tls-enabled
46984729
[tls_enabled: <boolean> | default = false]

docs/configuration/v1-guarantees.md

Lines changed: 4 additions & 0 deletions
@@ -133,3 +133,7 @@ Currently experimental features are:
133133
- Ingester: Active Series Tracker
134134
- Per-tenant `active_series_trackers` configuration in runtime config overrides
135135
- Counts active series matching PromQL label matchers and exposes `cortex_ingester_active_series_per_tracker` metric
136+
- Ingester: Lazy regex evaluation on head postings cache miss
137+
- `-blocks-storage.expanded_postings_cache.head.lazy-matcher-max-cardinality` (int) CLI flag
138+
- `-blocks-storage.expanded_postings_cache.head.lazy-matcher-simple-cost-ratio` (int) CLI flag
139+
- `-blocks-storage.expanded_postings_cache.head.lazy-matcher-complex-cost-ratio` (int) CLI flag
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
---
2+
title: "Supported architectures"
3+
linkTitle: "Supported architectures"
4+
weight: 16
5+
---
6+
7+
Cortex release artifacts are built for the operating systems and architectures
8+
listed below. Use these targets when selecting container images, binary
9+
artifacts, or OS packages for production deployments.
10+
11+
## Supported release targets
12+
13+
| Artifact type | Operating system | Architectures |
14+
|---------------|------------------|---------------|
15+
| Container images | Linux | `amd64`, `arm64` |
16+
| Cortex binary | Linux | `amd64`, `arm64` |
17+
| Cortex binary | darwin (macOS) | `amd64`, `arm64` |
18+
| `query-tee` binary | Linux | `amd64`, `arm64` |
19+
| `query-tee` binary | darwin (macOS) | `amd64`, `arm64` |
20+
| Debian package | Linux | `amd64`, `arm64` |
21+
| RPM package | Linux | `amd64`, `arm64` |
22+
23+
The CI and release pipelines build Cortex with `GOOS` and `GOARCH` targets
24+
matching these rows. Automated tests run against Linux `amd64` and `arm64`
25+
Cortex images.
26+
27+
## Unsupported targets
28+
29+
Other operating systems or architectures may work when built from source, but
30+
they are not part of the regular Cortex release artifacts or CI matrix. Treat
31+
those builds as unsupported unless you validate them in your own environment.
32+
33+
Cortex does not require architecture-specific CPU extensions such as AVX in its
34+
release build configuration. If you use external services or custom base images
35+
alongside Cortex, verify their architecture and CPU requirements separately.

integration/e2e/scenario.go

Lines changed: 3 additions & 2 deletions
@@ -3,6 +3,7 @@ package e2e
33
import (
44
"fmt"
55
"os"
6+
"slices"
67
"strings"
78
"sync"
89

@@ -151,14 +152,14 @@ func (s *Scenario) shutdown() {
151152
// Kill the services in parallel. We still iterate in reverse order
152153
// to respect service dependencies, but we kill them concurrently.
153154
var wg sync.WaitGroup
154-
for i := len(s.services) - 1; i >= 0; i-- {
155+
for _, v := range slices.Backward(s.services) {
155156
wg.Add(1)
156157
go func(service Service) {
157158
defer wg.Done()
158159
if err := service.Kill(); err != nil {
159160
logger.Log("Unable to kill service", service.Name(), ":", err.Error())
160161
}
161-
}(s.services[i])
162+
}(v)
162163
}
163164
wg.Wait()
164165

integration/parquet_querier_test.go

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,6 @@ package integration
55
import (
66
"context"
77
"fmt"
8-
"math/rand"
98
"path/filepath"
109
"slices"
1110
"strconv"
@@ -89,7 +88,7 @@ func TestParquetFuzz(t *testing.T) {
8988
require.NoError(t, writeFileToSharedDir(s, "alertmanager_configs", []byte{}))
9089

9190
ctx := context.Background()
92-
rnd := rand.New(rand.NewSource(time.Now().Unix()))
91+
rnd := newFuzzRand(t)
9392
dir := filepath.Join(s.SharedDir(), "data")
9493
numSeries := 10
9594
numSamples := 60
@@ -172,6 +171,7 @@ func TestParquetFuzz(t *testing.T) {
172171
opts := []promqlsmith.Option{
173172
// @ modifier and offset disabled: known bug in Prometheus (e.g. predict_linear with @/offset can panic).
174173
promqlsmith.WithEnabledFunctions(enabledFunctions),
174+
promqlsmith.WithEnabledAggrs(enabledAggrs),
175175
}
176176
ps := promqlsmith.New(rnd, lbls, opts...)
177177

@@ -240,7 +240,7 @@ func TestParquetProjectionPushdownFuzz(t *testing.T) {
240240
require.NoError(t, writeFileToSharedDir(s, "alertmanager_configs", []byte{}))
241241

242242
ctx := context.Background()
243-
rnd := rand.New(rand.NewSource(time.Now().Unix()))
243+
rnd := newFuzzRand(t)
244244
dir := filepath.Join(s.SharedDir(), "data")
245245
numSeries := 20
246246
numSamples := 100
