Add shuffle sharding grouper/planner #4357
Conversation
I wanted to discuss a change in the block compaction behavior that this PR would introduce. The current implementation of the Thanos compactor will always compact the first set of overlapping blocks if such a set exists. This means that if the most recently ingested set of blocks from multiple ingesters overlaps, those blocks will be compacted. When this happens, there is potentially a "missing" block in the compaction with the Thanos planner, since an ingester may not have fully uploaded its block by the time the compaction begins. So if there are 3 overlapping blocks when the compaction begins, and they are the latest blocks passed to the Thanos planner, the planner will plan a compaction of those 3 blocks even if a fourth ingester has yet to upload a block.

With this PR, overlapping blocks will not be compacted if they are the last set of blocks. In the example above, the 3 blocks won't be compacted while they are the latest ones and don't cover a full range. In a real-world situation, this would only affect customers who stop ingesting blocks: the last group of n blocks, where n is the number of ingesters, will remain uncompacted for as long as they are the latest blocks. The impact of leaving the last n blocks uncompacted would be increased storage size as well as query time (if customers continue to query even after stopping ingesting blocks). One thing to note with the Thanos approach: there can be duplicate work if the blocks are compacted and another overlapping block is uploaded after the compaction begins.

A couple of alternative approaches I considered: grouping overlapping blocks before grouping by compactable ranges, which makes the compaction behavior with these changes the same as Thanos; or, if no new blocks arrive within the smallest block range measured from the max time of all the blocks, compacting the overlapping blocks even if they are the latest ones. Something else I considered is making this a toggle to let users define their own preference, but I think that isn't ideal, as it would mean either supporting the toggle indefinitely or eventually having users switch to a single behavior.

Small example illustrating what's mentioned above: 4 total blocks with 1 block incoming (not yet uploaded).
Thanos compaction: Compacting the above blocks with the current (Thanos) compaction, using time ranges [20, 120, 240], would result in these blocks:
Afterwards, once block 5 is fully uploaded, the final resulting blocks from a single run of the compaction will be:
With these blocks, another compaction will be needed to fully compact the overlapping blocks 2-5.

New compaction behavior: With this PR and the shuffle-sharding strategy, the blocks would remain uncompacted until a block more recent than blocks 2-5 is uploaded. Once that block is uploaded, blocks 2-5 would be compacted in one compaction.

The downside of this approach is that the uncompacted blocks 2-5 are stored for longer than with the current (Thanos) approach, since the planner waits for a more recent block to be uploaded before compacting them. Given the above, I was wondering what your thoughts are on which approach would be preferable?
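To make the rule concrete, here is a minimal, self-contained Go sketch of the skip condition described above. This is not the PR's actual code: the `block` type, the single `rangeMs` parameter, and the alignment check are simplifying assumptions for illustration.

```go
package main

import "fmt"

// block is a simplified stand-in for TSDB block metadata.
type block struct {
	MinTime int64 // ms, inclusive
	MaxTime int64 // ms, exclusive
}

// coversFullRange reports whether the group spans a complete, aligned
// compaction range of length rangeMs.
func coversFullRange(group []block, rangeMs int64) bool {
	min, max := group[0].MinTime, group[0].MaxTime
	for _, b := range group[1:] {
		if b.MinTime < min {
			min = b.MinTime
		}
		if b.MaxTime > max {
			max = b.MaxTime
		}
	}
	return min%rangeMs == 0 && max-min >= rangeMs
}

// shouldCompact encodes the rule discussed above: an overlapping group is
// skipped when it contains the globally newest block (globalMaxTime) and
// does not yet cover a full range, because another ingester may still be
// uploading an overlapping block.
func shouldCompact(group []block, rangeMs, globalMaxTime int64) bool {
	containsNewest := false
	for _, b := range group {
		if b.MaxTime == globalMaxTime {
			containsNewest = true
		}
	}
	return !containsNewest || coversFullRange(group, rangeMs)
}

func main() {
	// Blocks 2-5 from the example: latest, overlapping, spanning only [20, 40),
	// checked against the 120 range from the example configuration.
	group := []block{{20, 40}, {20, 40}, {20, 40}, {20, 40}}
	fmt.Println(shouldCompact(group, 120, 40)) // false: wait for a newer block
}
```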
This PR replaces and implements the changes recommended in #4318
Discussed in the community call and leaving the blocks uncompacted is okay.
The code is long and I didn't read through every line. Broadly it looks ok.
I did wonder why the word "thanos" shows up so often - if the code is copied from Thanos it should say so, and if not can you just explain your thinking to me?
```go
garbageCollectedBlocks: garbageCollectedBlocks,
hashFunc:               hashFunc,
compactions: promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
	Name: "thanos_compact_group_compactions_total",
```
Do we want to add new metrics in Cortex starting with "thanos_"?
Added a note about where the metrics were copied from in Thanos. With these changes, wouldn't the metrics in Cortex remain the same? They are only used when creating a new group using compact.NewGroup, which is what is being done now (https://github.com/cortexproject/cortex/blob/master/vendor/github.com/thanos-io/thanos/pkg/compact/compact.go#L262-L312)
Then it might be better to expose those metrics as another function in Thanos?
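For illustration, a hypothetical helper along the lines of that suggestion could look like the sketch below. The `GroupMetrics` type, the `NewGroupMetrics` name, and the `group` label are assumptions for this sketch, not existing Thanos API; only the two metric names are taken from the code under review.

```go
package compact

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// GroupMetrics bundles the per-group compaction counters so callers such as
// Cortex could reuse the canonical metric definitions instead of copying them.
type GroupMetrics struct {
	Compactions        *prometheus.CounterVec
	CompactionFailures *prometheus.CounterVec
}

// NewGroupMetrics registers the counters with reg and returns them for use
// when constructing compaction groups.
func NewGroupMetrics(reg prometheus.Registerer) *GroupMetrics {
	return &GroupMetrics{
		Compactions: promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
			Name: "thanos_compact_group_compactions_total",
			Help: "Total number of group compaction attempts that resulted in a new block.",
		}, []string{"group"}), // label assumed for illustration
		CompactionFailures: promauto.With(reg).NewCounterVec(prometheus.CounterOpts{
			Name: "thanos_compact_group_compactions_failures_total",
			Help: "Total number of failed group compactions.",
		}, []string{"group"}),
	}
}
```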
Any way I can help with this PR? We're running into limits in our compaction (we have about 25M active time series in a single-tenant Cortex). I'd be happy to run pre-release compactor builds if this needs some kind of validation.
The error message from the build is:
Since this looks like a useful PR that we want to merge, and I don't own the original branch, I will create a new branch to work on resolving the error.
Signed-off-by: Albert <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Albert <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…ortexproject#4262) * add MaxRetries to WaitInstanceState Signed-off-by: Albert <[email protected]> * update CHANGELOG.md Signed-off-by: Albert <[email protected]> * Add timeout for waiting on compactor to become ACTIVE in the ring. Signed-off-by: Albert <[email protected]> * add MaxRetries variable back to WaitInstanceState Signed-off-by: Albert <[email protected]> * Fix linting issues Signed-off-by: Albert <[email protected]> * Remove duplicate entry from changelog Signed-off-by: Albert <[email protected]> * Address PR comments and set timeout to be configurable Signed-off-by: Albert <[email protected]> * Address PR comments and fix tests Signed-off-by: Albert <[email protected]> * Update unit tests Signed-off-by: Albert <[email protected]> * Update changelog and fix linting Signed-off-by: Albert <[email protected]> * Fixed CHANGELOG entry order Signed-off-by: Marco Pracucci <[email protected]> Co-authored-by: Albert <[email protected]> Co-authored-by: Marco Pracucci <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
* MergeIterator: allocate less memory at first We were allocating 24x the number of streams of batches, where each batch holds up to 12 samples. By allowing `c.batches` to reallocate when needed, we avoid the need to pre-allocate enough memory for all possible scenarios. * chunk_test: fix inaccurate end time on chunks The `through` time is supposed to be the last time in the chunk, and having it one step higher was throwing off other tests and benchmarks. * MergeIterator benchmark: add more realistic sizes At 15-second scrape intervals a chunk covers 30 minutes, so 1,000 chunks is about three weeks, a highly unrepresentative test. Instant queries, such as those done by the ruler, will only fetch one chunk from each ingester. Signed-off-by: Bryan Boreham <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
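A rough Go sketch of the allocation change this commit describes; the type names, the 24x factor's placement, and `newBatchBuffer` are assumptions, not the actual MergeIterator code.

```go
package main

import "fmt"

// batch stands in for the iterator's batch type; each holds up to 12 samples.
type batch [12]float64

// newBatchBuffer shows the change in spirit: start with a small capacity and
// let append reallocate on demand, instead of reserving 24x the stream count.
func newBatchBuffer(numStreams int) []batch {
	// Before (assumed): make([]batch, 0, numStreams*24)
	return make([]batch, 0, numStreams) // grows later only if actually needed
}

func main() {
	buf := newBatchBuffer(4)
	for i := 0; i < 10; i++ {
		buf = append(buf, batch{}) // append reallocates past the initial cap
	}
	fmt.Println(len(buf), cap(buf) >= 10)
}
```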
* Expose default configuration values for memberlist. Set the defaults for various memberlist configuration values based on the "Default LAN" configuration. The only result of this change is that the defaults are now visible and are in the documentation. This also means that if the default values change, then the changes are visible in the documentation, whereas before they would have gone unnoticed. To prevent this being a breaking change, the existing behaviour is retained, in case anyone is explicitly setting the values to zero and expecting the default to be used. Signed-off-by: Steve Simpson <[email protected]> * Remove use of zero value as default value indicator. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
cortexproject#4342) * Allow setting ring heartbeat timeout to zero to disable timeout check. This change allows the various ring heartbeat timeouts to be configured with zero, as a means of disabling the timeout. This is expected to be used with a separate enhancement to allow disabling heartbeats. When the heartbeat timeout is disabled, instances will always appear as healthy in the ring. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
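A minimal sketch of the zero-disables-timeout semantics this commit describes, assuming a `lastHeartbeat` timestamp and a configurable timeout; it is illustrative, not the actual ring health-check code.

```go
package main

import (
	"fmt"
	"time"
)

// isHealthy reports whether an instance's heartbeat is recent enough.
// A zero timeout disables the check, so the instance always looks healthy.
func isHealthy(lastHeartbeat time.Time, timeout time.Duration, now time.Time) bool {
	if timeout == 0 {
		return true // timeout disabled: never mark unhealthy
	}
	return now.Sub(lastHeartbeat) <= timeout
}

func main() {
	stale := time.Now().Add(-10 * time.Minute)
	fmt.Println(isHealthy(stale, time.Minute, time.Now())) // false
	fmt.Println(isHealthy(stale, 0, time.Now()))           // true: disabled
}
```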
…time. (cortexproject#4317) * Add a new config and metric for reporting ruler query execution wall time. Signed-off-by: Tyler Reid <[email protected]> * Spacing and PR number fixup Signed-off-by: Tyler Reid <[email protected]> * Wrap the defer in a function to make it defer after the return rather than after the if block. Add a unit test to validate we're tracking time correctly. Signed-off-by: Tyler Reid <[email protected]> * Use seconds for our duration rather than nanoseconds Signed-off-by: Tyler Reid <[email protected]> * Review comment fixes Signed-off-by: Tyler Reid <[email protected]> * Update config flag in the config docs Signed-off-by: Tyler Reid <[email protected]> * Pass counter rather than counter vector for metrics query function Signed-off-by: Tyler Reid <[email protected]> * Fix comment in MetricsQueryFunction Signed-off-by: Tyler Reid <[email protected]> * Move query metric and log to separate function. Add log message for ruler query time. Signed-off-by: Tyler Reid <[email protected]> * Update config file and change log to show this a per user metric Signed-off-by: Tyler Reid <[email protected]> * code review fixes Signed-off-by: Tyler Reid <[email protected]> * update log message for ruler query metrics Signed-off-by: Tyler Reid <[email protected]> * Remove append and just use the array for key values in the log messag Signed-off-by: Tyler Reid <[email protected]> * Add query-frontend component to front end log message Signed-off-by: Tyler Reid <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
I thought it would be good to put a security page into the docs, so that it shows up in a search. Content is just pointing at other resources. Signed-off-by: Bryan Boreham <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…xproject#4345) * Optimise memberlist kv store access by storing data unencoded. The following profile data was taken from running 50 idle ingesters with memberlist, with almost everything at default values (5s heartbeats):

```
52.16% mergeBytesValueForKey
+- 52.16% mergeValueForKey
+- 47.84% computeNewValue
+- 27.24% codec Proto Decode
+- 26.25% mergeWithTime
```

It is apparent from this that a lot of time is spent on the memberlist receive path, as might be expected; specifically, the merging of the update into the current state. The cost however is not in decoding the incoming states (which occurs in `mergeBytesValueForKey` before `mergeValueForKey`), but in fact in decoding the _current state_ of the value in the store (as it is stored encoded). The ring state was measured at 123K (50 ingesters), so it makes sense that decoding could be costly. This can be avoided by storing the value in its decoded `Mergeable` form. When doing this, care has to be taken to deep copy the value when accessed, as it is modified in place before being updated in the store, and accessed outside the store mutex. Note a side effect of this change is that it is no longer straightforward to expose the `memberlist_kv_store_value_bytes` metric, as this reported the size of the encoded data; therefore it has been removed. Signed-off-by: Steve Simpson <[email protected]> * Typo. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
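The deep-copy-on-access pattern this commit describes can be sketched as follows. The `Mergeable` interface here is a simplified stand-in for the real memberlist kv interface, and the `store` type is illustrative only.

```go
package kvsketch

import "sync"

// Mergeable is a stand-in for the memberlist kv Mergeable interface.
type Mergeable interface {
	Merge(other Mergeable) Mergeable
	Clone() Mergeable // deep copy
}

type store struct {
	mu     sync.Mutex
	values map[string]Mergeable // kept decoded, never re-encoded on merge
}

// merge applies an incoming update directly against the decoded current
// state, avoiding the costly decode of the stored value on every receive.
func (s *store) merge(key string, incoming Mergeable) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.values[key]; ok {
		s.values[key] = cur.Merge(incoming)
		return
	}
	s.values[key] = incoming
}

// get deep-copies the value, because callers modify it in place and use it
// outside the store's mutex before writing it back.
func (s *store) get(key string) Mergeable {
	s.mu.Lock()
	defer s.mu.Unlock()
	if v, ok := s.values[key]; ok {
		return v.Clone()
	}
	return nil
}
```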
…o. (cortexproject#4344) * Allow disabling of ring heartbeats by setting relevant options to zero. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…#4346) * Expose configuration of memberlist packet compression. Allows manually specifying whether memberlist should compress packets via a new configuration flag: `-memberlist.enable-compression`. This typically has little benefit for Cortex: as the ring state messages are already compressed with Snappy, the second layer of compression does not achieve any additional saving. It's not clear cut whether there might still be some benefit for internal memberlist messages; this needs to be evaluated in an environment of some reasonable scale. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> * Review comments. Signed-off-by: Steve Simpson <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…exproject#4348) It was only waiting one second for the second sync to complete, which is probably a harsher deadline than necessary for overloaded systems. Signed-off-by: Steve Simpson <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…xproject#4349) The test is writing a single silence and checking a metric which indicates whether replicating the silence has been attempted yet. This is so we can check later on that no replication activity occurs. The assertions later on in the test are passing, but the first one is not, indicating that the replication doesn't trigger early enough. This makes sense because the replication is not synchronous with the writing of the silence. Signed-off-by: Steve Simpson <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
) * Add proposal document Signed-off-by: Gofman <[email protected]> Signed-off-by: ilangofman <[email protected]> * Minor text modifications Signed-off-by: ilangofman <[email protected]> * Implement requested changes to the proposal Signed-off-by: ilangofman <[email protected]> * Fix mention of Compactor instead of purger in proposal Signed-off-by: ilangofman <[email protected]> * Fixed wording and spelling in proposal Signed-off-by: ilangofman <[email protected]> * Update the cache invalidation method Signed-off-by: ilangofman <[email protected]> * Fix wording on cache invalidation section Signed-off-by: ilangofman <[email protected]> * Minor wording additions Signed-off-by: ilangofman <[email protected]> * Remove white-noise from text Signed-off-by: ilangofman <[email protected]> * Remove the deleting state and change cache invalidation Signed-off-by: ilangofman <[email protected]> * Add deleted state and update cache invalidation Signed-off-by: ilangofman <[email protected]> * Add one word to clear things up Signed-off-by: ilangofman <[email protected]> * update api limits section Signed-off-by: ilangofman <[email protected]> * ran clean white noise Signed-off-by: ilangofman <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Albert <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Albert <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Conventionally the minimum time would be before the maximum. Apparently none of the tests were depending on this. Signed-off-by: Bryan Boreham <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
We need to add the merged value back to the map. Extract merging as a separate function so it can be tested. Adapt the existing test to cover multiple series. Signed-off-by: Bryan Boreham <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Rearrange `CHANGELOG.md` to conform to instructions in `pull_request_template.md`. Also add a `-` to a CLI flag to conform to instructions in `design-patterns-and-conventions.md`. Signed-off-by: Andrew Seigner <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
* Introduce `http` config settings in Azure storage Cortex v1.11.0 included thanos-io/thanos#3970, which added configuration options to Azure's http client and transport, replacing usage of `http.DefaultClient`. Unfortunately since Cortex was not setting this config, Cortex implicitly switched from `http.DefaultClient` to all empty values (e.g. `MaxIdleConns: 0` rather than 100). Introduce `http` config settings to Azure storage. This motivated moving `s3.HTTPConfig` into a new `pkg/storage/bucket/config` package, to allow `azure` and `s3` to share it. Also update the instructions for running the website to include installing `embedmd`. Signed-off-by: Andrew Seigner <[email protected]> * feedback: `config.HTTP` -> `http.Config` also back out changelog cleanup Signed-off-by: Andrew Seigner <[email protected]> * Back out accidental changelog addition Signed-off-by: Andrew Seigner <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
* Update Thanos to latest main Update Thanos dependency to include thanos-io/thanos#4928, to conserve memory. Signed-off-by: Andrew Seigner <[email protected]> * Update changelog to summarize user-facing changes Signed-off-by: Andrew Seigner <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
* Adding test case for dropping metrics by name to understand better flow of distributor Signed-off-by: Pedro Tanaka <[email protected]> * Adding test case and new metric for dropped samples Signed-off-by: Pedro Tanaka <[email protected]> * Updating CHANGELOG with new changes Signed-off-by: Pedro Tanaka <[email protected]> * Fixing linting problem on distributor file Signed-off-by: Pedro Tanaka <[email protected]> * Reusing discarded samples metric from validate package Signed-off-by: Pedro Tanaka <[email protected]> * Compare labelset with len() instead of comparing to nil Signed-off-by: Pedro Tanaka <[email protected]> * Undoing unnecessary changes on tests and distributor Signed-off-by: Pedro Tanaka <[email protected]> * Small rename on comment Signed-off-by: Pedro Tanaka <[email protected]> * Fixing linting offenses Signed-off-by: Pedro Tanaka <[email protected]> * Reseting validation dropped samples metric to avoid getting metrics from other test runs Signed-off-by: Pedro Tanaka <[email protected]> * Resolving problems after rebase conflicts Signed-off-by: Pedro Tanaka <[email protected]> * Registering counter for dropped metrics in test Signed-off-by: Pedro Tanaka <[email protected]> * Checking if user label drop configuration did not drop __name__ label Signed-off-by: Pedro Tanaka <[email protected]> * Do not check for name label, adding new test Signed-off-by: Pedro Tanaka <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
* Disable block deletion marks migration by default Flag is named `-compactor.block-deletion-marks-migration-enabled`. This feature was added in v1.7, so we expect most users to have upgraded by now. Signed-off-by: Bryan Boreham <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Julien Pivotto <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…ct#4602) * Upgrade Go to 1.17.5 for integration tests Signed-off-by: Arve Knudsen <[email protected]> * Upgrade to Go 1.17 in Dockerfiles Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
* Update build image. Signed-off-by: Peter Štibraný <[email protected]> * CHANGELOG.md Signed-off-by: Peter Štibraný <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
This reverts commit f2656f8. Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…#4440)" (cortexproject#4613) This reverts commit a635a1e. Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
* Federated ruler proposal Signed-off-by: Rees Dooley <[email protected]> Co-authored-by: Rees Dooley <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
) This reverts commit 19f3802. Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…exproject#4614) Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…er (cortexproject#4615) Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…t#4617) Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
)" (cortexproject#4611) This reverts commit 32b1b40. Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
…project#4619) Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Arve Knudsen <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
Move the change log line to unreleased section Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Alvin Lin <[email protected]>
Signed-off-by: Alvin Lin <[email protected]>
Force-pushed from db0595b to 8e78a51
Please see #4624 instead.
Signed-off-by: Albert [email protected]
What this PR does:
Implements generation of parallelizable plans for the proposal outlined in #4272 using a shuffle-sharding grouper and planner. Currently the parallelizable plans are generated, but every compactor runs every planned compaction; the actual sharding will happen in a subsequent PR.
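For context, a minimal Go sketch (not the PR's actual code) of how one tenant's blocks can be bucketed into per-time-range groups, each with a stable hash ID, so that each group forms an independently plannable compaction unit; the `groupKey` format and FNV-based group IDs are assumptions for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// block is a simplified stand-in for TSDB block metadata.
type block struct {
	MinTime, MaxTime int64 // ms
}

// rangeStart returns the start of the aligned compaction range a block falls in.
func rangeStart(b block, rangeMs int64) int64 {
	return (b.MinTime / rangeMs) * rangeMs
}

// groupBlocks buckets one tenant's blocks into per-range groups, each keyed
// by a stable hash so different compactors can deterministically claim groups.
func groupBlocks(tenant string, blocks []block, rangeMs int64) map[uint32][]block {
	groups := map[uint32][]block{}
	for _, b := range blocks {
		h := fnv.New32a()
		fmt.Fprintf(h, "%s/%d", tenant, rangeStart(b, rangeMs))
		id := h.Sum32()
		groups[id] = append(groups[id], b)
	}
	return groups
}

func main() {
	blocks := []block{{0, 20}, {20, 40}, {120, 140}}
	for id, g := range groupBlocks("tenant-1", blocks, 120) {
		fmt.Printf("group %08x: %d block(s)\n", id, len(g))
	}
}
```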
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]