
Use TSDB's WAL for writes. #1103


Merged
merged 79 commits into cortexproject:master from wal on Jan 21, 2020

Conversation

Contributor

@tomwilkie tomwilkie commented Oct 31, 2018

@tomwilkie tomwilkie force-pushed the wal branch 5 times, most recently from 047b710 to 4390f26 on November 6, 2018 15:48
- Add github.com/prometheus/tsdb/wal
- Update github.com/prometheus/client_golang for WrapRegistererWith function.

Signed-off-by: Tom Wilkie <[email protected]>
@vtolstov

@tomwilkie do you plan to fix this PR?

@codesome
Contributor

Currently, the idea is to only write the WAL. Reading the WAL on startup without any downtime of ingesters is a little tricky, as reading the WAL takes a lot of time; an ingester would need to be dedicated to reading the WAL before it takes any writes. That would be follow-up work after this.

As for using the written WAL, we can read the WAL and directly flush the chunks to the chunk store in case an ingester crashes. A tool/script to do that would again be a follow-up to this (or should I add it in the same PR?)

We (@gouthamve, @tomwilkie and I) also discussed using Prometheus's tsdb vs only the WAL, and concluded that adopting tsdb would be very tricky right now as it requires a lot of changes and we would need a way to handle the churn. Hence we planned to go ahead with only the WAL for now.
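For context, a rough sketch of what the flush tool mentioned above could look like, using the upstream github.com/prometheus/tsdb/wal package; the record decoding and the chunk-store write are glossed over, and `flushToChunkStore` is a hypothetical placeholder, not code from this PR:

```go
package main

import (
	"log"

	"github.com/prometheus/tsdb/wal"
)

// flushToChunkStore is a hypothetical placeholder for decoding a raw WAL
// record and writing the resulting chunks to the chunk store.
func flushToChunkStore(rec []byte) error { return nil }

func main() {
	// Open all segments in the crashed ingester's WAL directory for sequential reading.
	segs, err := wal.NewSegmentsReader("/data/wal")
	if err != nil {
		log.Fatal(err)
	}
	defer segs.Close()

	// Iterate over the raw records (encoded series and samples) in order.
	r := wal.NewReader(segs)
	for r.Next() {
		if err := flushToChunkStore(r.Record()); err != nil {
			log.Fatal(err)
		}
	}
	if err := r.Err(); err != nil {
		log.Fatal(err)
	}
}
```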

@codesome codesome force-pushed the wal branch 3 times, most recently from 673b0bc to 76dcda0 on December 20, 2019 12:19
Contributor

@gouthamve gouthamve left a comment

LGTM with nits! Thanks for all the work and patience Ganesh!

return nil, err
}
elapsed := time.Since(start)
level.Info(util.Logger).Log("msg", "recovery from WAL completed", "time", elapsed.String())
Contributor
Can we also make this a metric? So that we can compare the duration changes over releases and also correlate it with the number of series, etc.
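A minimal sketch of what exporting that duration as a metric could look like; the metric name and surrounding code are illustrative rather than the exact ones in this PR:

```go
package ingester

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// walReplayDuration exposes how long the last WAL replay took, so it can be
// compared across releases and correlated with the number of series.
var walReplayDuration = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "cortex_ingester_wal_replay_duration_seconds", // illustrative name
	Help: "Time taken to replay the WAL on startup.",
})

func recoverFromWAL(replay func() error) error {
	start := time.Now()
	if err := replay(); err != nil {
		return err
	}
	// Set the gauge alongside the existing "recovery from WAL completed" log line.
	walReplayDuration.Set(time.Since(start).Seconds())
	return nil
}
```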

@@ -285,6 +334,7 @@ func (i *Ingester) StopIncomingRequests() {

// Push implements client.IngesterServer
func (i *Ingester) Push(ctx old_ctx.Context, req *client.WriteRequest) (*client.WriteResponse, error) {

Contributor
Extra whitespace

Signed-off-by: Ganesh Vernekar <[email protected]>
Contributor

@jtlisi jtlisi left a comment

Gave it a decent lookover, LGTM

@@ -86,6 +87,11 @@ func newIngesterMetrics(r prometheus.Registerer) *ingesterMetrics {
// A small number of chunks per series - 10*(8^(7-1)) = 2.6m.
Buckets: prometheus.ExponentialBuckets(10, 8, 7),
}),
walReplayDuration: prometheus.NewSummary(prometheus.SummaryOpts{
Contributor

This could be just a gauge as it doesn't change at all. No need of a summary.

@codesome
Contributor

codesome commented Jan 8, 2020

@gouthamve in the last commit

  1. Fixed the wrong increment of cortex_ingester_memory_chunks metric in setChunks function.
  2. WAL replay duration is a gauge now.
  3. Added metric for checkpoint duration.
  4. Small cleanups.

@@ -81,6 +87,10 @@ func newIngesterMetrics(r prometheus.Registerer) *ingesterMetrics {
// A small number of chunks per series - 10*(8^(7-1)) = 2.6m.
Buckets: prometheus.ExponentialBuckets(10, 8, 7),
}),
walReplayDuration: prometheus.NewGauge(prometheus.GaugeOpts{
Contributor

I think this metric is not registered. You should register it below.
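For reference, a sketch of the registration being asked for, assuming the constructor receives a `prometheus.Registerer` as in the diff above; the struct is trimmed down and the metric options are illustrative:

```go
package ingester

import "github.com/prometheus/client_golang/prometheus"

// ingesterMetrics is a trimmed-down stand-in for the real struct in this PR.
type ingesterMetrics struct {
	walReplayDuration prometheus.Gauge
}

func newIngesterMetrics(r prometheus.Registerer) *ingesterMetrics {
	m := &ingesterMetrics{
		walReplayDuration: prometheus.NewGauge(prometheus.GaugeOpts{
			Name: "cortex_ingester_wal_replay_duration_seconds", // illustrative name
			Help: "Time taken to replay the WAL on startup.",
		}),
	}
	if r != nil {
		// Without this call the gauge is never exported on /metrics.
		r.MustRegister(m.walReplayDuration)
	}
	return m
}
```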

Contributor

Oops, thanks!

@@ -248,15 +313,24 @@ func (i *Ingester) Push(ctx old_ctx.Context, req *client.WriteRequest) (*client.
return nil, wrapWithUser(err, userID)
}
}
client.ReuseSlice(req.Timeseries)
defer client.ReuseSlice(req.Timeseries)
Contributor

Pointing to #2000 here as it also includes the same fix, and there is an ongoing discussion about it. I have found the WAL to cause panics without that fix, so maybe we want to wait for a conclusion on that PR (merging this PR without this fix would make it unsafe to deploy the WAL).
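For background on why the `defer` matters: `client.ReuseSlice` returns the request's timeseries slices to a pool for reuse. A rough sketch of the hazard with a generic `sync.Pool` (the pool and types below are illustrative, not Cortex's actual code); presumably the panics come from the request data still being referenced, e.g. by the WAL write, after it has been handed back to the pool:

```go
package main

import (
	"fmt"
	"sync"
)

// slicePool stands in for the pool that client.ReuseSlice hands buffers back to.
var slicePool = sync.Pool{
	New: func() interface{} { return make([]int, 0, 8) },
}

func process(samples []int) int {
	sum := 0
	for _, s := range samples {
		sum += s
	}
	return sum
}

func handlePush(samples []int) int {
	// Returning the slice to the pool up front, i.e.
	//
	//     slicePool.Put(samples)
	//     return process(samples)
	//
	// lets a concurrent request grab and overwrite the same backing array
	// while this request (or an asynchronous WAL write) still reads from it.
	// Deferring the Put keeps the slice owned by this request until the
	// handler has completely finished with it.
	defer slicePool.Put(samples)
	return process(samples)
}

func main() {
	s := slicePool.Get().([]int)
	s = append(s, 1, 2, 3)
	fmt.Println(handlePush(s))
}
```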

Contributor

It has been merged and I have rebased to remove the change from this PR.

* `--ingester.wal-dir` to the directory where the WAL data should be stored and/or recovered from.
* `--ingester.wal-enabled` to `true` which enables writing to WAL during ingestion.
* `--ingester.checkpoint-enabled` to `true` to enable checkpointing of in-memory chunks to disk. This is optional and helps speed up the replay process.
* `--ingester.checkpoint-duration` to the interval at which checkpoints should be created.
Contributor

Can we have a recommendation here (what is the default?)

Contributor

The recommendation depends on the need and the ingester size. The default is 30m; I will add that here, but we use 15m in our dev environment.
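For illustration, an ingester started with the flags documented in this file (including `--ingester.recover-from-wal`, covered just below) might look like the following; the binary invocation and paths are placeholders, and the 15m checkpoint interval mirrors the dev setting mentioned above:

```sh
cortex -target=ingester \
  --ingester.wal-enabled=true \
  --ingester.wal-dir=/data/wal \
  --ingester.checkpoint-enabled=true \
  --ingester.checkpoint-duration=15m \
  --ingester.recover-from-wal=true
```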

* `--ingester.recover-from-wal` to `true` to recover data from an existing WAL. The data is recovered even if the WAL is disabled, as long as this is set to `true`. The WAL directory needs to be set for this.
* If you are going to enable the WAL, it is advisable to always set this to `true`.

## Stuff that is changed automatically when WAL is enabled
Contributor

Changes in lifecycle when WAL is enabled


## Stuff that is changed automatically when WAL is enabled

1. Flushing of data to the chunk store during rollouts or scale-down is disabled. This is because during a rollout of the statefulset there is no ingester leaving while another joins at the same time; rather, the same ingester is shut down and brought back again with an updated config. Hence flushing is skipped and the data is recovered from the WAL.
Contributor

there are no ingesters that are simultaneously leaving and joining.


To use the WAL, there are some changes that need to be made in the deployment.

## Things to change
Contributor

Changes to the deployment


Let's take an example of 4 ingesters. The migration would look something like this:

1. Bring up 1 stateful ingester `ingester-0` and wait till it's ready (accepting read and write requests).
Contributor

one stateful


### Scale up

Scaling up is the same as what you would do without the WAL or statefulsets. Add 1 ingester at a time.
Contributor

Add one ingester. Why do we need to add one at a time? This limitation doesn't exist, and I have sometimes added several at the same time.

Contributor

I thought we should scale up 1 at a time (due to the ring logic?); if that is not the case, I will modify the description.

There are 2 ways to do it, with the latter being a fallback option.

**First option**
Consider you have 4 ingesters `ingester-0 ingester-1 ingester-2 ingester-3` and you want to scale down to 2 ingesters. The ingesters which will be shut down according to statefulset rules are `ingester-2 ingester-3`.
Contributor

`ingester-2 ingester-3` --> `ingester-3` and then `ingester-2` (statefulsets shut down the highest ordinals first).
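As a concrete illustration of that ordering, assuming the statefulset is simply named `ingester` (the name is a placeholder):

```sh
# Scale from 4 replicas down to 2; the statefulset controller terminates the
# highest ordinals first, so ingester-3 shuts down before ingester-2.
kubectl scale statefulset ingester --replicas=2
```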

Signed-off-by: Ganesh Vernekar <[email protected]>
@gouthamve gouthamve merged commit 01fae92 into cortexproject:master Jan 21, 2020
@gouthamve gouthamve deleted the wal branch January 21, 2020 14:38
Successfully merging this pull request may close these issues.

Write ahead log ingester
9 participants