API endpoints for the series deletion API using block storage #4370
Conversation
Signed-off-by: ilangofman <[email protected]>
Could you please also review this PR, @alolita @Aneurysm9?
I'm a bit confused about why some files disappeared. Did any functionality disappear? Could the blocks functionality live under a different folder?
@bboreham No functionality disappeared.
I agree, it is a bit confusing that some of the blocks functionality is under the chunks folder. I can try moving it to the suggested folder.
This would be a good thing to explain in the PR description, to aid reviewers. You could also organise the change into commits where one is a pure refactor, just moving code, and subsequent ones modify the behaviour. The current set of 22 commits shows the journey you went on, but is not a good breakdown for reviewing.
I expect that when we remove chunks support (which is deprecated), we can then delete that code.
The way I have been thinking about it is that the tombstone is just a file to keep track of a series deletion request. Getting the delete request information would mean reading the tombstone for it. Admittedly, I have been using the two terms synonymously. I'll update the code to be more consistent.
@ilangofman Are you updating the code to be more consistent or should someone take over?
Hi @jeromeinsf. I took a couple of weeks off, but I am going to get back to this. I think I should be able to finish this PR on my own. For the rest of the deletion PRs, I am not sure how much time I will have available to work on them, with university starting this week. I would still like to contribute when I can, but it might take some time to make changes. To speed things up, if someone wants to take over the other PRs or collaborate, I'm open to it!
reverse change of combining blocks purger and tenant deletion
Here is the first batch of review comments.
pkg/chunk/purger/blocks_purger.go
Outdated
```go
hash.Write(bufStart)
hash.Write(bufEnd)

sort.Strings(selectors)
```
Is hashing matchers the right way to create a "deletion ID"? For example, I can have a matcher like `match[]=process_start_time_seconds{job="prometheus"}` and `match[]=process_start_time_seconds{job= "prometheus"}` (there is some whitespace in the second query). They are logically the same matcher, but will end up with different hash values. Is it intended that deletion request de-duplication is best effort?
I've updated it so it parses the matchers and then creates the hash based on the parsed matchers, so any logically equivalent deletion requests should now result in the same hash. The example you provided would produce the same hash. Split-up matchers like `match[]=process_start_time_seconds&match[]={job= "prometheus"}` would also result in the same hash as your example (since a deletion request is the AND of all matchers provided).
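The parse-then-hash idea can be sketched roughly as follows. This is a simplified, hypothetical version using only the standard library: the real change parses matchers with the PromQL parser and re-serializes them, whereas here normalization is approximated by stripping spaces (which would also mangle spaces inside quoted label values, so it is illustrative only). The `deletionID` name and the inclusion of the time range in the hash are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// deletionID derives a request ID from a time range and matcher strings.
// NOTE: stripping all spaces is a crude stand-in for parsing and
// re-serializing the matchers; it is enough to show why logically equal
// requests should hash identically.
func deletionID(startTime, endTime int64, selectors []string) string {
	normalized := make([]string, 0, len(selectors))
	for _, s := range selectors {
		normalized = append(normalized, strings.ReplaceAll(s, " ", ""))
	}
	// Sorting makes the ID independent of the order matchers were given in.
	sort.Strings(normalized)

	h := sha256.New()
	fmt.Fprintf(h, "%d:%d:", startTime, endTime)
	for _, s := range normalized {
		h.Write([]byte(s))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := deletionID(0, 100, []string{`process_start_time_seconds{job="prometheus"}`})
	b := deletionID(0, 100, []string{`process_start_time_seconds{job= "prometheus"}`})
	fmt.Println(a == b) // the two logically-equal requests get the same ID
}
```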
pkg/chunk/purger/blocks_purger.go
Outdated
```go
// if the request was previously cancelled, we need to remove the cancelled tombstone before adding the pending one
if err := tManager.RemoveCancelledStateIfExists(ctx, requestID); err != nil {
	level.Error(util_log.Logger).Log("msg", "removing cancelled tombstone state if it exists", "err", err)
	http.Error(w, err.Error(), http.StatusInternalServerError)
```
We need to be careful about just propagating the error back to the caller; we might inadvertently expose details that should not be exposed.
I changed it to a message instead of propagating the error.
```go
userID, err := tenant.TenantID(ctx)
if err != nil {
	http.Error(w, err.Error(), http.StatusUnauthorized)
	return
}
```
This is pretty common code; I think existing Cortex code uses `middleware.AuthenticateUser`. Any reason why we can't use the same function for the delete series APIs?
pkg/chunk/purger/blocks_purger.go
Outdated
```go
	http.Error(w, fmt.Sprintf("Error marshalling response: %v", err), http.StatusInternalServerError)
}

w.WriteHeader(http.StatusOK)
```
According to https://pkg.go.dev/net/http#ResponseWriter, since you called `json.NewEncoder(w)`, which I assume calls `ResponseWriter.Write()`, a status code of 200 is automatically written for you together with a "guessed" Content-Type, so you don't really need to set the status on this line. However, I think we should set the Content-Type and the status header ourselves for clarity.
Also, I think the if block on line 155 is missing a return statement, right? I would expect that `http.Error()` already sets the status to 500, so this `w.WriteHeader(http.StatusOK)` may have undefined behaviour.
Makes sense, should be fixed now.
```go
}

params := r.URL.Query()
requestID := params.Get("request_id")
```
Shouldn't we do input validation here? What if `request_id` is not given?
```go
	return
}

if deleteRequest.State == cortex_tsdb.StateProcessed {
```
Do we need a "processing" state? Or can we rely on the `deleteRequestCancelPeriod` to infer the "processing" state?
Yes, we can use the `deleteRequestCancelPeriod` to infer the processing state: whenever a tombstone with `statePending` is older than `deleteRequestCancelPeriod`, it is in the processing state.
```go
	return err
}

// If the newer state found comes later in the order, then we replace the state in the map.
```
In this case, I think it might be better to name the variables `newerStateOrder` and `prevStateOrder`; this also implies that renaming `state` to `newerState` would improve readability :)
pkg/storage/tsdb/tombstones.go
Outdated
```go
	return false
}

func (s BlockDeleteRequestState) GetStateOrder() (int, error) {
```
I don't think it's wise to export this method. It looks a lot like an implementation detail that should not be exposed. We don't want anyone outside the `tsdb` package to depend on the fact that the "order" of `StatePending` is 0, `StateProcessed` is 2, etc., right? What is important is the relation between the orders, not the absolute values. If we really want to expose something, consider a function like `IsGreaterThan(a, b BlockDeleteRequestState)`, but I don't think we should in this case.
Not sure why it was exported in the first place. I don't think the logic is required anywhere else. I made it private now.
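One way to keep the ordering private while still supporting comparisons is sketched below. The state values and the concrete order are assumptions for illustration; only the comparison helper would be visible to callers.

```go
package main

import "fmt"

type BlockDeleteRequestState string

const (
	StatePending   BlockDeleteRequestState = "pending"
	StateProcessed BlockDeleteRequestState = "processed"
	StateCancelled BlockDeleteRequestState = "cancelled"
)

// stateOrder stays unexported: code outside the package should depend
// only on the relation between states, never on these absolute values.
// The concrete ordering shown here is illustrative.
var stateOrder = map[BlockDeleteRequestState]int{
	StatePending:   0,
	StateProcessed: 1,
	StateCancelled: 2,
}

// isLaterState reports whether a comes after b in the request lifecycle;
// this relation is the only thing callers actually need.
func isLaterState(a, b BlockDeleteRequestState) bool {
	return stateOrder[a] > stateOrder[b]
}

func main() {
	fmt.Println(isLaterState(StateProcessed, StatePending)) // true
	fmt.Println(isLaterState(StatePending, StateProcessed)) // false
}
```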
pkg/storage/tsdb/tombstones.go
Outdated
```go
}

if err = m.DeleteTombstoneFile(ctx, t.RequestID, t.State); err != nil {
	level.Error(m.logger).Log("msg", "Created file with updated state but unable to delete previous state. Will retry next time tombstones are loaded", "err", err)
```
Would be nice to log the old and new state.
pkg/storage/tsdb/tombstones.go
Outdated
```go
err := m.WriteTombstone(ctx, newT)
if err != nil {
	level.Error(m.logger).Log("msg", "error creating file tombstone file with the updated state", "err", err)
```
Would be nice to log the new state.
pkg/storage/tsdb/tombstones.go
Outdated
```go
for _, state := range AllDeletionStates {
	filename := getTombstoneFileName(requestID, state)
	exists, err := m.TombstoneExists(ctx, requestID, state)
```
I wonder if we really need this extra check, which causes an extra network call (I believe `m.TombstoneExists` translates to an S3 HEAD request). Would it be possible to just do `m.ReadTombstone` and differentiate whether the returned `err` is NotFound or another error, then handle them differently?
Great catch! We can definitely do without it.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
Sorry everyone, there has been no activity from me on this. I have been really busy with school and unfortunately haven't had time to work on this. I do plan to come back to this once I get a bit more time available, but if someone else wants to help out, that would be great too.
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
@ilangofman may start working on this again soon :) so it's not stale.
Signed-off-by: Ilan Gofman <[email protected]>
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
/remove stale
Hello @ilangofman, thank you for opening this PR. There is a release in progress, so please rebase your CHANGELOG entry on top of the master branch and move the CHANGELOG entry to the top. Thanks!
What this PR does:
This PR implements the API endpoints for the time series deletion API for block storage. Based on the proposal for the series deletion API here.
The work done in this PR includes creating, getting and canceling deletion requests.
- A deletion request can be cancelled only if `-purger.delete-request-cancel-period` hasn't passed since the delete request was originally made.
- For the API to be enabled, set `--purger.enable` to `true`.

The new endpoints are located within the purger, where the existing tenant deletion API is also located. However, the files are under a `chunks` subdirectory even though they contain APIs for blocks storage. A future PR will be required to move the purger service out of the `chunks` subdirectory.

To make it easier to review and get incremental feedback, the implementation of the proposal is going to be split among roughly 4 PRs.
Please let me know if you have any questions or concerns. Any feedback is appreciated. Thank you for taking a look at this PR and helping out!
Which issue(s) this PR fixes:
Initial PR of multiple to address #4267
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`