[CORE-15281] Cloud Storage Clients: Implement batch delete for GCS#29246
Conversation
…ndler Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
ec84329 to
76680d3
Compare
There was a problem hiding this comment.
Pull request overview
This PR implements batch delete support for Google Cloud Storage (GCS) in Redpanda's cloud storage clients. GCS does not support S3-style batch delete operations, requiring a different implementation using GCS's native batch API with multipart/mixed format.
Changes:
- Added GCS batch delete API implementation using multipart/mixed format
- Modified multipart response parser to handle empty response bodies
- Updated tests to validate batch delete behavior for success, partial errors, and edge cases
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/http/tests/utils.h | Added content_type_overrides parameter to flexible_function_handler |
| src/v/http/tests/utils.cc | Implemented content_type_overrides handling in response processing |
| src/v/cloud_storage_clients/util.cc | Fixed multipart parser to accept responses with multiple trailing CRLFs |
| src/v/cloud_storage_clients/tests/util_test.cc | Added test for multipart parser with empty body case |
| src/v/cloud_storage_clients/tests/gcs_client_test.cc | Added comprehensive test suite for GCS batch delete operations |
| src/v/cloud_storage_clients/tests/BUILD | Added build target for gcs_client_test |
| src/v/cloud_storage_clients/s3_client.h | Added method declarations for GCS batch delete |
| src/v/cloud_storage_clients/s3_client.cc | Implemented GCS batch delete request creation and response parsing |
| src/v/cloud_storage_clients/BUILD | Added rapidjson dependency |
| src/v/cloud_storage/tests/remote_test.cc | Updated tests to validate GCS batch delete behavior |
| src/v/cloud_io/tests/s3_imposter.h | Added content_type_overrides parameter and batch delete parsing function |
| src/v/cloud_io/tests/s3_imposter.cc | Implemented batch delete request handling in test imposter |
| src/v/cloud_io/remote.cc | Set GCS batch delete limit to 100 keys per request |
76680d3 to
cb8941e
Compare
CI test resultstest results on build#78994test results on build#79009
test results on build#79075
test results on build#79354
|
|
CI Failure: ducktape-release failed to generate final report, but build looks clean |
|
/cdt |
cb8941e to
9b47476
Compare
|
|
||
| return R"xml(<DeleteResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"></DeleteResult>)xml"; | ||
| } else if ( | ||
| request._method == "POST" && request._url.contains("batch")) { |
There was a problem hiding this comment.
request._url.contains("batch")
how is POST + url.contains('batch') precise enough to identify a batch delete request?
There was a problem hiding this comment.
just happens to be the case. better to use the whole "/batch/storage/v1" though
| "Content-Type: application/http\r\n" | ||
| "Content-ID: response-{}\r\n\r\n" | ||
| "HTTP/1.1 {}\r\n" | ||
| "X-GUploader-UploadID: test-upload-id-{}\r\n\r\n", |
There was a problem hiding this comment.
no, but this endpoint is. incidentally, I meant to remove all of these guploader headers. they wouldn't realistically appear in a batch delete response
9b47476 to
e4b52fb
Compare
Very crude request parsing, but it works well enough for existing tests. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
e4b52fb to
55e475e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/v/cloud_storage_clients/tests/gcs_client_test.cc:1
- The format string has 4 placeholders but 5 arguments are provided (boundary, i, code, i). The extra 'i' at the end (line 432) will be ignored by fmt::format, which may indicate a copy-paste error or incorrect format string.
/*
|
/cdt |
|
/cdt |
|
@Lazin @dotnwat @rockwotj @nvartolomei - Bump for reviews. TIA! |
Thanks! Will get to this today |
| if (cfg.is_gcs) { | ||
| return ss::make_shared<gcs_client>( |
This commit relaxes the line break accounting in multipart_response_parser::get to accept arbitrarily many CRLFs preceding the next boundary. A subrequest with a non-empty body looks like this: - header CRLF - CRLF - body CRLF - CRLF - boundary Previously, the multipart response parser expected to find _exactly_ two CRLF sequences preceding the next boundary. However, if the sub-request body is empty, then we get the following: - header CRLF - CRLF - CRLF - boundary Since the boundary is preceded by three CRLF sequences, the parser would treat the subrequest as malformed and STOP, discarding the rest of the outer multipart response body. Note that a subresponse body could be anything at all, including some string ending in one or more CRLFs. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Infer cloud storage backend at config creation time, then use that to choose between s3_client & gcs_client in the client_pool. At this stage gcs_client is just a thin extension of s3_client w/ no added functionality. A subsequent commit will add an override for GCS-native multipart batch deletes.
This is only needed for GCS, so various identifiers will reflect that. Includes json response body parsing to get GCS-native error reasons. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
- Wire the batch request limit (100) into remote.cc - Update cloud_io/remote_test Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Bit simpler than retrofitting s3_client_test. Just flexing the batch delete code. Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
55e475e to
b24f8d2
Compare
|
@Lazin @nvartolomei - I'm going to merge this on the basis that it's quite similar to the ABS version (modulo |
This PR adds support for multipart plural deletes on GCS.
Proof: https://buildkite.com/redpanda/redpanda/builds/78997#019bba33-c39b-473f-9cd0-01a8424fd158
Backports Required
Release Notes
Improvements