
Lifecycle improvement for non-versioned buckets #9160

Merged
jackyalbo merged 1 commit into
noobaa:masterfrom
jackyalbo:jacky-lifecycle-fix
Aug 13, 2025

@jackyalbo

@jackyalbo jackyalbo commented Jul 27, 2025


Describe the Problem

Since in most cases where lifecycle is configured the bucket does not have versioning enabled, we want to add a special case for deleting the keys that match a lifecycle rule when each key has only a single version.
Before this change, we would find up to 1K matching keys and then delete them from the DB one by one. With this fix, that loop is replaced by a single DB query.
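To make the cost difference concrete, here is a minimal sketch that only counts DB round trips. It assumes a hypothetical in-memory `CountingDb` (not a NooBaa class) and models the old flow as one find query plus one update per matched key, and the new flow as a single CTE-based UPDATE.

```javascript
// Hypothetical stand-in for a DB client: it only counts round trips.
// Real DB calls are async; they are kept synchronous here for brevity.
class CountingDb {
  constructor(num_matching) {
    this.num_matching = num_matching;
    this.round_trips = 0;
  }
  find_ids(limit) {
    this.round_trips += 1;
    const n = Math.min(limit, this.num_matching);
    return Array.from({ length: n }, (_, i) => `id_${i}`);
  }
  update_one(_id) {
    this.round_trips += 1;
  }
  update_by_query(limit) {
    // Models: WITH rows AS (SELECT ... LIMIT n) UPDATE ... in one statement.
    this.round_trips += 1;
    return Math.min(limit, this.num_matching);
  }
}

// Old flow: find up to `limit` matching keys, then delete them one by one.
function old_flow(db, limit) {
  const ids = db.find_ids(limit);
  for (const id of ids) db.update_one(id);
  return db.round_trips;
}

// New flow: a single query selects and marks the whole batch as deleted.
function new_flow(db, limit) {
  db.update_by_query(limit);
  return db.round_trips;
}

console.log(old_flow(new CountingDb(5000), 1000)); // 1001
console.log(new_flow(new CountingDb(5000), 1000)); // 1
```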

Issues: Fixed #xxx / Gap #xxx

  1. Fixes an issue we found here: https://issues.redhat.com/browse/DFBUGS-2955, before the customer decided to disable lifecycle.
  2. Gap: versioned buckets also need better handling; there too, going over the keys one by one does not scale.

After the fix, this is the EXPLAIN ANALYZE result:

EXPLAIN ANALYZE
WITH rows AS (
  SELECT _id FROM objectmds
  WHERE data->>'bucket' = '67ea718f8525bd002640c893'
    AND data->>'key' ~ '^application'
    AND data->>'create_time' < '2025-08-12T11:11:34.000Z'
    AND data->'deleted' IS NULL
    AND data->'upload_started' IS NULL
    AND data->'version_enabled' IS NULL
  LIMIT 1000
)
UPDATE objectmds
SET data = jsonb_set(data, '{deleted}', to_jsonb('2025-08-13T11:11:34.537Z'::text), true)
WHERE _id IN (SELECT rows._id FROM rows)
RETURNING *;
                                                                                                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on objectmds  (cost=9.16..17.20 rows=1 width=87) (actual time=53.436..833.670 rows=1000 loops=1)
   ->  Nested Loop  (cost=9.16..17.20 rows=1 width=87) (actual time=52.548..168.073 rows=1000 loops=1)
         ->  HashAggregate  (cost=8.60..8.61 rows=1 width=74) (actual time=52.248..52.648 rows=1000 loops=1)
               Group Key: rows._id
               Batches: 1  Memory Usage: 345kB
               ->  Subquery Scan on rows  (cost=0.55..8.60 rows=1 width=74) (actual time=5.835..51.606 rows=1000 loops=1)
                     ->  Limit  (cost=0.55..8.59 rows=1 width=25) (actual time=5.775..51.187 rows=1000 loops=1)
                           ->  Index Scan using idx_btree_objectmds_version_seq_index on objectmds objectmds_1  (cost=0.55..8.59 rows=1 width=25) (actual time=5.772..51.092 rows=1000 loops=1)
                                 Index Cond: (((data ->> 'bucket'::text) = '67ea718f8525bd002640c893'::text) AND ((data ->> 'key'::text) >= 'application'::text) AND ((data ->> 'key'::text) < 'applicatioo'::text))
                                 Filter: (((data -> 'deleted'::text) IS NULL) AND ((data -> 'upload_started'::text) IS NULL) AND ((data -> 'version_enabled'::text) IS NULL) AND ((data ->> 'key'::text) ~ '^application'::text) AND ((data ->> 'create_time'::text) < '2025-08-12T11:11:34.000Z'::text))
         ->  Index Scan using objectmds_pkey on objectmds  (cost=0.56..8.58 rows=1 width=667) (actual time=0.113..0.113 rows=1 loops=1000)
               Index Cond: (_id = rows._id)
 Planning Time: 0.872 ms
 Execution Time: 833.915 ms
(14 rows)
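In the plan above, the anchored regex `^application` is served by an Index Cond of the form `key >= 'application' AND key < 'applicatioo'`, with the regex re-checked in the Filter. The sketch below reproduces that bound computation for a plain ASCII literal prefix: the upper bound is the prefix with its last character incremented. It only illustrates the idea; it is not PostgreSQL's implementation, which also has to account for collations and multibyte characters.

```javascript
// Compute the half-open key range [lower, upper) containing every string
// that starts with `prefix`. Illustrative only: assumes a simple ASCII
// prefix and byte-wise (C collation) ordering.
function prefix_range(prefix) {
  if (prefix.length === 0) return { lower: '', upper: null }; // no upper bound
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

// Would an index range scan over [lower, upper) visit this key?
function in_range(key, { lower, upper }) {
  return key >= lower && (upper === null || key < upper);
}

const range = prefix_range('application');
console.log(range); // { lower: 'application', upper: 'applicatioo' }
console.log(in_range('application/logs/1.txt', range)); // true
console.log(in_range('applicatio', range)); // false
```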

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • Bulk-delete objects by flexible query (key patterns, creation time, size, tags) for PostgreSQL-backed buckets without versioning.
  • Bug Fixes

    • Improved per-object reply construction and error handling when deleting multiple objects by filter.
  • Tests

    • Added integration tests for bulk-delete-by-query and refined lifecycle/metadata-store test descriptions and formatting.

@coderabbitai

coderabbitai Bot commented Jul 27, 2025


Walkthrough

Adds MDStore.delete_objects_by_query to mark objects deleted via SQL for PostgreSQL, changes MDStore.find_objects to return a Promise, updates object_server to use the new bulk-delete path for non-versioned Postgres buckets, and adds integration tests for the new method; minor test formatting fixes elsewhere.

Changes

  • MDStore bulk deletion & return type (src/server/object_services/md_store.js): Change find_objects to return Promise<nb.ObjectMD[]>; add async delete_objects_by_query(...) that builds a conditional SQL query (CTE) to mark matching objects as deleted and optionally return deleted rows.
  • Object server deletion flow (src/server/object_services/object_server.js): Update delete_multiple_objects_by_filter to use MDStore.instance().delete_objects_by_query(query) (with return_results = true) when bucket.versioning is DISABLED and DB is Postgres; retain previous per-object deletion fallback and adjust reply construction to handle optional per-object results.
  • MDStore integration tests (src/test/integration_tests/db/test_md_store.js): Add three PostgreSQL-only tests for delete_objects_by_query covering key regex, max_create_time, and limit; minor formatting tweaks in parts tests.
  • S3 lifecycle test formatting (src/test/integration_tests/api/s3/test_lifecycle.js): Cosmetic changes only; fix typos in test names, reformat lifecycle rule arrays, and adjust whitespace; no behavioral changes.
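A minimal sketch of how such a conditional CTE query could be assembled with bound parameters. The function name `build_delete_by_query` is hypothetical, as is the exact parameter object; `key`, `max_create_time`, `limit`, and `return_results` are the options named in this PR, and size/tagging filters are omitted. The SQL shape follows the EXPLAIN ANALYZE query in the PR description; this is not the actual md_store.js code.

```javascript
// Hypothetical sketch: build a parameterized CTE that marks matching,
// non-versioned objects as deleted in one statement (jsonb schema as in
// the EXPLAIN ANALYZE query). Size/tagging filters are omitted.
function build_delete_by_query(
  { bucket_id, key, max_create_time, limit = 1000, return_results = false },
  now = new Date()
) {
  const values = [];
  // Append a value and return its positional placeholder ($1, $2, ...).
  const bind = value => {
    values.push(value);
    return `$${values.length}`;
  };
  const conditions = [`data->>'bucket' = ${bind(bucket_id)}`];
  if (key) conditions.push(`data->>'key' ~ ${bind(key)}`);
  if (max_create_time) conditions.push(`data->>'create_time' < ${bind(max_create_time)}`);
  // Skip objects that are already deleted, still uploading, or versioned.
  conditions.push(
    `data->'deleted' IS NULL`,
    `data->'upload_started' IS NULL`,
    `data->'version_enabled' IS NULL`
  );
  const select = `SELECT _id FROM objectmds WHERE ${conditions.join(' AND ')} LIMIT ${bind(limit)}`;
  const deleted_at = bind(now.toISOString());
  const text =
    `WITH rows AS (${select}) ` +
    `UPDATE objectmds SET data = jsonb_set(data, '{deleted}', to_jsonb(${deleted_at}::text), true) ` +
    'WHERE _id IN (SELECT rows._id FROM rows)' +
    (return_results ? ' RETURNING *' : '');
  return { text, values };
}

const example = build_delete_by_query(
  {
    bucket_id: '67ea718f8525bd002640c893',
    key: '^application',
    max_create_time: '2025-08-12T11:11:34.000Z',
    return_results: true,
  },
  new Date('2025-08-13T11:11:34.537Z')
);
console.log(example.text);
console.log(example.values);
```

Binding every value, including the regex, keeps user-controlled prefixes out of the SQL text; `RETURNING *` is appended only when the caller needs the deleted rows back.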

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ObjectServer
    participant MDStore
    participant PostgreSQL

    Client->>ObjectServer: Delete objects by filter request
    alt bucket.versioning == DISABLED && DB != mongodb
        ObjectServer->>MDStore: delete_objects_by_query(params with return_results=true)
        MDStore->>PostgreSQL: Execute CTE + UPDATE to mark matched objects deleted
        PostgreSQL-->>MDStore: Return updated rows (if requested)
        MDStore-->>ObjectServer: Return deleted object rows
    else
        ObjectServer->>MDStore: find_objects(params)
        MDStore-->>ObjectServer: Return object list
        ObjectServer->>ObjectServer: delete_multiple_objects(objects) (per-object)
    end
    ObjectServer-->>Client: Respond with deletion results
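The branch in the diagram can be sketched as a small dispatcher. Here `md_store` is a hypothetical object exposing `delete_objects_by_query` and `find_objects`, and `delete_one` stands in for the per-object deletion path; the names, argument shapes, and return values are assumptions for illustration, not the object_server.js code.

```javascript
// Sketch of the branch in delete_multiple_objects_by_filter (assumed shapes).
async function delete_by_filter({ bucket, db_type, query, md_store, delete_one }) {
  // Bulk path: only for non-versioned buckets on PostgreSQL.
  if (bucket.versioning === 'DISABLED' && db_type !== 'mongodb') {
    const deleted = await md_store.delete_objects_by_query({ ...query, return_results: true });
    // No per-object results exist on this path.
    return { path: 'bulk', objects: deleted, delete_results: undefined };
  }
  // Fallback: list matching objects, then delete each one individually.
  const objects = await md_store.find_objects(query);
  const delete_results = [];
  for (const obj of objects) delete_results.push(await delete_one(obj));
  return { path: 'per_object', objects, delete_results };
}
```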

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Possibly related PRs

Suggested labels

size/M

Suggested reviewers

  • tangledbytes

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2c2eb3 and c013518.

📒 Files selected for processing (4)
  • src/server/object_services/md_store.js (2 hunks)
  • src/server/object_services/object_server.js (3 hunks)
  • src/test/integration_tests/api/s3/test_lifecycle.js (17 hunks)
  • src/test/integration_tests/db/test_md_store.js (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/test/integration_tests/db/test_md_store.js
  • src/server/object_services/object_server.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit Configuration File

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/integration_tests/api/s3/test_lifecycle.js
🧬 Code Graph Analysis (2)
src/test/integration_tests/api/s3/test_lifecycle.js (2)
src/tools/md_blow_lifecycle.js (2)
  • mp_list_after (204-204)
  • obj_params (158-163)
src/tools/coding_speed.js (1)
  • num_parts (108-108)
src/server/object_services/md_store.js (4)
src/endpoint/s3/ops/s3_put_object.js (1)
  • tagging (22-22)
src/endpoint/s3/ops/s3_post_object_uploads.js (1)
  • tagging (14-14)
src/test/integration_tests/db/test_md_store.js (3)
  • max_create_time (71-71)
  • query (103-108)
  • bucket_id (20-20)
src/server/system_services/system_server.js (1)
  • moment (9-9)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (6)
src/test/integration_tests/api/s3/test_lifecycle.js (2)

290-299: LGTM! Cosmetic improvements to test structure and naming.

The changes improve test readability by:

  • Simplifying lifecycle configurations to use single-element rule arrays
  • Correcting the typo from "lifecyle" to "lifecycle" in test descriptions
  • Minor formatting adjustments for consistency

These are purely cosmetic improvements that don't affect test logic or functionality.

Also applies to: 330-339, 494-502, 532-540, 571-582, 627-627, 639-639, 651-651, 659-660, 669-669, 677-678, 686-686, 695-695, 704-704, 713-713, 721-721, 728-729, 737-737, 743-744, 769-769, 797-797


405-406: Minor formatting improvement in assertion message.

The string interpolation formatting is more consistent and readable.

src/server/object_services/md_store.js (4)

634-634: Function signature updated to return Promise.

The return type annotation correctly reflects that find_objects is now asynchronous and returns a Promise.


723-745: Good parameter binding implementation for most conditions.

The parameterized query approach correctly prevents SQL injection for the conditional filters (size, tagging, create_time). The parameter array building and corresponding SQL placeholders are properly implemented.


699-699: TODO comment indicates incomplete versioning support.

The comment correctly identifies that this method doesn't support versioned buckets, which aligns with the PR's stated scope of handling non-versioned buckets only.


746-770: Efficient bulk delete implementation with conditional return.

The CTE-based approach for bulk deletion is well-designed:

  • Uses a Common Table Expression to select matching objects
  • Updates objects in a single operation
  • Conditionally returns results based on the return_results parameter
  • Proper limit handling

@jackyalbo jackyalbo force-pushed the jacky-lifecycle-fix branch 5 times, most recently from e064263 to a2c2eb3 Compare July 28, 2025 14:38
@jackyalbo jackyalbo marked this pull request as ready for review July 29, 2025 07:36

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ab748b and a2c2eb3.

📒 Files selected for processing (4)
  • src/server/object_services/md_store.js (2 hunks)
  • src/server/object_services/object_server.js (3 hunks)
  • src/test/integration_tests/api/s3/test_lifecycle.js (17 hunks)
  • src/test/integration_tests/db/test_md_store.js (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit Configuration File

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/integration_tests/api/s3/test_lifecycle.js
  • src/test/integration_tests/db/test_md_store.js
🧬 Code Graph Analysis (2)
src/server/object_services/object_server.js (2)
config.js (2)
  • config (7-7)
  • _ (13-13)
src/server/object_services/md_store.js (2)
  • config (28-28)
  • _ (7-7)
src/test/integration_tests/db/test_md_store.js (2)
src/test/integration_tests/api/s3/test_lifecycle.js (2)
  • mocha (10-10)
  • config (20-20)
src/server/object_services/md_store.js (1)
  • config (28-28)
🔇 Additional comments (15)
src/test/integration_tests/api/s3/test_lifecycle.js (4)

290-299: LGTM! Array formatting improved for readability.

The reformatting of array literals from multi-line to single-line format improves code compactness and readability without affecting functionality.

Also applies to: 330-339, 494-502, 532-540, 571-582


627-627: Good catch on the typo corrections.

The corrections from "lifecyle" to "lifecycle" in test case names improve consistency and correctness.

Also applies to: 639-639, 651-651, 660-660, 669-669, 678-678, 686-686, 695-695, 704-704, 713-713, 721-721, 729-729, 737-737, 744-744


406-406: String formatting improved.

The formatting change for the assertion message improves readability while maintaining the same functional behavior.


1-900: Incorrect: delete_objects_by_query test coverage already exists

Tests for the new delete_objects_by_query method are present in src/test/integration_tests/db/test_md_store.js (see the mocha.it('delete_objects_by_query – …') cases). You can disregard the missing-tests suggestion.

Likely an incorrect or invalid review comment.

src/server/object_services/md_store.js (2)

634-634: Correct return type annotation.

Good fix! The find_objects method is async and should return Promise<nb.ObjectMD[]> rather than nb.ObjectMD[].
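For reference, this is the JSDoc pattern being corrected: an `async` function always returns a Promise, so its `@returns` type should wrap the resolved value. The function and the `ObjectMD` typedef below are hypothetical placeholders, not the md_store.js signature.

```javascript
/**
 * Hypothetical minimal object metadata shape, for illustration only.
 * @typedef {Object} ObjectMD
 * @property {string} key
 * @property {string} bucket
 */

/**
 * An async function resolves to its value but returns a Promise,
 * so the annotation is Promise<ObjectMD[]>, not ObjectMD[].
 * @param {string} bucket
 * @param {ObjectMD[]} all
 * @returns {Promise<ObjectMD[]>}
 */
async function find_objects_sketch(bucket, all) {
  return all.filter(obj => obj.bucket === bucket);
}

const result = find_objects_sketch('b1', [{ key: 'k', bucket: 'b1' }]);
console.log(result instanceof Promise); // true
```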


698-753: Well-implemented bulk deletion method with proper security measures.

The new delete_objects_by_query method effectively addresses the PR objective of improving lifecycle performance for non-versioned buckets. Key strengths:

  • Uses CTE pattern for efficient bulk operations
  • Properly parameterized query prevents SQL injection for the timestamp
  • Comprehensive filtering options (key regex, size, tagging, creation time)
  • Restricts to non-versioned objects (version_enabled IS NULL)
  • Optional result return for flexibility

The implementation correctly uses PostgreSQL-specific features like jsonb operators and regex matching.

However, there are a few considerations:

#!/bin/bash
# Description: Verify the method is only called for PostgreSQL databases
# Expected: Find conditional usage based on DB_TYPE

echo "Checking for PostgreSQL-specific usage of delete_objects_by_query..."
rg -A 5 -B 5 "delete_objects_by_query" src/server/
src/server/object_services/object_server.js (4)

975-985: LGTM!

Query object construction is well-structured with proper regex escaping for security. The parameter mapping is clear and consistent.
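Escaping matters because a lifecycle prefix is literal text while the DB filter is a regex: without it, a prefix such as `logs.2025` would let `.` match any character. A minimal sketch, assuming an escape equivalent to lodash's `_.escapeRegExp` (implemented inline so the block is self-contained; the helper names are hypothetical). Backslash escapes of these punctuation characters should also be treated as literals by PostgreSQL's `~` operator, though that is worth confirming against the regex docs.

```javascript
// Escape every character with special meaning in a RegExp pattern.
function escape_regexp(str) {
  return str.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

// Turn a literal lifecycle prefix into an anchored "starts with" regex.
function prefix_to_regex(prefix) {
  return new RegExp('^' + escape_regexp(prefix));
}

const re = prefix_to_regex('logs.2025(');
console.log(re.test('logs.2025(a)/x')); // true
console.log(re.test('logsX2025(a)/x')); // false: '.' is now literal
```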


986-1005: Excellent performance optimization with proper fallback logic.

The conditional use of delete_objects_by_query for PostgreSQL non-versioned buckets is a well-designed optimization. The fallback to individual deletions ensures compatibility across all scenarios.

The TODOs appropriately acknowledge current limitations for versioned buckets and batch processing.


1015-1015: Good defensive programming practice.

The safety check prevents potential null/undefined access errors when delete_results is not initialized (which occurs when using the new bulk deletion path).
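The guard described here can be sketched as follows: on the bulk path there are no per-object results, so each reply is built from the returned object alone; otherwise the per-object result (which may carry an error) is consulted. The function name and reply fields are assumptions, not the actual object_server.js reply format.

```javascript
// Build one reply entry per object; delete_results is undefined on the bulk path.
function build_replies(objects, delete_results) {
  return objects.map((obj, i) => {
    const res = delete_results?.[i]; // guard: may be undefined
    if (res && res.err) {
      return { key: obj.key, err_code: res.err.code, err_message: res.err.message };
    }
    return { key: obj.key };
  });
}

console.log(build_replies([{ key: 'a' }], undefined)); // [ { key: 'a' } ]
```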


986-987: LGTM!

Variable declarations are appropriate with proper initialization in both conditional branches.

src/test/integration_tests/db/test_md_store.js (5)

53-72: Good test data setup for shared usage across test cases.

The shared test objects and timestamp calculations are well-structured and will support consistent testing across the new test cases.


73-90: Comprehensive test for key-based filtering.

The test correctly validates regex key filtering functionality with proper database type checking and result verification.


92-116: Excellent test coverage for time-based filtering.

The test effectively validates max_create_time filtering with different scenarios, ensuring the functionality works correctly across various time thresholds.


118-136: Good test coverage for limit functionality.

The test correctly validates that the limit parameter restricts the number of objects deleted as expected.
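To pin down the semantics these three tests exercise, here is an in-memory model of the filter: key regex, a strict `create_time <` cutoff, skipping deleted, uploading, or versioned objects, and a row limit. It is a hypothetical reference model for reasoning about expected results, not the test code or the SQL itself; field names follow the jsonb keys in the EXPLAIN ANALYZE query.

```javascript
// Reference model: which objects would one delete-by-query call mark deleted?
function select_for_delete(objects, { key, max_create_time, limit = 1000 }) {
  const key_re = key ? new RegExp(key) : null;
  const matched = objects.filter(obj =>
    obj.deleted === undefined &&
    obj.upload_started === undefined &&
    obj.version_enabled === undefined &&
    (!key_re || key_re.test(obj.key)) &&
    // ISO-8601 strings compare correctly as text, like data->>'create_time'.
    (!max_create_time || obj.create_time < max_create_time)
  );
  // SQL LIMIT without ORDER BY returns unspecified rows; the model takes the first ones.
  return matched.slice(0, limit);
}

const objs = [
  { key: 'logs/a', create_time: '2025-01-01T00:00:00.000Z' },
  { key: 'logs/b', create_time: '2025-06-01T00:00:00.000Z' },
  { key: 'data/c', create_time: '2025-01-01T00:00:00.000Z' },
  { key: 'logs/d', create_time: '2025-01-01T00:00:00.000Z', deleted: '2025-02-01T00:00:00.000Z' },
];
const picked = select_for_delete(objs, { key: '^logs/', max_create_time: '2025-03-01T00:00:00.000Z' });
console.log(picked.map(o => o.key)); // [ 'logs/a' ]
```

Because the limit carries no ORDER BY, a test that checks the limit should assert on the count of deleted objects rather than on which specific keys were chosen.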


228-314: Good formatting improvements.

The consistent indentation and spacing in the parts array improves code readability and maintainability.

Comment thread src/server/object_services/md_store.js Outdated
Comment thread src/server/object_services/md_store.js Outdated

@dannyzaken dannyzaken left a comment


Maybe attach the EXPLAIN output for the queries to the PR for future reference.

Signed-off-by: jackyalbo <jacky.albo@gmail.com>
@jackyalbo jackyalbo force-pushed the jacky-lifecycle-fix branch from a2c2eb3 to c013518 Compare August 13, 2025 12:58
let delete_results;
let objects;
// TODO: Add support to delete_objects_by_query also for versioning or add another function to support versioning.
if (req.bucket.versioning === 'DISABLED' && config.DB_TYPE !== 'mongodb') /* only for postgres */ {


Do we still need config.DB_TYPE !== 'mongodb'?
Is it possible to have DB_TYPE as mongodb?

Maybe it is used for the tests.

If it is, we should probably remove it.

@jackyalbo jackyalbo Aug 13, 2025


Let's wait until we stop supporting MongoDB; then we will delete it, and it will be easy to find that we need to update this place. It is only for testing for now.


3 participants