NC | Concurrency & refactoring | Add delay, version move checks and GPFS refactoring #8419

Merged: romayalon merged 1 commit into noobaa:master from the romy-gpfs-refactoring branch on Oct 6, 2024

@romayalon romayalon commented Sep 29, 2024

Explain the changes

  1. NamespaceFS -
    1.1. Added a delay between retries. The delay is a base of 70 ms plus random(0,50) ms. Adding a delay is common practice in retry mechanisms, especially in multithreaded code. If another request moved the files in a way that caused our current request to fail, retrying right away might hit the same failure because the other request has not finished its work yet, so we wait a bit and let it finish. I saw that a delay above 100 ms is usually enough, and adding a random number of ms on top is nice to have.
    1.2. Moved the is_gpfs ? check inside the _open_files_gpfs() function.
    1.3. _delete_single_object_versions() - added 2 calls to _check_version_moved() that check whether the version moved between the latest version location and .versions/ during the deletion, so that we delete the version even if it moved. The check throws a VERSION_MOVED error that triggers a retry; on the next try we locate the new location of the version and remove it.
    1.4. _check_version_moved() receives key, version_id and cur_path. If cur_path is the latest version location, it checks whether the version was moved to .versions/. If cur_path is in .versions/, it checks whether the latest version has the same version_id as the version_id param.
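
A rough sketch of the flow described in 1.3 and 1.4, for illustration only. It is not the NamespaceFS code: the file system is replaced by an in-memory Map of path to version_id, and the helper names (latest_path, version_path, locate_version, delete_version, before_unlink) and the `<name>_<version_id>` layout under .versions/ are assumptions made for this example.

```javascript
'use strict';
const path = require('path');

const VERSIONS_DIR = '.versions'; // directory name taken from the description

// Hypothetical path helpers; the `<name>_<version_id>` layout is an assumption.
function latest_path(key) { return key; }
function version_path(key, version_id) {
    return path.join(path.dirname(key), VERSIONS_DIR, `${path.basename(key)}_${version_id}`);
}

function version_moved_error() {
    return Object.assign(new Error('version moved'), { code: 'VERSION_MOVED' });
}

// `store` is a Map of path -> version_id standing in for the file system.
function check_version_moved(store, key, version_id, cur_path) {
    if (cur_path === latest_path(key)) {
        // We expected the version at the latest location: did it move into .versions/?
        if (store.has(version_path(key, version_id))) throw version_moved_error();
    } else if (store.get(latest_path(key)) === version_id) {
        // We expected the version under .versions/: did it become the latest?
        throw version_moved_error();
    }
}

function locate_version(store, key, version_id) {
    return store.get(latest_path(key)) === version_id ?
        latest_path(key) : version_path(key, version_id);
}

// Deletes one version, retrying when the version moved between lookup and unlink.
// `before_unlink` is a test hook that simulates a concurrent request.
function delete_version(store, key, version_id, { retries = 3, before_unlink = () => {} } = {}) {
    for (let attempt = 1; attempt <= retries; attempt++) {
        const cur_path = locate_version(store, key, version_id);
        before_unlink(attempt);
        try {
            if (store.get(cur_path) !== version_id) {
                // Nothing to unlink here; a throw means the version lives elsewhere now.
                check_version_moved(store, key, version_id, cur_path);
                return false; // the version does not exist in either location
            }
            store.delete(cur_path);
            return true;
        } catch (err) {
            if (err.code !== 'VERSION_MOVED' || attempt === retries) throw err;
            // A real implementation would wait here before the next attempt.
        }
    }
    return false;
}
```

In this model, a concurrent request that moves the version after it was located makes the first attempt throw VERSION_MOVED, and the second attempt finds the version at its new path and removes it.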

Issues: Fixed #xxx / Gap #xxx

  1. Fixed NSFS | Versioning | Concurrent delete latest & delete object by ID which is also the latest  #8414

Testing Instructions:

  1. sudo jest --testRegex=jest_tests/test_versioning_conc -t 'concurrent delete objects by version id/latest'
  • Doc added/updated
  • Tests added

@romayalon romayalon force-pushed the romy-gpfs-refactoring branch 3 times, most recently from 6cd3731 to 15972de Compare October 1, 2024 07:07
@romayalon romayalon marked this pull request as ready for review October 1, 2024 07:36
@romayalon romayalon force-pushed the romy-gpfs-refactoring branch from 15972de to fe34ee2 Compare October 1, 2024 07:57
@romayalon romayalon requested review from nadavMiz and shirady October 1, 2024 08:51

shirady commented Oct 1, 2024

@romayalon Could you please add a short explanation about "Added to the retries a delay, the delay is the sum of a base of 70 + random(0,50)." in the PR description? (Why the delay solves the issue, Why we chose it, etc.).

romayalon commented

> @romayalon Could you please add a short explanation about "Added to the retries a delay, the delay is the sum of a base of 70 + random(0,50)." in the PR description? (Why the delay solves the issue, Why we chose it, etc.).

Adding a delay is common practice in retry mechanisms, especially in multithreaded code. If another request moved the files in a way that caused our current request to fail, retrying right away might hit the same failure because the other request has not finished its work yet, so we wait a bit and let it finish. I saw that a delay above 100 ms is usually enough, and adding a random number of ms on top is nice to have.
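
To make the numbers concrete, here is a minimal sketch of a retry loop with this kind of delay. It is not the code from this PR: the names (BASE_DELAY_MS, MAX_JITTER_MS, get_retry_delay_ms, retry_with_delay) are made up for the example, and it assumes random(0,50) is an integer with both bounds included, which gives a delay between 70 and 120 ms.

```javascript
'use strict';

// Hypothetical constants mirroring the numbers quoted in the description.
const BASE_DELAY_MS = 70;
const MAX_JITTER_MS = 50;

// Returns a delay in [70, 120] ms: the fixed base plus a random integer in [0, 50].
// `random` is injectable so the function can be tested deterministically.
function get_retry_delay_ms(random = Math.random) {
    return BASE_DELAY_MS + Math.floor(random() * (MAX_JITTER_MS + 1));
}

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

// Runs `operation` up to `retries` times, sleeping between failed attempts
// so that a concurrent request has time to finish its work.
async function retry_with_delay(operation, options = {}) {
    const { retries = 3, is_retryable = () => true, delay_fn = get_retry_delay_ms } = options;
    for (let attempt = 1; ; attempt++) {
        try {
            return await operation(attempt);
        } catch (err) {
            // Give up on the last attempt or on errors that a retry cannot fix.
            if (attempt >= retries || !is_retryable(err)) throw err;
            await sleep(delay_fn());
        }
    }
}
```

A caller could pass something like `is_retryable: err => err.code === 'VERSION_MOVED'` so that only errors caused by a concurrent move trigger another attempt.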

@romayalon romayalon requested review from nadavMiz and shirady October 1, 2024 12:21
@romayalon romayalon force-pushed the romy-gpfs-refactoring branch from fe34ee2 to 46d8b9f Compare October 1, 2024 12:38

shirady commented Oct 1, 2024

> > @romayalon Could you please add a short explanation about "Added to the retries a delay, the delay is the sum of a base of 70 + random(0,50)." in the PR description? (Why the delay solves the issue, Why we chose it, etc.).
>
> Delay is common when using retries mechanism, specifically in multithreading, if another request moved the files in a way that caused a failure to our current request, retrying right away might still have the issue if the second request didn't finish its work, therefore we wait a bit and let the other request finish its work. I saw that usually above 100 ms is good, and having an addition of random ms is nice to have.

@romayalon Why is the base 70 and not 100?

@shirady shirady left a comment

LGTM

@romayalon romayalon force-pushed the romy-gpfs-refactoring branch from 2c23a0c to bf7d5be Compare October 6, 2024 15:13
@romayalon romayalon merged commit 405905b into noobaa:master Oct 6, 2024
10 checks passed
Successfully merging this pull request may close these issues.

NSFS | Versioning | Concurrent delete latest & delete object by ID which is also the latest