Skip to content

Conversation

@DemetrisChr
Copy link
Contributor

@DemetrisChr DemetrisChr commented Feb 20, 2024

Motivation

A number of issues were observed with the existing range scan implementation, such as items missing & the scan getting stuck indefinitely when more than one IO threads are used in the SDK. This change simplifies the implementation of range scan and resolves issues with concurrency and IO threads.

Changes

  • Create a range_scan_load_balancer class to contain all of the logic around selecting which vbucket to scan.
  • Instead of using one concurrent channel for each vbucket scan when sending items to the caller (i.e. having 1024 channels), only use a single channel shared by all vbucket scans. This significantly simplifies the logic of next_item(). Items are sent through this channel, as well as signals that a vbucket scan has been completed (either successfully or with a fatal error), which allows the orchestrator to keep track of whether there are any outstanding vbucket scans.
  • Use asio::post when triggering KV operations within a KV callback (e.g. when doing a range_scan_continue from the callback of range_scan_create)
  • Add unit tests for the load balancing logic
  • Fix an issue with the "orchestrator prefix scan, get 10 items and cancel" test where it was fetching documents created in a different test.
  • For sampling scans, if a seed is provided it is now used for shuffling the vbuckets in the load balancer to ensure that a given seed always yields the same sample when concurrency 1 is used. We cannot have the same guarantee for concurrency > 1, as the order the vbuckets are scanned depends on factors such as how long each scan takes and how many scans are taking place on each node at any given time.

Results

All tests pass, including FIT

Copy link
Contributor

@Matt-Woz Matt-Woz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but maybe want another pair of eyes to confirm - also verified with PHP and it seems to have solved the previous issues w/concurrency

Copy link
Contributor

@thejcfactor thejcfactor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from my 1 comment, FIT looks good w/ the Python and Node.js performers. I can consistently get all tests to pass (wasn't the case previously). I will look more closely at the changes tomorrow if @avsej cannot get to them first. 👍

@DemetrisChr DemetrisChr force-pushed the CXXCBC-345-range-scan-fix branch 2 times, most recently from 4928429 to 01798b3 Compare February 26, 2024 18:03
Copy link
Member

@avsej avsej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, couple of style notes

@DemetrisChr DemetrisChr force-pushed the CXXCBC-345-range-scan-fix branch from 01798b3 to b52596b Compare February 28, 2024 17:50
@DemetrisChr DemetrisChr requested a review from avsej February 28, 2024 17:52
@avsej avsej merged commit 51bb84f into couchbase:main Feb 29, 2024
thejcfactor pushed a commit to thejcfactor/couchbase-cxx-client that referenced this pull request Mar 6, 2024
…chbase#525)

* CXXCBC-345: Range scan improvements & resolve concurrency issues

* For sampling scans use the same seed for shuffling vbuckets in the load balancer

---------

Co-authored-by: Sergey Avseyev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants