session: fix closing semantics #328
Pull Request Overview
This PR improves the session closing mechanism by enforcing synchronous lock acquisition to prevent request/session race conditions and to meet the cass_session_free contract. It also adjusts test parameters and updates the runtime configuration to support proper concurrent behavior and ensure that the AsyncTests::Close suite runs reliably.
- Update session locking to use synchronous (block_on) guard acquisition in multiple API functions.
- Adjust test parameters (concurrent request count and sleep delays) for temporary diagnostic purposes.
- Update runtime configuration to a multi-threaded tokio runtime with two worker threads and enable AsyncTests suite in the Makefile.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
File | Description |
---|---|
tests/src/integration/tests/test_async.cpp | Reduced concurrent requests and added a sleep delay to simulate timing; ensure temporary changes are reverted or documented. |
scylla-rust-wrapper/src/session.rs | Updated synchronous lock acquisition logic and added detailed comments; consider refactoring repeated patterns to improve maintainability. |
scylla-rust-wrapper/src/lib.rs | Switched to a multi-threaded runtime with two worker threads. |
scylla-rust-wrapper/src/future.rs | Exposed into_raw as a public method. |
Makefile | Enabled the AsyncTests suite. |
Comments suppressed due to low confidence (5)
tests/src/integration/tests/test_async.cpp:19
- Reducing the number of concurrent requests to 5u may undermine the purpose of high-concurrency testing; ensure that this temporary change is reverted or clearly documented before merging.
#define NUMBER_OF_CONCURRENT_REQUESTS 5u
tests/src/integration/tests/test_async.cpp:57
- The insertion of a fixed sleep time could artificially delay test execution; consider parameterizing or removing this once temporary diagnostics are complete.
insert.set_sleep_time(100);
scylla-rust-wrapper/src/session.rs:85
- [nitpick] Consider clarifying the rationale for using a synchronous lock here (and throughout the file) by referencing the CPP Driver requirements, to improve maintainability and ease future updates.
// TODO: It's not clear whether this lock should be taken synchronously or asynchronously.
scylla-rust-wrapper/src/session.rs:241
- [nitpick] The repeated pattern and similar comments for synchronous lock acquisition across functions suggest that this logic could be abstracted into a helper function to reduce redundancy and simplify future changes.
keyspace_length: size_t,
scylla-rust-wrapper/src/session.rs:635
- [nitpick] Consider addressing the TODO by verifying whether the default setting for the query consistency is appropriate; updating this comment with a concrete action or follow-up reference would be beneficial.
// TODO: investigate if this is correct.
fead714 to b477893
This increases our likelihood of experiencing an interleaving that makes the test AsyncTests.Close fail.
This is a preparatory step for fixing cass_session_free in the next commit.
Quoting `cassandra.h` documentation of `cass_session_free`:
> Frees a session instance. If the session is still connected it will
> be synchronously closed before being deallocated.

This is definitely not the case in the Rust wrapper, which simply deallocates the session without waiting for the requests to complete. This commit makes `cass_session_free` wait for the session to close, satisfying the contract.
The current implementation of the driver is flawed. In order to empirically prove this, I tune the AsyncTests.Close test to fail with the current implementation. The tuning involves changing the number of concurrent requests to 5 and adding a sleep time of 100 milliseconds between each request. Moreover, the runtime is changed to use a multi-threaded runtime with 2 worker threads to allow for concurrent execution with interleaving that promotes the issue. This commit is intended to demonstrate the issue with the current implementation of the driver, specifically in how it handles session closure concurrent to running requests. The test should fail because some requests will only try to read-lock the session after the session has been closed, leading to failure of those requests. The following commits will address the issue by read-locking the session synchronously upon scheduling a request (via either of `cass_session_execute[_batch]`). This commit shall be removed before merging this PR into master.
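The failure mode described in this commit message can be reproduced in miniature with the standard library alone. The sketch below is an analogy, not the wrapper's code: `SessionInner`, `lazy_execute`, and `close` are hypothetical names, threads stand in for tokio tasks, and a channel is used to force the unlucky interleaving in which the request takes its read lock only after the session has been closed.

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the state behind the session lock.
struct SessionInner {
    connected: bool,
}

/// Models a request that read-locks the session only inside the spawned task.
/// `start` delays the task so that the caller controls the interleaving.
fn lazy_execute(
    session: Arc<RwLock<SessionInner>>,
    start: mpsc::Receiver<()>,
) -> JoinHandle<Result<(), &'static str>> {
    thread::spawn(move || {
        // The task is scheduled "late": it waits here until told to proceed.
        start.recv().unwrap();
        // The read lock is taken only now, possibly after `close` has run.
        let guard = session.read().unwrap();
        if guard.connected {
            Ok(())
        } else {
            Err("session already closed")
        }
    })
}

/// Models the "writer": nothing holds a read lock yet, so it proceeds at once.
fn close(session: &RwLock<SessionInner>) {
    session.write().unwrap().connected = false;
}
```

If `close` is called between `lazy_execute` and the signal on `start`, the request observes a closed session and fails, which mirrors what the tuned test is meant to provoke.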
AsyncTests.Close test specifies that the session only be closed after all running requests are finished. This is not necessarily the case in the current implementation, because the session lock is taken for reading by the request futures only in the async block. Races may thus occur if the session is closed while the request is being executed and it has not yet acquired the lock, leading to request failures. To be in line with the CPP Driver, we take the session lock synchronously, before returning from `cass_session_execute[_batch]` functions. This way we ensure that the session is not closed while there are any requests running, and that the session is closed only after all requests are finished. Care is taken to ensure that the current_thread runtime does not deadlock when waiting for the lock, by executing remaining futures while waiting for the lock to be released (by calling `RUNTIME.block_on`).
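The ordering guarantee described here (register the request before `execute` returns, so that closing must wait for it) can be illustrated with a minimal std-only sketch. It is an analogy under stated assumptions, not the wrapper's implementation: a counter guarded by a `Mutex` and a `Condvar` plays the role of the tokio `RwLock` read guards, threads stand in for tokio tasks, and `Session`, `State`, `execute`, `close`, and `completed` are hypothetical names.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical session state; `in_flight` plays the role of held read guards.
#[derive(Default)]
struct State {
    in_flight: usize,
    completed: usize,
    closed: bool,
}

#[derive(Clone, Default)]
struct Session {
    inner: Arc<(Mutex<State>, Condvar)>,
}

impl Session {
    /// Registers the request synchronously, before returning to the caller,
    /// and only then hands the work to another thread.
    fn execute(&self) -> Result<JoinHandle<()>, &'static str> {
        {
            let mut state = self.inner.0.lock().unwrap();
            if state.closed {
                return Err("session already closed");
            }
            state.in_flight += 1;
        }
        let inner = Arc::clone(&self.inner);
        Ok(thread::spawn(move || {
            thread::sleep(Duration::from_millis(20)); // simulated request latency
            let mut state = inner.0.lock().unwrap();
            state.in_flight -= 1;
            state.completed += 1;
            inner.1.notify_all(); // wake a closer that may be waiting
        }))
    }

    /// Models the "writer": blocks until no request is in flight, then closes.
    fn close(&self) {
        let (lock, cvar) = &*self.inner;
        let mut state = lock.lock().unwrap();
        while state.in_flight > 0 {
            state = cvar.wait(state).unwrap();
        }
        state.closed = true;
    }

    fn completed(&self) -> usize {
        self.inner.0.lock().unwrap().completed
    }
}
```

Because registration happens on the caller's thread, any request whose `execute` call returned `Ok` before `close` was invoked is guaranteed to finish before `close` returns, regardless of how the worker threads are scheduled.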
There are a number of places in the code where we use `blocking_read()` to access the session data. This is suboptimal because it blocks the current thread, which can lead to deadlocks (on the current_thread executor). To accommodate this case, we switch to using `RUNTIME.block_on(lock.read())` to allow the async runtime to work on queued tasks while waiting for the lock to be released.
Since the commit c1e40d7, the Session `execute(_batch)` methods clone the Session's Arc, which prevents UAF if the Session is closed while the requests are still running. That commit's message says: "we cannot enable `AsyncTests::Close` yet since it expects that prematurely dropped session awaits all async tasks before closing". This is now taken care of by the previous commit. The `RwLock` that protects the `CassSessionInner` is taken synchronously for reading by all running requests. This protects the Session from a premature drop, as the `RwLock` will not let the "writer" (`cass_session_close`'s future) proceed until all requests are done. This means that `cass_session_close()`'s future resolves only when all requests are done. This also means that `cass_session_free()` blocks until all requests are done, which is exactly what `AsyncTests::Close` expects. Thus, we can enable the `AsyncTests::Close` test suite.
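The use-after-free protection mentioned above rests on ordinary `Arc` semantics, which a small std-only sketch can show. The names `SessionInner` and `spawn_request` are hypothetical, and a thread stands in for a tokio task; the real wrapper's types differ.

```rust
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the data a request needs from the session.
struct SessionInner {
    keyspace: String,
}

/// The request clones the `Arc` before it starts, so it owns a reference of
/// its own and the data outlives the caller's handle if that is dropped early.
fn spawn_request(session: &Arc<SessionInner>) -> JoinHandle<usize> {
    let session = Arc::clone(session);
    thread::spawn(move || session.keyspace.len())
}
```

Dropping the caller's `Arc` right after `spawn_request` is therefore memory-safe, but on its own it does not make the owner wait for the request, which is the part the synchronous read lock adds.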
b477893 to e3c2b0a
Rebased on master.
Note: generated with GPT-4o and manually redacted.
Fix Session Closing Semantics and Enable `AsyncTests` Suite
Summary
This pull request introduces critical improvements to the session closing mechanism. Additionally, it enables the `AsyncTests::Close` test suite, aligns the behavior with the expectations of the CPP Driver, and satisfies the contract defined in the `cassandra.h` documentation regarding `cass_session_free()` and `cass_session_close()`.
Key Changes
- Satisfy `cass_session_free` Contract: `cass_session_free` now waits for the session to close before deallocating, as specified in the `cassandra.h` documentation. This ensures that all requests are completed before the session is freed.
- Empirical Proof of Flaws in Current Implementation: A commit (`913a7c2b`) was introduced to empirically demonstrate flaws in the current implementation. The `AsyncTests::Close` test was tuned to fail by changing the number of concurrent requests to 5, adding sleep times, and switching to a multi-threaded runtime with a hardcoded number of worker threads. This highlights issues with session closure concurrent to running requests.
- Prevent Logical Races: The session lock is now taken synchronously in the `cass_session_execute[_batch]` functions. This ensures that the session cannot be closed while requests are running, preventing logical races and request failures. Deadlocks are avoided on the `current_thread` runtime by executing remaining futures while waiting for the lock to be released (`RUNTIME.block_on(lock.read())`).
- Avoid Blocking Locks: Replaced `blocking_read()` with `RUNTIME.block_on(lock.read())` to prevent thread-blocking issues. This change allows the async runtime to continue processing queued tasks while waiting for the lock to be released, reducing the risk of deadlocks, especially on the `current_thread` executor, but also possibly on the `multi_thread` executor if locks are taken in callbacks.
- Enable `AsyncTests::Close` Suite: The `Session::execute(_batch)` methods have already been cloning the Session's `Arc`, preventing use-after-free (UAF) scenarios when the session is closed while requests are still running [introduced in c1e40d7]. The `RwLock` mechanism now ensures that the session is protected from premature drops by synchronously taking a read lock for all running requests. This guarantees that `cass_session_close()` and `cass_session_free()` block until all requests are completed, aligning with the expectations of the `AsyncTests::Close` suite.
Files Changed
- `Makefile`: Enabled the `AsyncTests::Close` suite.
- `scylla-rust-wrapper/src/session.rs`: Comprehensive changes to improve session locking, prevent logical races, and satisfy the `cass_session_free` contract.
- `scylla-rust-wrapper/src/future.rs`: `pub(crate)`'d one method to allow extracting `close_fut` functionality.
- `scylla-rust-wrapper/src/lib.rs` and `tests/src/integration/tests/test_async.cpp`: Temporary changes to demonstrate flaws in the current implementation.
Notes to reviewers
- `AsyncTests::Close` test's semantics are satisfied.
- `current_thread` runtime.
- The temporary commit (`913a7c2b`) is removed before merging.
Fixes: #304
Pre-review checklist
- `Makefile` in `{SCYLLA,CASSANDRA}_(NO_VALGRIND_)TEST_FILTER`.
- `Fixes:` annotations to PR description.