-
Notifications
You must be signed in to change notification settings - Fork 13
DC-aware LBP - better support the deprecated parameters #321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DC-aware LBP - better support the deprecated parameters #321
Conversation
edaf1f6
to
df9bf55
Compare
let permit_dc_failover = if used_hosts_per_remote_dc > 0 { | ||
// TODO: update cassandra.h documentation to reflect this behaviour. | ||
tracing::warn!( | ||
"cass_*_set_load_balance_dc_aware(_n): `used_hosts_per_remote_dc` parameter is only partially \ | ||
supported in the driver: `0` is supported correctly, and any value `>0` has the semantics of \"+inf\", \ | ||
which means no limit on the number of hosts per remote DC. This is different from the original cpp-driver! \ | ||
To clarify, you can understand this parameter as \"permit_dc_failover\", with `0` being `false` and `>0` \ | ||
being `true`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand it is difficult to support connection-opening part of this parameter. We could expose cpp_rust_unstable method to allow that.
What is the problem with exposing LBP part of that? You can wrap LBP, and make an iterator that counts returned nodes per dc, and filters out nodes of DCs that reached the limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also handle the failover-for-local-cl parameter in wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand it is difficult to support connection-opening part of this parameter. We could expose cpp_rust_unstable method to allow that.
It's not the matter of the Rust driver's API; it's how its connection pooling works inside.
It's even more about the very cpp-driver's mechanism being flawed. See my findings in #315:
After the initial metadata fetch, connection pools are created only for the non-ignored known nodes.
- NOTE: How I understand the code, connection pool can later be created only for newly added nodes. This means that if all the nodes from a remote DC that we have opened connections to get DOWN, then the driver will not open connections to another node from that DC!
With such flawed semantics AND a large and nontrivial effort that would be needed to do this in the Rust Driver's inner parts, I'm strongly for not supporting <1, +inf)
interval in the same way as cpp-driver.
What is the problem with exposing LBP part of that? You can wrap LBP, and make an iterator that counts returned nodes per dc, and filters out nodes of DCs that reached the limit.
Yes, we could do that. But what would be the benefit of this? I see none. The important part is not to open unnecessary connections. With connections already opened, I don't see any benefit from omitting a strict subset of remote targets from the query plan.
If you see any benefit, please share it. But not seeing it, I'm not going to implement this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With such flawed semantics
Oh I'm not suggesting to follow such weird behaviors. The important part is limiting the connections to N per remote DC - if we have exact semantics more that are more reasonable than cpp-driver, even better.
Ofc there is no need to do this in this PR.
df9bf55
to
17a6685
Compare
17a6685
to
0736b58
Compare
cass_{cluster,exec_profile}_set_load_balance_dc_aware(_n) now warn on null or empty local DC name provided.
`set_load_balance_dc_aware_n` unnecessarily `unwrap()`ped the local DC name instead of pattern matching. This commit fixes that.
cass_{cluster,exec_profile}_set_load_balance_dc_aware(_n) now warn if the deprecated parameters: - `used_hosts_per_remote_dc`, or - `allow_remote_dcs_for_local_cl` are used (set to non-zero value). This commit does not change the behavior of the driver, it just informs the user that these parameters are no longer supported and should be set to 0. NOTE: next few commits implement (partial) support for these parameters and thus change the messages introduces here.
The parameter `used_hosts_per_remote_dc` of the DC-aware LBP has been long deprecated. However, it's been still supported in the cpp-driver. Due to Rust driver API's, it'd be very hard to support it correctly in the case of `>0`, because of reasons explained in <scylladb#315>. Therefore, the partial support approach is taken: - case `=0` -> employ `HostFilter`, disable DC failover and we're done. Our semantics here are in line with the CPP Driver. - case `>0` -> Treat this as `+inf` case - don't limit connections nor requests to remote nodes. Emit a warning that the semantics are different than in CPP Driver. This commit only configures the LBP to set `permit_dc_failover` to false when `used_hosts_per_remote_dc` is `0`, and to true when it's greater than `0`. The next commit will implement the connection management based on this parameter: If a DC is remote to **all** LBPs **and** those LBPs disallow DC failover, then the driver will not connect to any hosts in that DC. This will finish proper support for the `used_hosts_per_remote_dc=0` case.
The parameter `used_hosts_per_remote_dc` of the DC-aware LBP has been long deprecated. However, it's been still supported in the cpp-driver. Due to Rust driver API's, it'd be very hard to support it correctly in the case of `>0`, because of reasons explained in <scylladb#315>. Therefore, the partial support approach is taken: - case `=0` -> employ `HostFilter`, disable DC failover and we're done. Our semantics here are in line with the CPP Driver. - case `>0` -> Treat this as `+inf` case - don't limit connections nor requests to remote nodes. Emit a warning that the semantics are different than in CPP Driver. This commit introduces specific connection management for the case when `used_hosts_per_remote_dc=0`: if a DC is remote to **all** LBPs **and** those LBPs disallow DC failover, then the driver will not connect to any hosts in that DC. Implementation is done using the existing `Filtering{Config,Info}` machinery. LBPs declare whether they restrict DCs to the specific local DC, or allow all DCs - `DcRestriction` enum serves this purpose. Then, `CassHostFilter` is built from the composition of DC restrictions, by building up an `AllowedDcs` enum: if any LBP allows all DCs, then `AllowedDcs::All` is used, otherwise, we build a whitelist of allowed DCs is constructed (`AllowedDcs::Whitelist(Vec<String>)`). Finally, if `AllowedDcs` is `Whitelist`, then `CassHostFilter` filters out hosts whose DCs are not in that whitelist. This finishes proper support for the `used_hosts_per_remote_dc=0` case.
0736b58
to
fc33900
Compare
The purpose of `DcLocalConsistencyAllowance` is to determine whether remote DCs are allowed to be contacted when a local consistency level is used. It's going to be used in the next commit, to support `allow_remote_dcs_for_local_cl` parameter of the DC-aware LBP.
Even though the parameter has long been deprecated, we should support it. This commit makes both functions: - `cass_cluster_set_load_balance_dc_aware`, - `cass_execution_profile_set_load_balance_dc_aware` handle the parameter `allow_remote_dcs_for_local_cl` correctly.
Those logs are emitted at trace level, so they will not clutter the output. They are useful for debugging.
This reverts commit 6b44c2c. The adjustments are no longer needed, because the offending parameters are now supported.
fc33900
to
b30520a
Compare
used_hosts_per_remote_dc
parameterThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves the DC-aware load balancing policy by updating the semantics of the deprecated parameters so that nonzero used_hosts_per_remote_dc values are now treated as permitting DC failover and by fully supporting the allow_remote_dcs_for_local_cl option. Key changes include:
- Updating integration tests to use the new behavior (switching from 0 to 1 in parameters and adjusting assertions).
- Modifying the filtering and load balancing policy logic to incorporate DcRestriction and DcLocalConsistencyAllowance.
- Adjusting API functions in the cluster and exec_profile modules to reflect the updated parameter semantics.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tests/src/integration/tests/test_exec_profile.cpp | Updated test profiles with revised LBP parameters and assertions. |
tests/src/integration/tests/test_dc_aware_policy.cpp | Added tests for allow_remote_dcs_for_local_cl and adjusted validation of execution profiles. |
scylla-rust-wrapper/src/load_balancing.rs | Introduced new enums and logic for handling remote DC allowance in LBP, with additional logging. |
scylla-rust-wrapper/src/exec_profile.rs | Modified exec profile configuration to align with new DC-aware policy behavior. |
scylla-rust-wrapper/src/cluster.rs | Updated cluster configuration and related tests to support the revised parameter semantics. |
Makefile | Updated test filters to reflect changes in DC-aware policy tests. |
Comments suppressed due to low confidence (1)
tests/src/integration/tests/test_dc_aware_policy.cpp:88
- [nitpick] If the commented-out assertions represent expected behavior under certain conditions, consider clarifying the intent or adding a note explaining the rationale. If they are no longer necessary, consider removing them to improve code clarity.
// are down and removed them from the host list.
"Filtering policy got {:?} in fallback and decided to {}.", | ||
node, | ||
match (is_host_allowed, is_dc_allowed) { | ||
(false, false) => "DROP it because neither host nor DC are not allowed", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The log message here contains a double negative that may be confusing. Consider rephrasing it to something like "DROP it because neither host nor DC is allowed" to improve clarity.
(false, false) => "DROP it because neither host nor DC are not allowed", | |
(false, false) => "DROP it because neither host nor DC is allowed", |
Copilot uses AI. Check for mistakes.
The
It looks like ccm encounters an error trying to start a node that was killed. Interestingly, the problem does not appear with the newer Scyllas. @Lorak-mmk any ideas? My observations: the test that fails is one that I've newly added. EDIT: we decided that we can remove that old Scylla version from the CI. |
The Scylla 5.4.8 was introduced to the CI 9 months ago, in acf9d3d. The rationale was to test on the same version as the cpp-driver CI. There's no use in testing neither cpp-driver nor this driver with such old version anymore. As some tests fail due to ccm errors on that version, we remove it from the CI.
The test DCExecutionProfileTest.DCAware was disabled because it expected deterministic behaviour of the token-unaware round robin load balancing. As the Rust Driver round-robins nodes with a random shift, the test would fail. I reworked it to check that the node in the other DC is not used, instead of expecting a specific node in the local DC to be used. The test is now enabled in the Makefile.
DCAwarePolicy integration test was missing a test case for the `allow_remote_dcs_for_local_cl` parameter. This commit adds a test case that verifies the behavior of the DCAwarePolicy when this parameter is set to either logical value.
Once we: - implemented `get_attempted_hosts_from_future`, and - implemented correct support of the two deprecated options: - `used_hosts_per_remote_dc`, - `allow_remote_dcs_for_local_cl`, we can enable the `DcAwarePolicyTest` test suite.
b30520a
to
946ccca
Compare
Background
The parameters:
used_hosts_per_remote_dc
,allow_remote_dcs_for_local_cl
of the DC-aware LBP have been long deprecated. However, they've been still supported in the cpp-driver, so we should probably support it. At least, we should accept the same set of values as the cpp-driver, which has not been true: values other than0
would triggerCASS_ERROR_LIB_BAD_PARAMS
, and0
would result in a behaviour different than cpp-driver's: it would be ignored and not restrict remote DC hosts at all.What's done
More informative warnings/errors
The messages logged when an invalid / not supported / partially supported value is given as a parameter are made more informative.
Made
used_hosts_per_remote_dc
semantics closer to cpp-driver'sDue to Rust driver API's, it'd be very hard to support it correctly in the case of
>0
, because of reasons explained in #315. Therefore, the partial support approach is taken:=0
-> employCassHostFilter
, disable DC failover and we're done. Our semantics here are in line with the CPP Driver.>0
-> Treat this as+inf
case - don't limit connections nor requests to remote nodes. Emit a warning that the semantics are different than in CPP Driver.Allowed
used_hosts_per_remote_dc > 0
and described its limitationsAny nonzero value is treated as
+inf
, which means that the datacenter failover is permitted without restrictions. All remote nodes can have opened connections to and be targeted with requests.Fixed
used_hosts_per_remote_dc = 0
semantics and made it fully consistent with cpp-driver'sImplemented full support of
allow_remote_dcs_for_local_cl
This was as simple as using another filtering logic in the existing
FilteringLoadBalancingPolicy
.Testing
Unit tests
Unit tests were introduced for internal abstractions used to configure the
CassHostFilter
to filter out remote nodes, if those are disallowed by all LBPs. Existing unit tests were adjusted to expectused_hosts_per_remote_dc > 0
as supported.A unit test was written for the backbone of
allow_remote_dcs_for_local_cl
,DcLocalConsistencyAllowance
.Integration tests
DCExecutionProfileTest_DCAware
test.DcAwarePolicyTest_UsedHostsRemoteDc
test.allow_remote_dcs_for_local_cl
option:DcAwarePolicyTest_AllowRemoteDcsForLocalCl
.Unresolved questions
Behaviour of both our previous and current implementation of functions taking
used_hosts_per_remote_dc
parameter differ from one described incassandra.h
. Should we adjust the documentation there to be in line with the implementation?Fixes: #315
Pre-review checklist
- [ ] I have enabled appropriate tests inMakefile
in{SCYLLA,CASSANDRA}_(NO_VALGRIND_)TEST_FILTER
.Fixes:
annotations to PR description.