-
Notifications
You must be signed in to change notification settings - Fork 90
[Subnet Prioritization] Support capacity-optimized-prioritized and prioritized Allocation Strategy #671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…leet call Signed-off-by: Hanxuan Zhang <[email protected]>
Signed-off-by: Hanxuan Zhang <[email protected]>
…rams launching with EnableSingleAvailabilityZone and prioritized|capacity-optimized-prioritized AllocationStrategy Signed-off-by: Hanxuan Zhang <[email protected]>
Signed-off-by: Hanxuan Zhang <[email protected]>
…ity-optimized-prioritized when using EnableSingleAvailabilityZone Signed-off-by: Hanxuan Zhang <[email protected]>
src/slurm_plugin/fleet_manager.py
Outdated
# set SingleAvailabilityZone to False | ||
"SingleAvailabilityZone": ( | ||
self._compute_resource_config["Networking"]["SingleAvailabilityZone"] | ||
if self._compute_resource_config["Networking"]["SingleAvailabilityZone"] is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest that you add a condition here where we check if the allocation strategy is the one we want and then apply SingleAvailibilityZone. We don't want to use this parameter if we use lowest-price AllocationStrategy
src/slurm_plugin/fleet_manager.py
Outdated
for instance_type in self._compute_resource_config["Instances"]: | ||
subnet_ids = self._compute_resource_config["Networking"]["SubnetIds"] | ||
for subnet_id in subnet_ids: | ||
overrides.update({"InstanceType": instance_type["InstanceType"], "SubnetId": subnet_id}) | ||
if ( | ||
self._compute_resource_config.get("AllocationStrategy") == "prioritized" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also check if the CapacityType is the one that we want along with allocation strategy, the reason we should add that even if that's already taken care in CLI validator is because customer can suppress or ignore and move forward.
CHANGELOG.md
Outdated
- There were no changes for this version. | ||
- Support prioritized|capacity-optimized-prioritized Allocation Strategy and EnableSingleAvailabilityZone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change should be in 3.14.0
src/slurm_plugin/fleet_manager.py
Outdated
self._compute_resource_config["Networking"]["SingleAvailabilityZone"] | ||
if self._compute_resource_config["Networking"]["SingleAvailabilityZone"] is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure self._compute_resource_config
always has "Networking" section?
src/slurm_plugin/fleet_manager.py
Outdated
self._compute_resource_config["Networking"]["SingleAvailabilityZone"] | ||
if self._compute_resource_config["Networking"]["SingleAvailabilityZone"] is not None | ||
and ( | ||
self._compute_resource_config.get("AllocationStrategy") == "prioritized" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Can we check these conditions using a function as we keep checking this above an here
Signed-off-by: Hanxuan Zhang <[email protected]>
Signed-off-by: Hanxuan Zhang <[email protected]>
Signed-off-by: Hanxuan Zhang <[email protected]>
(5, "queue-single-az", "fleet1", False, {}, None), | ||
# Use "prioritized" Allocation Strategy AND Launch Override with Priority | ||
(5, "queue-prioritized", "fleet1", False, {}, None), | ||
# Use "capacity-optimized-prioritized" Allocation Strategy AND Launch Override with Priority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add one of them to. have all_or_nothing
True?
@@ -312,13 +319,26 @@ def _uses_single_az(self): | |||
subnet_ids = self._compute_resource_config.get("Networking", {}).get("SubnetIds", []) | |||
return len(subnet_ids) == 1 | |||
|
|||
def _uses_subnet_prioritization(self): | |||
return ( | |||
self._compute_resource_config.get("AllocationStrategy") == "prioritized" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a valid condition where we check prioritized
is used with On-demand
Capacity and capacity-optimized-prioritized
is checked with Spot Capacity type. Please update the unit test accordingly as they are passing without this change
tests/common.py
Outdated
"Api": "create-fleet", | ||
"Instances": [{"InstanceType": "t2.medium"}, {"InstanceType": "t2.large"}], | ||
"AllocationStrategy": "capacity-optimized-prioritized", | ||
"CapacityType": "on-demand", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: The unit test after the change https://github.com/aws/aws-parallelcluster-node/pull/671/files#r2164221669 should not pass
…zed' is used with On-Demand Capacity and 'capacity-optimized-prioritized' is used with Spot Capacity type. Signed-off-by: Hanxuan Zhang <[email protected]>
src/slurm_plugin/fleet_manager.py
Outdated
"SingleAvailabilityZone", None | ||
) | ||
if enable_single_availability_zone is None or ( | ||
enable_single_availability_zone and self._uses_subnet_prioritization() is False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we suppress the CLI validator and CX uses EnableSingleVaialibilityZone
as False when using all-or-nothing
with 1 Az and multiple Instances then Ec2 create Fleet call will Fail as we do not set SingleInstanceType or SingleAvailibilityZone which is a requirement from EC2
Signed-off-by: Hanxuan Zhang <[email protected]>
Update the PR description to mention what is no longer relevant |
Description of changes
Tests
References
Checklist
develop
add the branch name as prefix in the PR title (e.g.[release-3.6]
).Please review the guidelines for contributing and Pull Request Instructions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.