
JoelSpeed

CC @eggfoobar

This updates the way we apply manifests to make sure that, where we have combinations (A+B), they are applied after the constituent gates.

This means that we can control the order and, where multiple gates affect a single validation (enum), we can choose what to apply when either gate is enabled or when both gates are enabled.

This also drops the DualReplica gate to DevPreview, which is why we needed to find a solution for this.

openshift-ci bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) on Apr 15, 2025
openshift-ci bot requested review from deads2k and jkyros on Apr 15, 2025, 16:36
openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 15, 2025

eggfoobar commented Apr 15, 2025

/retitle OCPEDGE-1637: Enable separation of conflicting enum values, and drop DualReplica to DevPreview

openshift-ci bot changed the title from "Enable separation of conflicting enum values, and drop DualReplica to DevPreview" to "OCPEDGE-1637: Enable separation of conflicting enum values, and drop DualReplica to DevPreview" on Apr 15, 2025
openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Apr 15, 2025

openshift-ci-robot commented Apr 15, 2025

@JoelSpeed: This pull request references OCPEDGE-1637 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

@eggfoobar

/retest-required

@everettraven everettraven left a comment

Some questions to help build context on why these changes are necessary.

// +openshift:validation:FeatureGateAwareEnum:featureGate=HighlyAvailableArbiter;DualReplica,enum=HighlyAvailable;HighlyAvailableArbiter;SingleReplica;DualReplica;External
// +openshift:validation:FeatureGateAwareEnum:featureGate=HighlyAvailableArbiter,enum=HighlyAvailable;HighlyAvailableArbiter;SingleReplica;External
// +openshift:validation:FeatureGateAwareEnum:featureGate=DualReplica,enum=HighlyAvailable;SingleReplica;DualReplica;External
// +openshift:validation:FeatureGateAwareEnum:requiredFeatureGate=HighlyAvailableArbiter;DualReplica,enum=HighlyAvailable;HighlyAvailableArbiter;SingleReplica;DualReplica;External

Why do we need to introduce a dependence of HighlyAvailableArbiter AND DualReplica here?

Please correct me if I am wrong, but it seems like the goal here is to be able to have the values enforced by the two feature gates be in different feature set stages. It seems like a logical OR here is what we want, and the manifests should be merged appropriately according to the feature gates enabled in each feature set.

From what I gather, this is more of a workaround for a limitation of the upstream merge logic. The intention here is to express in the marker API that we want (HighlyAvailableArbiter OR DualReplica OR (HighlyAvailableArbiter AND DualReplica)).

That info is used to inform how we merge in the file. The reason this came about is that we cannot merge enum values; the list applied later simply overwrites the one applied before it. So this approach allows us to say (DevPreview = HighlyAvailableArbiter AND DualReplica) but (TechPreview = HighlyAvailableArbiter). Otherwise, if we take an OR approach with something like featureGate=HighlyAvailableArbiter;DualReplica, then only HighlyAvailableArbiter gets applied for all of the feature sets due to the merge.

I see, thanks for the explanation. I did some additional digging and it looks like we essentially mimic a server-side apply for all the "partial" manifests. Because there are no specific SSA tags for the enum field, I assume it defaults to atomic, meaning the entire list is replaced on each merge.

If this workaround gets us where we want to go, this seems fine to me temporarily. Longer term, I think I'd like to see if we can figure out a way to do this "better". I find the Foo+Bar.yaml file and the intermediate Foo+Bar feature gate to be a bit confusing.
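
To illustrate the atomic-list behaviour described above, here is a toy Go sketch. It is not the generator's actual merge code: `mergeAtomic`, `partial`, `finalEnum`, and the `controlPlaneTopology` key are hypothetical names chosen for illustration. Nested maps merge key by key, while any other value, including a list, replaces what was there before, so the enum applied last wins.

```go
package main

import "fmt"

// mergeAtomic merges src into dst in place and returns dst.
// Nested maps are merged key by key; every other value (including a list)
// replaces the existing value wholesale. This models the "atomic" list
// behaviour assumed in the discussion, not the real SSA implementation.
func mergeAtomic(dst, src map[string]interface{}) map[string]interface{} {
	for key, value := range src {
		srcMap, srcIsMap := value.(map[string]interface{})
		dstMap, dstIsMap := dst[key].(map[string]interface{})
		if srcIsMap && dstIsMap {
			dst[key] = mergeAtomic(dstMap, srcMap)
			continue
		}
		dst[key] = value
	}
	return dst
}

// partial builds a hypothetical partial manifest that only sets the enum.
func partial(enum ...string) map[string]interface{} {
	return map[string]interface{}{
		"controlPlaneTopology": map[string]interface{}{"enum": enum},
	}
}

// finalEnum applies the partial manifests in order on top of a base schema
// and returns the enum list that survives the merges.
func finalEnum(partials ...map[string]interface{}) []string {
	merged := map[string]interface{}{
		"controlPlaneTopology": map[string]interface{}{"type": "string"},
	}
	for _, p := range partials {
		mergeAtomic(merged, p)
	}
	field := merged["controlPlaneTopology"].(map[string]interface{})
	return field["enum"].([]string)
}

func main() {
	arbiter := partial("HighlyAvailable", "HighlyAvailableArbiter", "SingleReplica", "External")
	dual := partial("HighlyAvailable", "SingleReplica", "DualReplica", "External")

	// The DualReplica list replaces the arbiter list, so HighlyAvailableArbiter is lost.
	fmt.Println(finalEnum(arbiter, dual))
}
```

Swapping the argument order in `finalEnum` makes the arbiter list win instead, which is why the order in which partial manifests are applied matters for this field.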

We already have intermediate Foo+Bar files for other cases, for example, where an XValidation needs to apply only when both gates are enabled.

This change allows us to be more specific in the cases where gates are being enabled together, allowing us to say, "If only this gate, do X; if this gate and another gate, do Y."

It could potentially be more automated, but we would have to work out a better way to merge the gates in, which would be awkward given how much of the current logic is based on the SSA-based merge.
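
The "only this gate" versus "this gate and another gate" distinction can be sketched as a selection step. This is a minimal Go illustration under assumptions, not the repository's code: `appliesTo` and `selectManifests` are hypothetical helpers, the file base names are illustrative, and the gate-to-feature-set mapping follows the DevPreview/TechPreview example given earlier in this thread. A `Foo+Bar` partial manifest is selected only when every gate in its name is enabled.

```go
package main

import (
	"fmt"
	"strings"
)

// appliesTo reports whether a partial manifest with the given base name
// ("Foo" or "Foo+Bar") applies for the set of enabled gates.
// Assumption: a combined name requires all of its gates to be enabled.
func appliesTo(base string, enabled map[string]bool) bool {
	for _, gate := range strings.Split(base, "+") {
		if !enabled[gate] {
			return false
		}
	}
	return true
}

// selectManifests keeps the base names that apply, preserving input order.
func selectManifests(bases []string, enabled map[string]bool) []string {
	var selected []string
	for _, base := range bases {
		if appliesTo(base, enabled) {
			selected = append(selected, base)
		}
	}
	return selected
}

func main() {
	// Illustrative names only; combinations are listed after their constituents.
	bases := []string{
		"HighlyAvailableArbiter",
		"DualReplica",
		"HighlyAvailableArbiter+DualReplica",
	}

	techPreview := map[string]bool{"HighlyAvailableArbiter": true}
	devPreview := map[string]bool{"HighlyAvailableArbiter": true, "DualReplica": true}

	fmt.Println("TechPreview:", selectManifests(bases, techPreview))
	fmt.Println("DevPreview: ", selectManifests(bases, devPreview))
}
```

With only HighlyAvailableArbiter enabled, just the single-gate file is selected. With both gates enabled, the combined file is also selected and, being last, supplies the final enum.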

@eggfoobar eggfoobar Apr 17, 2025

Agreed, I think we're on the same page. I'll lgtm this PR for now, but I'll keep the Jira ticket referenced in this TODO in the backlog so we can come back to it or remove the TODO so it's not a done link.

@JoelSpeed yeah, that makes sense. If we can't reasonably "do better" based on our current approach then it's fine. Just calling out something I've noticed that seems like a bit of a rougher edge.

Comment on lines +637 to +647
// Sort the manifests such that any combination of feature gates is applied after the gates that it combines.
// This means that if we have a file that is "foo+bar" and a file that is "foo", the "foo+bar" file will be applied last.
// This enables more specific handling for combinations of feature gates that affect the same field.
sort.Slice(partialManifestFiles, func(i, j int) bool {
	// Get the name of the files without the ".yaml" suffix.
	// This should be the name of the feature gate, or, a list of feature gates separated by `+`.
	iBase := strings.TrimSuffix(filepath.Base(partialManifestFiles[i].Name()), ".yaml")
	jBase := strings.TrimSuffix(filepath.Base(partialManifestFiles[j].Name()), ".yaml")

	return strings.Contains(jBase, "+") && strings.Contains(jBase, iBase)
})
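
As a self-contained illustration of the ordering goal stated in the comments above, the following Go sketch sorts plain base names rather than files. Note that it deliberately swaps in a simpler key than the comparator in the diff: it orders by the number of `+`-separated gates and then by name, which also guarantees that a combination comes after the gates it combines. `gateCount` and `orderManifests` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// gateCount returns how many feature gates a manifest base name combines,
// for example 1 for "Foo" and 2 for "Foo+Bar".
func gateCount(base string) int {
	return strings.Count(base, "+") + 1
}

// orderManifests returns a sorted copy of bases in which names that combine
// more gates come later; ties are broken alphabetically so the result is
// deterministic.
func orderManifests(bases []string) []string {
	ordered := append([]string(nil), bases...)
	sort.SliceStable(ordered, func(i, j int) bool {
		ci, cj := gateCount(ordered[i]), gateCount(ordered[j])
		if ci != cj {
			return ci < cj
		}
		return ordered[i] < ordered[j]
	})
	return ordered
}

func main() {
	fmt.Println(orderManifests([]string{"Foo+Bar", "Bar", "Foo", "Baz"}))
	// prints "[Bar Baz Foo Foo+Bar]"
}
```

Because a combination always names more gates than any one of its constituents, sorting by gate count places every `Foo+Bar` style name after both `Foo` and `Bar`.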

Why is it that sorting in this way is effective?

If we are doing a proper manifest merge, wouldn't applying foo+bar followed by foo be a no-op when applying foo?

The partial manifests that are a union of two feature gates are applied last because the manifest merge cannot merge enum values: if two files operate on the same list of enums, the last one overrides the previous one. So we order the files and ensure the union file is applied last, since that's the desired enum. Some more context on this starts in this conversation: #2196 (comment)

Egli has it here

This basically allows us to override the merge of the individual gates by saying that something with multiple gates should take priority when both gates are enabled.

@eggfoobar

Hey @everettraven I tried to answer some of your questions, hope that provides some info, but I'm also relatively new to the codegen logic of this repo, @JoelSpeed would be the best one to provide better insight.

@eggfoobar

/retest-required

eggfoobar commented Apr 17, 2025

/retitle OCPEDGE-1775: Enable separation of conflicting enum values, and drop DualReplica to DevPreview

Updating the title to the new bug so that we keep the original bug in the backlog, since it was referenced in the TODO for the feature. We will return to it to make sure we re-evaluate this approach or come to some consensus on a new solution.

openshift-ci bot changed the title from "OCPEDGE-1637: Enable separation of conflicting enum values, and drop DualReplica to DevPreview" to "OCPEDGE-1775: Enable separation of conflicting enum values, and drop DualReplica to DevPreview" on Apr 17, 2025

openshift-ci-robot commented Apr 17, 2025

@JoelSpeed: This pull request references OCPEDGE-1775 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

@eggfoobar

/test verify images

@everettraven

/lgtm

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Apr 17, 2025

openshift-ci bot commented Apr 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: everettraven, JoelSpeed

everettraven commented Apr 17, 2025

@eggfoobar Heads up that there is an active incident causing CI to be down. AFAIK, tests and retests run right now are likely to fail.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD b7680e1 and 2 for PR HEAD 6ecd9ec in total

@eggfoobar

@everettraven Would you be able to skip verify-crd-schema? We will most likely have to, since the complaint is that we are removing DualReplica from TechPreview, but that's a desired change in this case.

@everettraven

@eggfoobar I'll give it a shot

/override verify-crd-schema

openshift-ci bot commented Apr 17, 2025

@everettraven: everettraven unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers.

@JoelSpeed

/override ci/prow/verify-crd-schema

We are dropping a feature from TPNU, so it is to be expected that it complains about the enum removal.

The enum will come back, and it hasn't shipped outside of nightlies and ECs yet.

openshift-ci bot commented Apr 18, 2025

@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/verify-crd-schema

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f636181 and 1 for PR HEAD 6ecd9ec in total

@JoelSpeed

/retest

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD f636181 and 2 for PR HEAD 6ecd9ec in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 8b803ad and 1 for PR HEAD 6ecd9ec in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 9aa03e6 and 0 for PR HEAD 6ecd9ec in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 9aa03e6 and 2 for PR HEAD 6ecd9ec in total

@JoelSpeed

/retest-required

@sdodson sdodson merged commit 6bababe into openshift:master Apr 23, 2025
21 of 24 checks passed

sdodson commented Apr 23, 2025

Merging to avoid retest loop.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-cluster-config-api
This PR has been included in build ose-cluster-config-api-container-v4.19.0-202504232036.p0.g6bababe.assembly.stream.el9.
All builds following this will include this PR.

openshift-ci bot commented Apr 23, 2025

@JoelSpeed: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                     Commit   Details  Required  Rerun command
ci/prow/e2e-azure             6ecd9ec  link     false     /test e2e-azure
ci/prow/okd-scos-e2e-aws-ovn  6ecd9ec  link     false     /test okd-scos-e2e-aws-ovn
ci/prow/verify-crd-schema     6ecd9ec  link     unknown   /test verify-crd-schema
ci/prow/e2e-aws-serial        6ecd9ec  link     unknown   /test e2e-aws-serial

rvanderp3 added a commit to rvanderp3/installer that referenced this pull request May 29, 2025
rvanderp3 added a commit to rvanderp3/installer that referenced this pull request May 30, 2025