Skip to content

Conversation

jan--f
Copy link
Contributor

@jan--f jan--f commented Aug 26, 2025

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@jan--f
Copy link
Contributor Author

jan--f commented Aug 26, 2025

The final bit of #2648

@jan--f jan--f requested a review from machine424 August 26, 2025 13:55
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 26, 2025
Copy link
Contributor

@machine424 machine424 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, some suggestions


var warning *InvalidConfigWarning
wCmc := defaultClusterMonitoringConfiguration()
wErr := UnmarshalStrict(content, &wCmc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to propagate warnings for this.
we already capture the error below in err := UnmarshalStrict(content, &cmc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I see, I suppose I didn't understand #2592 fully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in #2592, we had 2 different unmarshallers, we propagated the failures of the newer/stricter one as warnings only to block upgrades so the configs are fixed.

Copy link
Contributor

@machine424 machine424 Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now the newer/stricter one is the default/only one and we can (should) propagate its failures as errors.


var warning *InvalidConfigWarning
wU := &UserWorkloadConfiguration{}
wErr := UnmarshalStrict([]byte(content), &wU)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, having err := UnmarshalStrict([]byte(content), &u) below is enough, no need to have it as a warning as well.

@@ -265,7 +296,12 @@ thanosRuler:

for _, tc := range tcs {
t.Run(tc.name, func(t *testing.T) {
c, err := NewUserConfigFromString(tc.configString())
c, warning, err := NewUserConfigFromString(tc.configString())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that warnings in all tests that don't have a promadapter field set should nil

f.AssertOperatorConditionReason(configv1.OperatorDegraded, "InvalidConfiguration")
f.AssertOperatorConditionReason(configv1.OperatorAvailable, "InvalidConfiguration")
// Check that the previous setup hasn't been reverted
f.AssertStatefulsetExists("prometheus-user-workload", f.UserWorkloadMonitoringNs)(t)

t.Log("invalid configuration with field of the wrong case")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need have those as we don't want to propagate the same unmarshalling errors as warnings, see above.

// Restore the first configuration.
f.MustCreateOrUpdateConfigMap(t, getUserWorkloadEnabledConfigMap(t, f))
t.Log("asserting that CMO goes back healthy after the configuration is fixed")
f.AssertOperatorCondition(configv1.OperatorDegraded, configv1.ConditionFalse)(t)
f.AssertOperatorCondition(configv1.OperatorAvailable, configv1.ConditionTrue)(t)
// Once the config is adjusted, the operator becomes Upgradeable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could keep this and replace it with a config with promAdapter set; check it goes upgreadable=false when set and true when removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you added those checks in TestClusterMonitoringDeprecatedConfig

f.AssertOperatorCondition(configv1.OperatorDegraded, configv1.ConditionTrue)(t)
f.AssertOperatorCondition(configv1.OperatorAvailable, configv1.ConditionFalse)(t)
f.AssertOperatorConditionReason(configv1.OperatorDegraded, "UserWorkloadInvalidConfiguration")
f.AssertOperatorConditionReason(configv1.OperatorAvailable, "UserWorkloadInvalidConfiguration")
// Even when an invalid configuration is caught by both unmarshallers, the operator is still set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer needed, we just have one unmarshaller now and it only yields errors, no need to have warnings for it. see above.

@jan--f jan--f force-pushed the cleanup-deprecate-pa-config branch 2 times, most recently from 2ddd33a to 07f5388 Compare August 28, 2025 07:56
if c.ClusterMonitoringConfiguration.PrometheusK8sConfig.CollectionProfile != FullCollectionProfile && !c.CollectionProfilesFeatureGateEnabled {
return fmt.Errorf("%w: collectionProfiles is currently a TechPreview feature behind the \"MetricsCollectionProfiles\" feature-gate, to be able to use a profile different from the default (\"full\") please enable it first", ErrConfigValidation)
return nil, fmt.Errorf("%w: collectionProfiles is currently a TechPreview feature behind the \"MetricsCollectionProfiles\" feature-gate, to be able to use a profile different from the default (\"full\") please enable it first", ErrConfigValidation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to the PR: I think collectionProfile is GA in 4.19 and I think this logic can be cleaned now
cc @rexagod (I can create a ticket if needed)

@@ -791,7 +798,7 @@ func TestCollectionProfilePreCheck(t *testing.T) {
t.Run(tc.name, func(t *testing.T) {
c, err := NewConfigFromString(tc.config, true)
require.NoError(t, err)
err = c.Precheck()
_, err = c.Precheck()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can set the warning and ensure it's nil, an easy extra check is always welcome.

@@ -71,6 +71,9 @@ func TestClusterMonitoringOperatorConfiguration(t *testing.T) {
t.Log("asserting that CMO goes degraded after an invalid configuration is pushed")
f.AssertOperatorCondition(configv1.OperatorDegraded, configv1.ConditionTrue)(t)
f.AssertOperatorCondition(configv1.OperatorAvailable, configv1.ConditionFalse)(t)
// Even when an invalid configuration is caught by both unmarshallers, the operator is still set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftover? also I don't think we do/need to make Upgradeable=false, no for such case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I see what you mean...thanks for your patience 😓

t.Log("restoring the initial configurations")
uwmCM.Data["config.yaml"] = ``
f.MustCreateOrUpdateConfigMap(t, uwmCM)
f.MustCreateOrUpdateConfigMap(t, getUserWorkloadEnabledConfigMap(t, f))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need this f.MustCreateOrUpdateConfigMap(t, getUserWorkloadEnabledConfigMap(t, f)) line?

f.MustCreateOrUpdateConfigMap(t, getUserWorkloadEnabledConfigMap(t, f))
f.AssertOperatorCondition(configv1.OperatorDegraded, configv1.ConditionFalse)(t)
f.AssertOperatorCondition(configv1.OperatorAvailable, configv1.ConditionTrue)(t)
// Once the config is adjusted, the operator becomes Upgradeable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the operator should always stay upgradeable for cannot be deserialized

@jan--f jan--f force-pushed the cleanup-deprecate-pa-config branch from 07f5388 to ce0b4a1 Compare August 28, 2025 13:32
@jan--f
Copy link
Contributor Author

jan--f commented Aug 29, 2025

/retest

@machine424
Copy link
Contributor

/lgtm
thanks!

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 29, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 29, 2025
machine424 and others added 3 commits August 29, 2025 14:14
… monitoring and

set CMO to Upgradeable=false if errors are found

The strict unmarshaller is currently in advisory mode.
Setting CMO to Upgradeable=false ensures that configurations meet the new unmarshaller
checks in 4.18 before it is set to be the default in 4.19.

(cherry picked from commit 8e7bb77)
@jan--f jan--f force-pushed the cleanup-deprecate-pa-config branch from ce0b4a1 to 4370de4 Compare August 29, 2025 12:25
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 29, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 29, 2025
@jan--f
Copy link
Contributor Author

jan--f commented Aug 29, 2025

rebased. @machine424 requires your lgtm again.

@machine424
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 29, 2025
Copy link
Contributor

openshift-ci bot commented Aug 29, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f, machine424

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jan--f
Copy link
Contributor Author

jan--f commented Sep 1, 2025

/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Sep 1, 2025
@jan--f
Copy link
Contributor Author

jan--f commented Sep 1, 2025

/retitle MON-4343: Cleanup deprecate pa config

@openshift-ci openshift-ci bot changed the title Cleanup deprecate pa config MON-4343: Cleanup deprecate pa config Sep 1, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 1, 2025
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Sep 1, 2025

@jan--f: This pull request references MON-4343 which is a valid jira issue.

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

openshift-ci bot commented Sep 1, 2025

@jan--f: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-e2e-aws-ovn 4370de4 link false /test okd-scos-e2e-aws-ovn
ci/prow/versions 4370de4 link false /test versions

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD a5a091b and 2 for PR HEAD 4370de4 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 92f7f83 into openshift:main Sep 1, 2025
19 of 21 checks passed
@dgoodwin
Copy link
Contributor

dgoodwin commented Sep 2, 2025

Apologies but this took out payloads due to ROSA use of the deprecated config: https://issues.redhat.com/browse/OCPBUGS-61135

Per org policy I need to pursue a revert.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants