Contributor

@dlom dlom commented Aug 19, 2025

xref: HIVE-2891

/assign @2uasimojo
/cc @huangmingxia

When deploying hive from a private image repository, users must supply an imagePullSecret to the hive operator deployment and also specify that same secret on the HiveConfig. With this change, private image users will be able to provision clusters from any namespace, not just Hive's namespace.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 19, 2025
@openshift-ci-robot
openshift-ci-robot commented Aug 19, 2025

@dlom: This pull request references HIVE-2891 which is a valid jira issue.

In response to this:

xref: HIVE-2891

/assign @2uasimojo
/cc @huangmingxia

When deploying hive from a private image repository, in addition to supplying an imagePullSecret to the hive operator deployment, users will need to specify this same secret on the HiveConfig.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from huangmingxia August 19, 2025 19:04
Contributor

openshift-ci bot commented Aug 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dlom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2025

codecov bot commented Aug 19, 2025

Codecov Report

❌ Patch coverage is 28.57143% with 95 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.03%. Comparing base (717017d) to head (3bf80ae).
⚠️ Report is 6 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/operator/hive/hive.go | 0.00% | 68 Missing ⚠️ |
| pkg/install/generate.go | 35.71% | 9 Missing ⚠️ |
| .../clusterdeployment/clusterdeployment_controller.go | 83.33% | 2 Missing and 2 partials ⚠️ |
| pkg/controller/utils/utils.go | 42.85% | 4 Missing ⚠️ |
| pkg/controller/utils/podconfig.go | 0.00% | 2 Missing ⚠️ |
| pkg/operator/hive/dynamicclient.go | 0.00% | 2 Missing ⚠️ |
| pkg/operator/hive/hiveadmission.go | 0.00% | 2 Missing ⚠️ |
| pkg/operator/hive/operatorutils.go | 0.00% | 2 Missing ⚠️ |
| pkg/operator/hive/sharded_controllers.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2734      +/-   ##
==========================================
- Coverage   50.16%   50.03%   -0.14%     
==========================================
  Files         288      288              
  Lines       34065    34240     +175     
==========================================
+ Hits        17090    17133      +43     
- Misses      15626    15756     +130     
- Partials     1349     1351       +2     
| Files with missing lines | Coverage Δ |
|---|---|
| pkg/constants/constants.go | 100.00% <100.00%> (ø) |
| .../controller/clusterdeployment/clusterprovisions.go | 62.35% <100.00%> (+0.07%) ⬆️ |
| pkg/imageset/generate.go | 97.64% <100.00%> (+0.05%) ⬆️ |
| pkg/operator/hive/hive_controller.go | 0.00% <ø> (ø) |
| ...om/openshift/hive/apis/hive/v1/hiveconfig_types.go | 0.00% <ø> (ø) |
| pkg/controller/utils/podconfig.go | 0.00% <0.00%> (ø) |
| pkg/operator/hive/dynamicclient.go | 0.00% <0.00%> (ø) |
| pkg/operator/hive/hiveadmission.go | 0.00% <0.00%> (ø) |
| pkg/operator/hive/operatorutils.go | 0.00% <0.00%> (ø) |
| pkg/operator/hive/sharded_controllers.go | 0.00% <0.00%> (ø) |

... and 4 more

... and 6 files with indirect coverage changes

Member

@2uasimojo 2uasimojo left a comment

I think this PR also needs to touch your previous efforts to reference this secret from the hive-operator pod. We discussed how copying imagePullSecrets across namespaces will leave us with a reference to a dockercfg Secret that doesn't (and shouldn't) exist in the destination namespace. So I think we need to refactor all of those places to simply add the value of hiveConfig.hiveImagePullSecretRef to the imagePullSecrets of controllers, admission, imageset, provision, and deprovision.

We also need to address copying the Secret to the targetNamespace iff that namespace is different from where hive-operator is running.

Once all of that is done, I think we can revert some of your additions that look up the current pod -- they should no longer be needed. (However, I like the way you rolled up sharedPodConfig; I would like to keep that for tolerations/nodeSelector.)

src := types.NamespacedName{Name: secretName, Namespace: srcNamespace}
dest := types.NamespacedName{Name: secretName, Namespace: destNamespace}

// TODO: cross-NS ownership reference? is that even possible?
Member

It is not. However, we should parent it to the CD. There is precedent, and a handy helper function. See for example how we do this for the pull secret.

Contributor Author

Still haven't figured this one out. Did you link the wrong thing, or am I blind?

Member

Bleh, sorry, correct reference.

As we discussed, "controller reference" is not the same as "owner reference", and I think we want the former. The purpose is for the Secret to automatically get deleted/garbage-collected when the parent (the CD) goes away.

@dlom dlom force-pushed the HIVE-2891 branch 2 times, most recently from 86bb5ee to 8744666 Compare August 21, 2025 04:11
@dlom
Contributor Author

dlom commented Aug 21, 2025

we need to refactor all of those places to simply add the value of hiveConfig.hiveImagePullSecretRef to the imagePullSecrets of controllers, admission, imageset, provision, and deprovision

@2uasimojo after blindly forging ahead on this all day, I don't think this is possible either. The imagePullSecret by definition only refers to local secrets in the same NS. If the required secret is only in the hive NS, it really MUST be copied into the CD's namespace so that the imageset job can run in that NS

@2uasimojo
Member

Per our Meet™, clarified summary:

  • HIVE-2891: Make hive able to run as a private image #2725 copies imagePullSecrets (a list of names) from the current pod to the descendant pods. We don't want to do this, because it'll drag along the name of the generated dockercfg Secret. Rather, I think we ignore the current pod's imagePullSecrets and simply inject the value of your new hiveconfig field into the descendant pods' imagePullSecrets. This goes for both "stages": hive-operator => controllers/admission and controllers => imageset/prov/deprov.

  • The hive-operator and the controllers/admission aren't necessarily running in the same namespace. HiveConfig.Spec.TargetNamespace defaults to hive, but

    • You might have deployed hive-operator somewhere other than hive; and/or
    • If TargetNamespace is explicitly set, we'll deploy controllers/admission there instead.

    So if this is the case, we need to make sure we're copying the Secret from hive-operator's ns to the TargetNamespace.

  • Set controller reference on the copied Secrets as discussed above. (Uhh, I think for the controllers/admission copy, it's probably appropriate to parent to hive-controllers? I think the test would be: Edit HiveConfig.Spec.TargetNamespace, and make sure the Secret disappears from the old ns.)
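
The first two points of the summary above can be sketched roughly as follows. This is a stdlib-only illustration: `LocalRef` is a hypothetical stand-in for a pull-secret reference, and `effectiveTargetNamespace`, `injectPullSecret`, and `needsCopy` are made-up helper names, not functions from this PR. The only facts taken from the thread are that TargetNamespace defaults to `hive`, that the configured ref is injected instead of copying the parent pod's list, and that a copy is needed when the operator and target namespaces differ.

```go
package main

import "fmt"

// LocalRef is a hypothetical stand-in for a same-namespace Secret reference.
type LocalRef struct{ Name string }

// defaultTargetNamespace mirrors the default mentioned in the thread.
const defaultTargetNamespace = "hive"

// effectiveTargetNamespace applies the default when nothing is configured.
func effectiveTargetNamespace(configured string) string {
	if configured == "" {
		return defaultTargetNamespace
	}
	return configured
}

// injectPullSecret appends the configured ref to a descendant pod template's
// pull secrets. It never looks at the parent pod's own list, and it avoids
// adding a duplicate entry.
func injectPullSecret(templateSecrets []LocalRef, configured *LocalRef) []LocalRef {
	if configured == nil || configured.Name == "" {
		return templateSecrets
	}
	for _, s := range templateSecrets {
		if s.Name == configured.Name {
			return templateSecrets
		}
	}
	out := make([]LocalRef, 0, len(templateSecrets)+1)
	out = append(out, templateSecrets...)
	return append(out, *configured)
}

// needsCopy reports whether the Secret must be copied from the operator's
// namespace into the namespace where controllers/admission run.
func needsCopy(operatorNS, configuredTargetNS string) bool {
	return operatorNS != effectiveTargetNamespace(configuredTargetNS)
}

func main() {
	ref := &LocalRef{Name: "hive-private-pull"}
	fmt.Println(injectPullSecret(nil, ref))
	fmt.Println(needsCopy("hive", ""), needsCopy("hive-operator", ""), needsCopy("hive", "hive-controllers"))
}
```

The same injection applies at both stages described above, only with a different destination namespace for the copy.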

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 30, 2025
@dlom dlom force-pushed the HIVE-2891 branch 3 times, most recently from ce397ca to f484d4c Compare September 4, 2025 23:57
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 4, 2025
@dlom
Contributor Author

dlom commented Sep 5, 2025

I've done an initial smoke test on this, with the operator, controllers, and CD all in different non-default namespaces. The operator uses a label on the controller-level secret to clean up if the controllers move, and each CD gets an individually named copy of the secret that is owned by the CD.
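
A rough sketch of the label-based cleanup described here, using only the standard library. The `Secret` type, the label key `example.invalid/copied-image-pull-secret`, and the `staleCopies` helper are all hypothetical; the real label and the API calls used to list and delete Secrets are not shown in this conversation.

```go
package main

import "fmt"

// Secret is a hypothetical, minimal stand-in for a Kubernetes Secret.
type Secret struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// copiedLabel is a made-up label key marking Secrets the operator copied.
const copiedLabel = "example.invalid/copied-image-pull-secret"

// staleCopies returns the labeled copies that live outside the current
// target namespace. Unlabeled Secrets are never touched, so the original
// Secret and unrelated Secrets are safe.
func staleCopies(all []Secret, currentTargetNS string) []Secret {
	var stale []Secret
	for _, s := range all {
		if _, ok := s.Labels[copiedLabel]; !ok {
			continue
		}
		if s.Namespace != currentTargetNS {
			stale = append(stale, s)
		}
	}
	return stale
}

func main() {
	secrets := []Secret{
		{Namespace: "hive-operator", Name: "pull"}, // the unlabeled original
		{Namespace: "old-target", Name: "pull", Labels: map[string]string{copiedLabel: "true"}},
		{Namespace: "new-target", Name: "pull", Labels: map[string]string{copiedLabel: "true"}},
	}
	for _, s := range staleCopies(secrets, "new-target") {
		fmt.Printf("would delete %s/%s\n", s.Namespace, s.Name)
	}
}
```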

@dlom
Contributor Author

dlom commented Sep 5, 2025

/test e2e

@2uasimojo
Member

/test e2e

Actual test succeeded; we flaked afterward.

@@ -41,6 +41,14 @@ type HiveConfigSpec struct {
	// +optional
	GlobalPullSecretRef *corev1.LocalObjectReference `json:"globalPullSecretRef,omitempty"`

	// HiveImagePullSecretRef is used to specify a pull secret that can be used to pull Hive's own image.
	// If hive has been deployed from a private registry, cluster installations will not succeed unless
	// this reference is specified. This secret must live in Hive's TargetNamespace.
Member

Is this true, or does it need to live in the namespace with hive-operator?

Comment on lines +574 to +582
// GetImagePullSecretName returns name for image pull secret name per cluster deployment
func GetImagePullSecretName(cd *hivev1.ClusterDeployment) string {
	return apihelpers.GetResourceName(cd.Name, hiveImagePullSecretSuffix)
}

// GetImagePullSecretNameForDeprovision returns name for image pull secret name per cluster deprovision
func GetImagePullSecretNameForDeprovision(cd *hivev1.ClusterDeprovision) string {
	return apihelpers.GetResourceName(cd.Name, hiveImagePullSecretSuffix)
}
Member

@2uasimojo 2uasimojo Sep 5, 2025

I think we can DRY these by making the param a metav1.Object and calling GetName() on it.

(Moot if you copy the Secret down just once with its original name, as noted.)
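
A minimal sketch of that suggestion, assuming a small local interface in place of the real Kubernetes object interface. The `named` interface, the two struct types, the suffix value, and the plain string concatenation are all illustrative stand-ins; in particular, `apihelpers.GetResourceName` is replaced by simple concatenation here because its exact behavior is not shown in this thread.

```go
package main

import "fmt"

// named is a hypothetical stand-in for the object interface the reviewer
// mentions: anything that can report its own name.
type named interface {
	GetName() string
}

// Illustrative stand-ins for ClusterDeployment and ClusterDeprovision.
type clusterDeployment struct{ Name string }
type clusterDeprovision struct{ Name string }

func (c clusterDeployment) GetName() string  { return c.Name }
func (c clusterDeprovision) GetName() string { return c.Name }

// imagePullSecretSuffix is an assumed value for illustration only.
const imagePullSecretSuffix = "hive-image-pull-secret"

// imagePullSecretName replaces the two near-identical helpers with a single
// one that accepts any named object.
func imagePullSecretName(obj named) string {
	return obj.GetName() + "-" + imagePullSecretSuffix
}

func main() {
	cd := clusterDeployment{Name: "mycluster"}
	dp := clusterDeprovision{Name: "mycluster"}
	// Both calls yield the same Secret name only because the two objects
	// share a name, which is the coupling called out later in the review.
	fmt.Println(imagePullSecretName(cd) == imagePullSecretName(dp))
}
```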

Comment on lines +47 to +48
// NOTE: This secret will be copied into the namespace of every ClusterDeployment, overwriting any secret
// with the same name.
Member

I don't think this is true either, as it looks like you're naming those Secrets after the cd/deprovision.

Though thinking about it, I think I like this idea better. There's nothing stopping multiple CDs being created in the same namespace, and we would rather have just one copy of this pull secret per namespace.

@@ -717,6 +719,7 @@ func completeAWSDeprovisionJob(req *hivev1.ClusterDeprovision, job *batchv1.Job)
	job.Spec.Template.Spec.InitContainers = initContainers
	job.Spec.Template.Spec.Containers = containers
	job.Spec.Template.Spec.Volumes = volumes
	job.Spec.Template.Spec.ImagePullSecrets = append(job.Spec.Template.Spec.ImagePullSecrets, corev1.LocalObjectReference{Name: constants.GetImagePullSecretNameForDeprovision(req)})
Member

Two things:

  1. Why not do this in the calling func instead of here? I think all the other pieces are being done per-complete*DeprovisionJob() method because they can be different. (If you find otherwise, I would be receptive to a separate PR that pulls them out.) But this should always be the same, yah?
  2. This is working because we happen to give the ClusterDeprovision the same name as the ClusterDeployment. But it reads as if we're potentially referencing a different Secret -- and one we haven't copied in. I can see that it would be painful to get the CD down to these funcs, so I'm okay leaving the logic as is, but I would like to see a comment that calls out that it's relying on those two things having the same name. [Later] But moot if we're using the original Secret name as noted above.

Comment on lines +49 to +51
// hive namespace (named in HiveConfig.Spec.TargetNamespace). We use this to identify secrets in
// namespaces that were *previously* the configured TargetNamespace so we can clean
// them up.
Member

Why are we not just using owner references? I guess they would have to be on one of the Deployments/StatefulSets, but that should work, shouldn't it?

Comment on lines +378 to +380
	if hiveConfigHasValidImagePullSecretReference(instance) {
		hiveDeployment.Spec.Template.Spec.ImagePullSecrets = append(hiveDeployment.Spec.Template.Spec.ImagePullSecrets, *instance.Spec.HiveImagePullSecretRef)
	}
Member

Just a thought, if you made the func return the ref instead of just checking for it, you could do something like:

Suggested change
-	if hiveConfigHasValidImagePullSecretReference(instance) {
-		hiveDeployment.Spec.Template.Spec.ImagePullSecrets = append(hiveDeployment.Spec.Template.Spec.ImagePullSecrets, *instance.Spec.HiveImagePullSecretRef)
-	}
+	if ref := getImagePullSecretReference(instance); ref != nil {
+		hiveDeployment.Spec.Template.Spec.ImagePullSecrets = append(hiveDeployment.Spec.Template.Spec.ImagePullSecrets, *ref)
+	}

hiveContainer.Env = append(hiveContainer.Env, hiveImagePullSecretEnvVar)
}

func (r *ReconcileHiveConfig) dynamicCopyHiveImagePullSecret(hLog log.FieldLogger, instance *hivev1.HiveConfig) error {
Member

Okay, yeah, let's clean this up.

For the getting, you should be able to use the existing controller-runtime-esque wrapper here. Example Get().

For the creating, you should be able to write a similar Create() wrapper in that file.

The end result should be that you can make this func feel a lot more like it would if r had/was a real c-r client.

@@ -88,3 +90,41 @@ func applyDeploymentConfig(hiveConfig *hivev1.HiveConfig, deploymentName hivev1.
container.Resources = *dc.Resources
}
}

func synchronizeUnstructuredSecrets(source, destination *unstructured.Unstructured) bool {
Member

...and then all of this goes away.

Contributor

openshift-ci bot commented Sep 6, 2025

@dlom: all tests passed!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
