
NETOBSERV-2225 - Deploy static plugin at operator startup #1345


Open
wants to merge 1 commit into
base: main
from

Conversation

jpinsonneau
Contributor

@jpinsonneau jpinsonneau commented Apr 2, 2025

Description

Create the console plugin when FlowCollector doesn't exist, to expose the new forms.

Suggested alternatives: #1346 & #1374

See netobserv/network-observability-console-plugin#763 for the forms implementation.

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Comment on lines 81 to 100
	// force reconcile at startup
	go r.InitReconcile(ctx)

	return nil
}

func (r *FlowCollectorReconciler) InitReconcile(ctx context.Context) error {
	log := log.FromContext(ctx)
	log.Info("Initializing resources...")

	var err error
	for attempt := range initReconcileAttempts {
		// delay the reconcile calls to let some time to the cache to load
		time.Sleep(5 * time.Second)
		_, err = r.Reconcile(ctx, reconcile.Request{})
		if err != nil {
			log.Error(err, "Error while doing initial reconcile", "attempt", attempt)
		} else {
			break
		}
	}
	return err
}
Contributor Author

☝️ I wonder if there is an out-of-the-box mechanism to trigger the loop after the cache has loaded. That's why I'm using a sleep here, and I'm not sure it will work in all situations.

https://redhat-internal.slack.com/archives/C02939DP5L5/p1743518264445429

Contributor

If I remember correctly, when a reconcile loop is failing, you can also return a time value to reschedule the reconciliation.

Contributor Author

Yeah, I gave that a try without success.

I'm refactoring the code again to move the static content to another controller, which will be cleaner I guess. I will give the reschedule time another try on the new controller 👍

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 2, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 14, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 14, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 14, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 14, 2025
@jpinsonneau jpinsonneau force-pushed the 1942 branch 2 times, most recently from c4c57e1 to dbc4cbf Compare April 15, 2025 10:40
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@netobserv netobserv deleted a comment from github-actions bot Apr 15, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 15, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Apr 17, 2025
Comment on lines 38 to 61
	// force reconcile at startup
	go r.InitReconcile(ctx)

	return ctrl.NewControllerManagedBy(mgr).
		For(&flowslatest.FlowCollector{}, reconcilers.IgnoreStatusChange).
		Named("staticPlugin").
		Complete(&r)
}

func (r *Reconciler) InitReconcile(ctx context.Context) {
	log := log.FromContext(ctx)
	log.Info("Initializing resources...")

	for attempt := range initReconcileAttempts {
		// delay the reconcile calls to let some time to the cache to load
		time.Sleep(5 * time.Second)
		_, err := r.Reconcile(ctx, ctrl.Request{})
		if err != nil {
			log.Error(err, "Error while doing initial reconcile", "attempt", attempt)
		} else {
			return
		}
	}
}
Contributor Author

@OlivierCazade I modified the PR to be in a dedicated controller.

As you can see, I force the Reconcile call during the Start function so OLM can't interpret the result containing Requeue / RequeueAfter here.

Since we don't create a dedicated CR for the static plugin, I don't think we can rely on these here.

WDYT ?

@netobserv netobserv deleted a comment from github-actions bot Apr 17, 2025
@netobserv netobserv deleted a comment from github-actions bot May 6, 2025
@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 6, 2025
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label May 6, 2025

github-actions bot commented May 6, 2025

New images:

  • quay.io/netobserv/network-observability-operator:1c54f0e
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-1c54f0e
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-1c54f0e

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:1c54f0e make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-1c54f0e

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-1c54f0e
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@jpinsonneau
Contributor Author

@memodi FYI I made some changes to address the #1345 (comment) comment.

Just tested and the behavior remains the same 😉
It avoids using an unnecessary service account and role.

@jpinsonneau jpinsonneau added the needs-review Tells that the PR needs a review label May 13, 2025
@memodi
Member

memodi commented May 13, 2025

/jira NETOBSERV-2225

	log := log.FromContext(ctx)
	log.Info("Initializing resources...")

	for attempt := range initReconcileAttempts {
Member

@jotak jotak May 14, 2025

btw, what happens if all 5 attempts fail? The static plugin wouldn't deploy, but the rest would work normally? Or does it make the controller CLBO or so ? (my understanding is it's the first, but just want to make sure)

Contributor Author

If the attempts fail, the static plugin will not be there at controller startup.
As soon as a reconcile loop is triggered, it will appear (e.g. when creating a FlowCollector).


	r.status.SetUnknown()
	defer r.status.Commit(ctx, r.Client)

Member

After setting "Unknown", I think this controller should return if openshift isn't detected, right?
We could check it in Kind

Contributor Author

That's done in the static reconciler using the HasConsolePlugin function:

func (r *CPReconciler) reconcileStatic(ctx context.Context, desired *flowslatest.FlowCollector) error {
	l := log.FromContext(ctx).WithName("console-plugin")
	ctx = log.IntoContext(ctx, l)
	// Retrieve current owned objects
	err := r.Managed.FetchAll(ctx)
	if err != nil {
		return err
	}
	if r.ClusterInfo.HasConsolePlugin() {
		if err = r.checkAutoPatch(ctx, desired, constants.StaticPluginName); err != nil {
			return err
		}
	}
	if r.ClusterInfo.HasConsolePlugin() {
		// Create object builder
		builder := newBuilder(r.Instance, &desired.Spec, constants.StaticPluginName)
		if err = r.reconcilePlugin(ctx, &builder, &desired.Spec, constants.StaticPluginName, "NetObserv static plugin"); err != nil {
			return err
		}
		if err = r.reconcileDeployment(ctx, &builder, &desired.Spec, constants.StaticPluginName, ""); err != nil {
			return err
		}
		if err = r.reconcileServices(ctx, &builder, constants.StaticPluginName); err != nil {
			return err
		}
	} else {
		// delete any existing owned object
		r.Managed.TryDeleteAll(ctx)
	}
	return nil
}

If the console plugin is not available, nothing is deployed and the status goes to ready.

Contributor Author

The static controller could deploy something other than the console plugin in the future, so I think it's better to keep things separated here. WDYT?

Member

ah ok, yes sounds good, thanks!

Member

@jotak jotak left a comment

Just one comment about the case when not running on OpenShift; other than that, lgtm

@jpinsonneau
Contributor Author

/restest

@jotak
Member

jotak commented May 14, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 14, 2025
@jotak jotak removed the needs-review Tells that the PR needs a review label May 14, 2025
@memodi
Member

memodi commented Jul 14, 2025

@jpinsonneau - could you rebase this PR please? I tried locally using make commands but I'm running into other issues.


openshift-ci bot commented Jul 15, 2025

New changes are detected. LGTM label has been removed.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 15, 2025

openshift-ci bot commented Jul 15, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jotak. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpinsonneau
Contributor Author

Rebased without changes

@memodi
Member

memodi commented Jul 15, 2025

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jul 15, 2025

New images:

  • quay.io/netobserv/network-observability-operator:cdd8f5a
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cdd8f5a
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cdd8f5a

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:cdd8f5a make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-cdd8f5a

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-cdd8f5a
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@memodi
Member

memodi commented Jul 15, 2025

@jpinsonneau Here's quick feedback from my initial review of the form view:

- eBPF flow filtering
	- what do options 1 and 2 mean for port fields? [1]
- Deployment model
- Mode:
	- add a note on which Loki mode is recommended for production use cases.
- Prometheus Mode:
	- add a note on what "Auto" mode means, or hide it if it's not relevant.
- pipeline stage:
	- flow filtering.
	- some advanced configuration like "secondaryNetworks" could be exposed.

[1]

Screenshot 2025-07-15 at 4 50 19 PM

At the end it generated empty YAML [2]; I was trying with OCP version 4.20.

Screenshot 2025-07-15 at 4 58 34 PM

I'll add more feedback as I test more.


openshift-ci bot commented Jul 15, 2025

@jpinsonneau: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-operator | 43c5e6c | link | false | /test e2e-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.
5 participants