
Follow-up: Apply Resource Limits for Tekton Components in Production (#6435) #6860


Closed


ab-ghosh
Contributor

This PR mirrors the changes introduced in #6435 by applying CPU and memory limits for Tekton components in the production environment.

@openshift-ci openshift-ci bot requested review from aThorp96 and enarha June 25, 2025 13:14
@aThorp96
Contributor

@ab-ghosh The PR title says this updates production, but only development and staging are updated in this PR.

hugares and others added 27 commits June 25, 2025 23:46
The VM needed to be recreated because the former one was no longer booting.
Also delete commented-out configurations.

Signed-off-by: Hugo Ares <[email protected]>
* update components/integration/development/kustomization.yaml

* update components/integration/staging/base/kustomization.yaml

---------

Co-authored-by: rh-tap-build-team[bot] <127938674+rh-tap-build-team[bot]@users.noreply.github.com>
The `propagate-cost-management-labels` ClusterPolicy propagates the `cost-center` and `insights_cost_management_optimizations` labels:

  - Ensure `cost-center` and `insights_cost_management_optimizations` labels are set on tenant namespaces
  - Propagate these labels to all pods within matching namespaces
  - Set default `cost-center=760` if missing on the namespace
  - Always set `insights_cost_management_optimizations=true` on all pods

- Added ClusterRole `kyverno-background-controller-pod-update` to grant Kyverno required permissions for pod and namespace mutation

- Added Chainsaw test cases to validate behavior:
  - Label propagation for tenant namespaces with valid and missing `cost-center`
  - No label propagation for non-tenant namespaces
  - Label patching of existing namespaces missing `cost-center` label
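
The label rules described above can be sketched as a small pure function. This is only an illustrative model in Python, not the Kyverno policy itself: the function name `propagate_labels` and the `is_tenant` flag are hypothetical, and how a namespace is recognized as a tenant namespace is not stated here, so it is passed in as an assumption.

```python
# Illustrative model of the label rules; NOT the actual Kyverno ClusterPolicy.
# `propagate_labels` and `is_tenant` are hypothetical names for this sketch.

DEFAULT_COST_CENTER = "760"  # default taken from the commit message
OPTIMIZATIONS_LABEL = "insights_cost_management_optimizations"


def propagate_labels(namespace_labels, pod_labels, is_tenant):
    """Return the labels a pod would carry after propagation."""
    result = dict(pod_labels)
    if not is_tenant:
        # Non-tenant namespaces: no label propagation.
        return result
    # Use the namespace cost-center, falling back to the default if missing.
    result["cost-center"] = namespace_labels.get("cost-center", DEFAULT_COST_CENTER)
    # Pods in matching namespaces always get the optimizations label.
    result[OPTIMIZATIONS_LABEL] = "true"
    return result
```

The three branches correspond to the Chainsaw scenarios listed above: a tenant namespace with a valid `cost-center`, a tenant namespace missing it, and a non-tenant namespace.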

Signed-off-by: rrajashe <[email protected]>
Co-authored-by: rrajashe <[email protected]>
…6454)

Co-authored-by: rh-tap-build-team[bot] <127938674+rh-tap-build-team[bot]@users.noreply.github.com>
* update components/integration/development/kustomization.yaml

* update components/integration/staging/base/kustomization.yaml

---------

Co-authored-by: rh-tap-build-team[bot] <127938674+rh-tap-build-team[bot]@users.noreply.github.com>
The server reached its limit, which was very low.
Increase it to 256Mi.

Signed-off-by: Gal Ben Haim <[email protected]>
Kyverno requires more resources on rh01.

Signed-off-by: Francesco Ilario <[email protected]>
Promote integration-service to grant greater permissions on
integrationtestscenario/status resources to integration admins.

Signed-off-by: Ryan Cole <[email protected]>
Now that production has been fixed, revert the changes to the
integrationtestscenario webhooks so that they no longer ignore errors.

Signed-off-by: Ryan Cole <[email protected]>
* chore(deps): update konflux references
* Always compact ETCD before evaluating reclaimable space
* chore(deps): update registry.redhat.io/ubi9 docker tag to v9.6-1747219013
* Konflux build pipeline service account migration for defrag

KFLUXINFRA-1644

Signed-off-by: Hugo Ares <[email protected]>
* feat(PVO11Y-4784): Include new etcd-shield related metrics

Signed-off-by: Gabriel Soares <[email protected]>

* fix(PVO11Y-4784): Include fixes and exclude duplicated metric

Signed-off-by: Gabriel Soares <[email protected]>

---------

Signed-off-by: Gabriel Soares <[email protected]>
Co-authored-by: rh-tap-build-team[bot] <127938674+rh-tap-build-team[bot]@users.noreply.github.com>
This should fix an error reported by Manish.
We want to temporarily prevent new PipelineRuns from being created in rh01.
For the sake of time, this PR just copies the base folder to rh01 instead
of patching the rule.

Signed-off-by: Francesco Ilario <[email protected]>
This change disables Kyverno's Reports controller to reduce load on rh01

Signed-off-by: Francesco Ilario <[email protected]>
* initial commit

* move to a ref to github repo

* argocd reference

* add initial owners and readme

* Update components/pulp-access-controller/README.md

Co-authored-by: Ralph Bean <[email protected]>

* delete pulp-access-operator from prod

* remove temporary Denis to please the bot

* typo fix

* fix mapping

* fixes to my kustomization

* empty commit to trigger test

---------

Co-authored-by: Yasen Trahnov <[email protected]>
Co-authored-by: Ralph Bean <[email protected]>
The scale-to-zero job is not adding value in an ArgoCD-managed environment.

Signed-off-by: Francesco Ilario <[email protected]>
rh-hemartin and others added 26 commits June 25, 2025 23:46
This enables etcd-shield's metrics in staging clusters.  etcd-shield
will now report the following metrics:

- metrics from Go's runtime
- metrics from controller-runtime
- etcd_shield_query_enabled: What does etcd-shield think the current
  admission state is?

Signed-off-by: Andy Sadler <[email protected]>
This commit allows the ibmstorage GitHub repository to be added to the
production repository allow list.
The smee sidecar implements a health check loop that sends an event to
the same server that the client listens on and verifies the client
forwards it.

This addresses an issue in which the client freezes and stops forwarding
events, while still looking alive.
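
A full-roundtrip check of this kind can be sketched as follows. This is a minimal illustration in Python, not the sidecar's actual code: `roundtrip_check`, `send_event`, and `was_forwarded` are hypothetical names, and the two callables stand in for "post an event to the server" and "did the client forward it".

```python
import time
import uuid


def roundtrip_check(send_event, was_forwarded, timeout_s=5.0, poll_s=0.05):
    """Send a uniquely tagged event and wait until it is seen as forwarded.

    send_event(token) and was_forwarded(token) are hypothetical callables
    supplied by the caller; the timeout values are arbitrary assumptions.
    """
    token = uuid.uuid4().hex  # unique marker so stale events are not counted
    send_event(token)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if was_forwarded(token):
            return True  # the client forwarded the event: roundtrip is healthy
        time.sleep(poll_s)
    return False  # the client looks alive but did not forward in time
```

A liveness probe built on such a check fails when the client is frozen, even if its process is still running, which is the failure mode described above.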

Signed-off-by: Yftach Herzog <[email protected]>
* migrate cost-management policies to policies component

we want to have all the policies in a dedicated component

Signed-off-by: Francesco Ilario <[email protected]>

* fix format and add missing fields

Signed-off-by: Francesco Ilario <[email protected]>

* disable validate policy in development until KubeSaw is removed

Signed-off-by: Francesco Ilario <[email protected]>

---------

Signed-off-by: Francesco Ilario <[email protected]>
Add sidecar-based full-roundtrip liveness probe to the Smee server (In STAGE).

Signed-off-by: Barak Korren <[email protected]>
Two containers had the same name

Signed-off-by: Yftach Herzog <[email protected]>
…6844)

Updated the memory request in the Smee server deployment from an incorrect value to the correct format, ensuring proper resource allocation.

Signed-off-by: Barak Korren <[email protected]>
If the sidecar or one of the other containers is crashlooping, we
might reach a state in which a container restarts but the liveness
probe always fails because the other container is in CrashLoopBackOff.

Making the threshold higher than the maximum backoff time (which should
be 5 minutes) ensures that the probe has a chance to succeed.
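
The arithmetic behind that threshold can be sketched like this. It is a rough illustration in Python, assuming the probe's tolerated failure window is approximately the failure threshold multiplied by the probe period; the helper name `min_failure_threshold` is hypothetical and the 300-second default reflects the 5-minute maximum backoff mentioned above.

```python
def min_failure_threshold(period_seconds, max_backoff_seconds=300):
    """Smallest threshold such that threshold * period exceeds the max backoff.

    Assumes the tolerated failure window is roughly threshold * period;
    probe timeouts and initial delays are ignored in this sketch.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    # Integer division gives the largest count that still fits inside the
    # backoff window; one more failure pushes the window past it.
    return max_backoff_seconds // period_seconds + 1
```

For example, with a 10-second probe period the helper returns 31, giving a window of 310 seconds, just over the 5-minute backoff.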

Signed-off-by: Yftach Herzog <[email protected]>
This should prevent a condition in which smee and its sidecar crashloop
forever.

Signed-off-by: Yftach Herzog <[email protected]>
This should prevent a condition in which smee and its sidecar crashloop
forever.

Signed-off-by: Yftach Herzog <[email protected]>
The image needs to be updated for server and client at the same time.

Signed-off-by: Yftach Herzog <[email protected]>
On the busiest production cluster, we see that PipelineRuns and, more
rarely, TaskRuns are not reconciled in a timely manner, resulting in
objects being removed from the cluster before they are actually stored
in the DB. With the updated values, we wait longer before giving up and
force-removing the objects.
@openshift-merge-robot
Collaborator

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


openshift-ci bot commented Jun 25, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ab-ghosh
Once this PR has been reviewed and has the lgtm label, please assign lcarva for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ab-ghosh ab-ghosh closed this Jun 25, 2025