fix(cncf-install): use Falco chart 8.0.1, disable falcoctl downloader #2306

Merged
kubestellar-hive[bot] merged 1 commit into kubestellar:master from bmvinay7:fix/install-falco-chart-version-and-egress
May 21, 2026

@bmvinay7

Description

Rewrites fixes/cncf-install/install-falco.json so it actually installs Falco end to end, including on clusters with restricted egress (kind, k3d, air-gapped, corporate networks blocking github.io). The previous mission could not run against the current chart repository, because it passed app versions to --version, and it crash-looped pods on the most common kind setup.

Bugs in the previous mission

1. helm install ... --version 0.43.0 fails with chart not found

$ helm install falco falcosecurity/falco --namespace falco --create-namespace --version 0.43.0
Error: INSTALLATION FAILED: chart "falco" matching 0.43.0 not found in falcosecurity index ...

0.43.0 is the Falco app version. The Helm --version flag takes the chart version. Chart 8.0.1 maps to app v0.43.0:

$ helm search repo falcosecurity/falco --versions | head -5
NAME                CHART VERSION  APP VERSION
falcosecurity/falco 8.0.5          0.43.1
falcosecurity/falco 8.0.1          0.43.0
falcosecurity/falco 7.2.1          0.42.1

Same problem in the upgrade step (--version 0.44.0 is also an app version, not a chart version).
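
The app-to-chart mapping can also be looked up mechanically instead of read off the table. A minimal sketch, assuming the layout shown above (name, chart version, and app version are the first three whitespace-separated fields of each data row); the function name chart_for_app_version is hypothetical and not part of the mission:

```shell
# Hypothetical helper: print the chart version that ships a given app version.
# Reads the output of `helm search repo <chart> --versions` on stdin.
# Assumes each data row starts with: NAME, CHART VERSION, APP VERSION.
chart_for_app_version() {
  # NR > 1 skips the header row; the first (topmost) match wins.
  awk -v app="$1" 'NR > 1 && $3 == app { print $2; exit }'
}

# Intended use (needs the falcosecurity repo added locally):
#   helm search repo falcosecurity/falco --versions | chart_for_app_version 0.43.0
```

If several chart versions share one app version, this prints only the first row that matches, and prints nothing when no row matches.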

2. falcoctl-artifact-install init container crash-loops on restricted egress

By default the chart enables the falcoctl artifact downloader, which runs as the falcoctl-artifact-install init container and fetches https://falcosecurity.github.io/falcoctl/index.yaml at pod startup. On clusters with restricted or flaky egress the request hangs and the init container errors out:

{"level":"ERROR","msg":"unable to fetch index \"falcosecurity\" with URL \"https://falcosecurity.github.io/falcoctl/index.yaml\": ... dial tcp 185.199.108.153:443: i/o timeout"}

The pod stays in Init:CrashLoopBackOff. Falco ships its rules baked into the image, so the runtime fetch is optional. The upstream-documented fix (Rule basics for the Falco 3.0.0 Helm chart) is:

helm install falco falcosecurity/falco \
  --set falcoctl.artifact.install.enabled=false \
  --set falcoctl.artifact.follow.enabled=false

This is upstream's documented "use the embedded ruleset" path and works on any cluster regardless of egress.
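
Before applying the flags it can help to confirm that a pod is really stuck in this state. A small sketch, assuming the default column order of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE); the helper name pods_in_init_crashloop is hypothetical:

```shell
# Hypothetical helper: list pods whose STATUS column is Init:CrashLoopBackOff.
# Reads `kubectl get pods -n falco --no-headers` output on stdin.
pods_in_init_crashloop() {
  awk '$3 == "Init:CrashLoopBackOff" { print $1 }'
}

# Intended use:
#   kubectl get pods -n falco --no-headers | pods_in_init_crashloop
# Then inspect the init container of each listed pod:
#   kubectl logs -n falco <pod> -c falcoctl-artifact-install --tail=5
```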

3. kubectl delete crd falcosecurity.org is a no-op

The previous uninstall step ran kubectl delete crd falcosecurity.org. The Falco chart does not install a CRD with that name (or any CRD at all), so there is nothing for the command to delete. The chart cleans up its own resources on helm uninstall. Removed.

What this PR ships

Install (5 steps)

  1. Add the falcosecurity Helm repository.
  2. Install with chart 8.0.1 and both falcoctl artifact flags disabled so the embedded ruleset is used and the pod does not need github.io egress to start.
  3. Wait for the DaemonSet rollout (kubectl rollout status daemonset/falco -n falco) instead of just listing pods.
  4. Tail logs to confirm that the "Falco initialized with configuration files" line and the BPF engine messages appear.
  5. Trigger a sample event by reading /etc/shadow inside a busybox pod, which fires the built-in "Sensitive file opened for reading by non-trusted program" rule. This confirms the BPF probe is attached and the embedded rules are loaded.
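
Steps 1 to 4 can be sketched as one shell function. This is an illustration under assumptions, not the mission's exact text: the repository URL is inferred from the index.yaml URL that the live validation checks, helm repo update is an extra refresh not listed in the steps, and --tail=50 and --timeout=300s are borrowed from the commands shown elsewhere in this PR. Step 5 is left out because the full kubectl run invocation is not spelled out here.

```shell
# Sketch of install steps 1-4. Needs helm and kubectl on PATH and a
# reachable cluster; nothing runs until install_falco is called.
install_falco() {
  # Step 1: add the chart repository (URL inferred, see lead-in).
  helm repo add falcosecurity https://falcosecurity.github.io/charts || return 1
  # Assumption: refresh the local index in case the repo was added earlier.
  helm repo update || return 1

  # Step 2: pin the chart version and disable both falcoctl artifact
  # features so the pod uses the embedded ruleset.
  helm install falco falcosecurity/falco \
    --namespace falco --create-namespace \
    --version 8.0.1 \
    --set falcoctl.artifact.install.enabled=false \
    --set falcoctl.artifact.follow.enabled=false || return 1

  # Step 3: block until the DaemonSet has rolled out (or time out).
  kubectl rollout status daemonset/falco -n falco --timeout=300s || return 1

  # Step 4: show recent Falco logs for a manual check.
  kubectl logs -n falco -l app.kubernetes.io/name=falco -c falco --tail=50
}
```

Each step returns early on failure, so a failed rollout is not hidden by a later log tail.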

Uninstall (3 steps)

helm uninstall -> namespace delete -> verify no Falco pods, CRDs, or RBAC remain.
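
The three uninstall steps could look roughly like the sketch below. The function names are hypothetical, and matching leftovers by the substring "falco" is a simplifying assumption that may also catch unrelated resources with that word in their name. Only cluster-scoped RBAC is checked, since namespaced objects go away with the namespace.

```shell
# Sketch: remove the release and its namespace.
uninstall_falco() {
  helm uninstall falco -n falco || return 1
  kubectl delete namespace falco
}

# Sketch: fail if any Falco pods, CRDs, or cluster-scoped RBAC remain.
verify_falco_removed() {
  leftovers=$(
    {
      kubectl get pods -n falco -o name 2>/dev/null
      kubectl get crd -o name 2>/dev/null
      kubectl get clusterrole,clusterrolebinding -o name 2>/dev/null
    } | grep -i 'falco' || true
  )
  if [ -n "$leftovers" ]; then
    printf 'Falco resources still present:\n%s\n' "$leftovers" >&2
    return 1
  fi
  echo "no Falco pods, CRDs, or cluster RBAC found"
}
```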

Upgrade (3 steps)

Backup DaemonSet -> helm upgrade --version 8.0.1 --reuse-values (so the falcoctl flags persist across upgrades) -> verify rollout.
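
A rough sketch of that upgrade flow follows. The function name, the backup file name, and the final helm get values check are illustrative assumptions; the check is there only to confirm that the falcoctl overrides are still recorded on the release after --reuse-values.

```shell
# Sketch: back up the DaemonSet, upgrade with the existing values, verify.
# $1 = target chart version, $2 = backup file (hypothetical default name).
upgrade_falco() {
  chart_version="$1"
  backup_file="${2:-falco-daemonset-backup.yaml}"

  # Keep a copy of the live DaemonSet before changing anything.
  kubectl get daemonset falco -n falco -o yaml > "$backup_file" || return 1

  # --reuse-values carries the falcoctl.artifact.* overrides forward.
  helm upgrade falco falcosecurity/falco \
    --namespace falco \
    --version "$chart_version" \
    --reuse-values || return 1

  kubectl rollout status daemonset/falco -n falco --timeout=300s || return 1

  # Print the user-supplied values so the disabled flags can be confirmed.
  helm get values falco -n falco
}

# Intended use:
#   upgrade_falco 8.0.1
```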

Troubleshooting (6 entries)

  1. chart not found when passing the app version
  2. falcoctl-artifact-install CrashLoopBackOff with the documented fix
  3. Image-pull stalls on slow networks
  4. Falco running but no events (sample syscall + log grep)
  5. Falco pod OOMKilled (memory limit bump)
  6. BPF probe fails to attach (kmod fallback)
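
For entry 4, the log grep can be wrapped in a small filter. A minimal sketch, assuming the alert text matches the rule output string quoted in this PR; the helper name and the --tail=200 window are hypothetical choices:

```shell
# Hypothetical helper: succeed if Falco logs on stdin contain the alert
# raised by the sample /etc/shadow read.
saw_sensitive_file_alert() {
  # Output is discarded; only the exit status matters.
  grep -F 'Sensitive file opened for reading by non-trusted program' >/dev/null
}

# Intended use, after triggering the sample event:
#   kubectl logs -n falco -l app.kubernetes.io/name=falco -c falco --tail=200 \
#     | saw_sensitive_file_alert && echo "rule fired"
```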

Metadata fixes

  • metadata.containerImages now lists the three images that chart 8.0.1 pulls (falco, falco-driver-loader, falcoctl) instead of the single placeholder reference.

Validation

Local CI (mirror of every PR-blocking workflow)

== validate-schema ==
[PASS] validate-schema

== kb-quality-enforcement ==
Score: 100/100 ([PASS] OK)
Breakdown:
  - clarity: 100
  - completeness: 100
  - correctness: 100
  - structure: 100
  - observability: 100
[PASS] kb-quality-enforcement

== scan-missions ==
[PASS] scan-missions   (Schema clean, no sensitive data, no malicious content)

== mission-safety-scan ==
[PASS] mission-safety-scan   (all 14 grep rules clean)

== mission-content-validation (per-step) ==
[PASS] mission-content-validation (per-step)

== mission-content-validation (live URL + crane) ==
[PASS] mission-content-validation (live)
  https://falcosecurity.github.io/charts/index.yaml -> HTTP 200

== pr-verifier (conventional commit subject) ==
[PASS] pr-verifier

== copilot-dco (Signed-off-by trailer) ==
[PASS] copilot-dco

ALL LOCAL CI GATES PASSED. Safe to push.

End-to-end on kind

Cluster: kb-test (kind v0.31.0, Kubernetes 1.36.0).

# Reproduce the falcoctl bug with the chart version corrected but the downloader left enabled
$ helm install falco falcosecurity/falco --namespace falco --create-namespace --version 8.0.1
$ kubectl get pods -n falco
falco-c79rt   0/2   Init:CrashLoopBackOff   17 (2m9s ago)
$ kubectl logs -n falco falco-c79rt -c falcoctl-artifact-install --tail=5
{"level":"ERROR","msg":"unable to fetch index ... dial tcp 185.199.108.153:443: i/o timeout"}

# Apply the documented fix
$ helm upgrade falco falcosecurity/falco --namespace falco --version 8.0.1 \
    --set falcoctl.artifact.install.enabled=false \
    --set falcoctl.artifact.follow.enabled=false
Release "falco" has been upgraded.

$ kubectl rollout status daemonset/falco -n falco --timeout=300s
daemon set "falco" successfully rolled out

$ kubectl get pods -n falco -l app.kubernetes.io/name=falco
NAME          READY   STATUS    RESTARTS   AGE
falco-v4x2g   1/1     Running   0          81s

$ kubectl logs -n falco -l app.kubernetes.io/name=falco -c falco --tail=5
[libs]: Trying to open the right engine!
Falco initialized with configuration files
Starting health webserver with threadiness 1, listening on 0.0.0.0:8765

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • Documentation update

Checklist

  • I have signed off my commits (git commit -s)
  • I have updated documentation as needed
  • I have added tests that prove my fix/feature works
  • All new and existing tests pass

The "tests that prove my fix/feature works" box stays unticked because the repo has no unit-test framework for missions. The CI validators (schema, scan, quality, safety, content) ARE the tests; they're already covered by "All new and existing tests pass". The kind end-to-end run above is the practical equivalent.

The previous install-falco.json could not run end to end against the
current chart repository, and its install path crash-looped pods on
every cluster with restricted egress.

Two correctness bugs and one polish gap:

  1. helm install falco falcosecurity/falco --version 0.43.0 fails with
     "no chart version found for falco-0.43.0". 0.43.0 is the Falco
     application version. The Helm --version flag takes the chart
     version. The chart version that maps to app v0.43.0 is 8.0.1
     (helm search repo falcosecurity/falco --versions). Same problem
     in the upgrade step (--version 0.44.0 is also an app version).

  2. The default chart enables the falcoctl artifact downloader, which
     runs as the falcoctl-artifact-install init container and fetches
     https://falcosecurity.github.io/falcoctl/index.yaml at pod
     startup. On clusters with restricted or flaky egress (kind, k3d,
     air-gapped, corporate networks blocking github.io) the request
     hangs and the init container errors with
     "dial tcp ...:443: i/o timeout". The pod stays in
     Init:CrashLoopBackOff. Falco ships its rules baked into the
     image, so the runtime fetch is optional. The upstream-documented
     fix (https://falco.org/blog/rules-helm-chart-3-0-0/) is to set
     falcoctl.artifact.install.enabled=false and
     falcoctl.artifact.follow.enabled=false. Reproduced and fixed
     end-to-end on a kind cluster.

  3. The uninstall step ran "kubectl delete crd falcosecurity.org",
     which fails because Falco does not install any CRDs. The chart
     cleans up its own resources on helm uninstall. Removed.

This rewrite addresses all three:

  - Step 2 installs chart 8.0.1 (app v0.43.0) with both falcoctl
    artifact flags disabled, so the embedded ruleset is used and the
    pod does not need github.io egress to start.
  - Step 3 waits for the DaemonSet rollout instead of just listing
    pods, so the verification fails fast if the BPF probe cannot
    attach.
  - Step 5 generates a known-noisy syscall (cat /etc/shadow inside a
    busybox pod) and greps for the corresponding "Sensitive file
    opened for reading by non-trusted program" Warning, which proves
    the BPF probe is wired up and the embedded ruleset is loaded.
  - Uninstall is split into helm uninstall, namespace delete, and a
    final pods/CRDs/RBAC verification.
  - Upgrade uses --reuse-values so the falcoctl flags stay disabled
    across upgrades (otherwise the next chart bump silently re-enables
    the runtime downloader and the pod starts crash-looping again).
  - Six concrete troubleshooting entries: chart-version mismatch,
    falcoctl artifact-install crash loop with the documented fix,
    image-pull stalls on slow networks, no-events sanity test,
    OOMKilled, and the kmod fallback when modern-bpf cannot attach.
  - metadata.containerImages now lists the three real images the
    chart 8.0.1 actually pulls (falco, falco-driver-loader, falcoctl)
    instead of the single placeholder reference.

Validated end to end on a kind cluster (kb-test, kind v0.31.0,
Kubernetes 1.36.0):

  - helm install ... --version 8.0.1 with both falcoctl artifact flags
    disabled: STATUS deployed
  - kubectl rollout status daemonset/falco -n falco: rolled out
  - kubectl get pods -n falco: falco-... 1/1 Running 0 (was
    Init:CrashLoopBackOff before the falcoctl flags were added)
  - kubectl logs ... -c falco --tail=50: "Falco initialized with
    configuration files" + libbpf engine messages + "Events
    detected: N"
  - kubectl run falco-test ... cat /etc/shadow: triggered the
    "Sensitive file opened for reading by non-trusted program"
    Warning event in the Falco logs

Local CI parity (scripts/local-ci.sh):

  - validate-schema -> Valid kc-mission-v1
  - kb-quality-enforcement -> 100/100 (clarity, completeness,
    correctness, structure, observability all 100)
  - scan-missions -> Schema clean, no sensitive data, no malicious
    content
  - mission-safety-scan -> all 14 grep rules clean
  - mission-content-validation (per-step) -> every step has a code
    block, no orphan kubectl edit deployment, no orphan kubectl apply
    -f local-file
  - mission-content-validation (live) -> Helm repo
    https://falcosecurity.github.io/charts/index.yaml -> HTTP 200

Signed-off-by: bmvinay7 <vinaybm1234@gmail.com>
@kubestellar-prow (bot) added the "dco-signoff: yes" label (indicates the PR's author has signed the DCO) on May 21, 2026
@kubestellar-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign clubanderson for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubestellar-prow (bot) added the "size/L" label (denotes a PR that changes 100-499 lines, ignoring generated files) on May 21, 2026

@kubestellar-hive kubestellar-hive Bot left a comment

LGTM — thorough fix for the Falco install mission.

@kubestellar-prow

@kubestellar-hive[bot]: changing LGTM is restricted to collaborators

Details

In response to this:

LGTM — thorough fix for the Falco install mission.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubestellar-hive kubestellar-hive Bot merged commit 4d74e56 into kubestellar:master May 21, 2026
11 of 12 checks passed