fix(cncf-install): use Falco chart 8.0.1, disable falcoctl downloader #2306
The previous install-falco.json could not run end to end on any
cluster: the helm install command itself failed, and the install path
crash-looped pods on every cluster with restricted egress.
Two correctness bugs and one polish gap:
1. helm install falco falcosecurity/falco --version 0.43.0 fails with
"no chart version found for falco-0.43.0". 0.43.0 is the Falco
application version. The Helm --version flag takes the chart
version. The chart version that maps to app v0.43.0 is 8.0.1
(helm search repo falcosecurity/falco --versions). Same problem
in the upgrade step (--version 0.44.0 is also an app version).
2. The default chart enables the falcoctl artifact downloader, which
runs as the falcoctl-artifact-install init container and fetches
https://falcosecurity.github.io/falcoctl/index.yaml at pod
startup. On clusters with restricted or flaky egress (kind, k3d,
air-gapped, corporate networks blocking github.io) the request
hangs and the init container errors with
"dial tcp ...:443: i/o timeout". The pod stays in
Init:CrashLoopBackOff. Falco ships its rules baked into the
image, so the runtime fetch is optional. The upstream-documented
fix (https://falco.org/blog/rules-helm-chart-3-0-0/) is to set
falcoctl.artifact.install.enabled=false and
falcoctl.artifact.follow.enabled=false. Reproduced and fixed
end-to-end on a kind cluster.
3. The uninstall step ran "kubectl delete crd falcosecurity.org",
which fails because Falco does not install any CRDs. The chart
cleans up its own resources on helm uninstall. Removed.
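Items 1 and 2 above come down to a single install command. The sketch below is one hedged way to write it, not the mission's exact step: the repo URL is derived from the chart index that the CI check fetches, the `falco` release name and namespace match the helm and kubectl commands quoted in this PR, and `--create-namespace` plus the `run` dry-run wrapper are assumptions added for illustration.

```shell
#!/bin/sh
set -eu

CHART_VERSION="8.0.1"   # chart version; the Falco app version it ships is 0.43.0

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

install_falco() {
  run helm repo add falcosecurity https://falcosecurity.github.io/charts
  run helm repo update
  # Lists chart versions next to the app version each one ships.
  run helm search repo falcosecurity/falco --versions
  # Both falcoctl artifact flags are off, so the pod starts without github.io egress.
  run helm install falco falcosecurity/falco \
    --namespace falco --create-namespace \
    --version "$CHART_VERSION" \
    --set falcoctl.artifact.install.enabled=false \
    --set falcoctl.artifact.follow.enabled=false
}

install_falco
```

Run as-is, the script only prints the four helm commands. Setting `DRY_RUN=0` executes them against the current kube context.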
This rewrite addresses all three:
- Step 2 installs chart 8.0.1 (app v0.43.0) with both falcoctl
artifact flags disabled, so the embedded ruleset is used and the
pod does not need github.io egress to start.
- Step 3 waits for the DaemonSet rollout instead of just listing
pods, so the verification fails fast if the BPF probe cannot
attach.
- Step 5 generates a known-noisy syscall (cat /etc/shadow inside a
busybox pod) and greps for the corresponding "Sensitive file
opened for reading by non-trusted program" Warning, which proves
the BPF probe is wired up and the embedded ruleset is loaded.
- Uninstall is split into helm uninstall, namespace delete, and a
final pods/CRDs/RBAC verification.
- Upgrade uses --reuse-values so the falcoctl flags stay disabled
across upgrades (otherwise the next chart bump silently re-enables
the runtime downloader and the pod starts crash-looping again).
- Six concrete troubleshooting entries: chart-version mismatch,
falcoctl artifact-install crash loop with the documented fix,
image-pull stalls on slow networks, no-events sanity test,
OOMKilled, and the kmod fallback when modern-bpf cannot attach.
- metadata.containerImages now lists the three real images the
chart 8.0.1 actually pulls (falco, falco-driver-loader, falcoctl)
instead of the single placeholder reference.
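The `--reuse-values` bullet above is the easiest one to lose in a later edit, so here is a minimal sketch of the upgrade path. The `falco` release name and namespace are assumed from the install command, `helm get values` is included only as one way to confirm the two flags survived, and the `run` wrapper is a hypothetical dry-run helper.

```shell
#!/bin/sh
set -eu

CHART_VERSION="8.0.1"   # target chart version (not the Falco app version)

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

upgrade_falco() {
  # Dump the current DaemonSet; redirect this to a file to keep a real backup.
  run kubectl get daemonset falco -n falco -o yaml
  # --reuse-values carries the release's existing values forward,
  # including the two disabled falcoctl artifact flags.
  run helm upgrade falco falcosecurity/falco \
    --namespace falco \
    --version "$CHART_VERSION" \
    --reuse-values
  # Show the user-supplied values still set on the release.
  run helm get values falco --namespace falco
  run kubectl rollout status daemonset/falco -n falco
}

upgrade_falco
```

By default this prints the four commands without touching a cluster. Set `DRY_RUN=0` to execute them.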
Validated end to end on a kind cluster (kb-test, kind v0.31.0,
Kubernetes 1.36.0):
- helm install ... --version 8.0.1 with both falcoctl artifact flags
disabled: STATUS deployed
- kubectl rollout status daemonset/falco -n falco: rolled out
- kubectl get pods -n falco: falco-... 1/1 Running 0 (was
Init:CrashLoopBackOff before the falcoctl flags were added)
- kubectl logs ... -c falco --tail=50: "Falco initialized with
configuration files" + libbpf engine messages + "Events
detected: N"
- kubectl run falco-test ... cat /etc/shadow: triggered the
"Sensitive file opened for reading by non-trusted program"
Warning event in the Falco logs
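The validation sequence above can be sketched as a small script. The commit message elides the exact `kubectl run` and `kubectl logs` arguments, so the flags below (`--image=busybox`, `--restart=Never`, `--rm -i`, `daemonset/falco` as the log target, the 300s timeout) are assumptions, and `has_sensitive_file_alert` is a hypothetical helper name.

```shell
#!/bin/sh
set -eu

RULE_TEXT="Sensitive file opened for reading by non-trusted program"

# Reads Falco log lines on stdin; succeeds only if the rule output appears.
has_sensitive_file_alert() {
  grep -q "$RULE_TEXT"
}

verify_falco() {
  # Blocks until every DaemonSet pod is ready, or fails at the timeout.
  kubectl rollout status daemonset/falco -n falco --timeout=300s
  # Assumed flags: a throwaway busybox pod that reads /etc/shadow once.
  kubectl run falco-test --image=busybox --restart=Never --rm -i -- cat /etc/shadow
  # On a multi-node cluster the test pod may land on a different node than
  # the Falco pod whose logs are read here; a single-node kind cluster avoids that.
  kubectl logs daemonset/falco -n falco -c falco --tail=50 | has_sensitive_file_alert
}
```

Calling `verify_falco` against a cluster exits non-zero if the rollout stalls or the Warning never shows up in the last 50 log lines. If the event is slow to appear, a short wait or a larger `--tail` value may be needed.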
Local CI parity (scripts/local-ci.sh):
- validate-schema -> Valid kc-mission-v1
- kb-quality-enforcement -> 100/100 (clarity, completeness,
correctness, structure, observability all 100)
- scan-missions -> Schema clean, no sensitive data, no malicious
content
- mission-safety-scan -> all 14 grep rules clean
- mission-content-validation (per-step) -> every step has a code
block, no orphan kubectl edit deployment, no orphan kubectl apply
-f local-file
- mission-content-validation (live) -> Helm repo
https://falcosecurity.github.io/charts/index.yaml -> HTTP 200
Signed-off-by: bmvinay7 <vinaybm1234@gmail.com>
Merged commit 4d74e56 into kubestellar:master.
Description
Rewrites `fixes/cncf-install/install-falco.json` so it actually installs Falco end to end, including on clusters with restricted egress (kind, k3d, air-gapped, corporate networks blocking github.io). The previous mission was unrunnable on any current chart and crash-looped pods on the most common kind setup.

Bugs in the previous mission

1. `helm install ... --version 0.43.0` fails with `chart not found`

`0.43.0` is the Falco app version. The Helm `--version` flag takes the chart version. Chart `8.0.1` maps to app `v0.43.0`, as `helm search repo falcosecurity/falco --versions` shows. Same problem in the upgrade step (`--version 0.44.0` is also an app version, not a chart version).

2. `falcoctl-artifact-install` init container crash-loops on restricted egress

By default the chart enables the falcoctl artifact downloader, which runs as the `falcoctl-artifact-install` init container and fetches `https://falcosecurity.github.io/falcoctl/index.yaml` at pod startup. On clusters with restricted or flaky egress the request hangs and the init container errors out with `dial tcp ...:443: i/o timeout`. The pod stays in `Init:CrashLoopBackOff`. Falco ships its rules baked into the image, so the runtime fetch is optional. The upstream-documented fix (Rule basics for the Falco 3.0.0 Helm chart, https://falco.org/blog/rules-helm-chart-3-0-0/) is to set `falcoctl.artifact.install.enabled=false` and `falcoctl.artifact.follow.enabled=false`. This is upstream's documented "use the embedded ruleset" path and works on any cluster regardless of egress.
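The falcoctl fix described above sets `falcoctl.artifact.install.enabled=false` and `falcoctl.artifact.follow.enabled=false`. One way to make those settings harder to lose is to keep them in a values file instead of `--set` arguments. The sketch below writes such a file and prints a matching install command. The file name `falco-values.yaml` and the `write_values` helper are hypothetical, and the YAML nesting simply mirrors the dotted `--set` paths.

```shell
#!/bin/sh
set -eu

VALUES_FILE="${VALUES_FILE:-falco-values.yaml}"   # hypothetical file name

# Writes the two falcoctl overrides as nested YAML to the given path.
write_values() {
  cat > "$1" <<'EOF'
falcoctl:
  artifact:
    install:
      enabled: false   # do not fetch artifacts in an init container
    follow:
      enabled: false   # do not follow (auto-update) artifacts at runtime
EOF
}

write_values "$VALUES_FILE"
# Printed rather than executed, so the sketch is safe to run without a cluster.
echo "helm install falco falcosecurity/falco --namespace falco --version 8.0.1 -f $VALUES_FILE"
```

A values file kept in version control makes the override explicit on every install and upgrade, at the cost of one more file to maintain.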
3. `kubectl delete crd falcosecurity.org` is a no-op

The previous uninstall step ran `kubectl delete crd falcosecurity.org`. Falco does not install any CRDs at that name (or any name). The chart cleans up its own resources on `helm uninstall`. Removed.

What this PR ships
Install (5 steps)

1. Add the `falcosecurity` Helm repository.
2. Install chart `8.0.1` (app `v0.43.0`) with both falcoctl artifact flags disabled.
3. Wait for the DaemonSet rollout (`kubectl rollout status daemonset/falco -n falco`) instead of just listing pods.
4. Check the Falco logs for `Falco initialized with configuration files` and the BPF engine messages.
5. Read `/etc/shadow` inside a busybox pod, which fires the built-in `Sensitive file opened for reading by non-trusted program` rule. Confirms BPF probe is attached and embedded rules are loaded.

Uninstall (3 steps)

`helm uninstall` -> namespace delete -> verify no Falco pods, CRDs, or RBAC remain.

Upgrade (3 steps)

Backup DaemonSet -> `helm upgrade --version 8.0.1 --reuse-values` (so the falcoctl flags persist across upgrades) -> verify rollout.

Troubleshooting (6 entries)

- `chart not found` when passing the app version
- `falcoctl-artifact-install CrashLoopBackOff` with the documented fix
- Image-pull stalls on slow networks
- No-events sanity test
- OOMKilled
- kmod fallback when modern-bpf cannot attach

Metadata fixes

- `containerImages` switched from a single placeholder ref to the three real images chart 8.0.1 actually pulls: `docker.io/falcosecurity/falco:0.43.0`, `docker.io/falcosecurity/falco-driver-loader:0.43.0`, `docker.io/falcosecurity/falcoctl:0.12.2`. Verified with `docker manifest inspect`.
- `metadata.sourceUrls.helm` added.
- `authorGithub` switched to `Vinay B M` / `bmvinay7` for the rewrite, matching precedent from PR #2253 (Thanos), #2299 (argocd-operator), and the auto-merged #2305 (Kyverno).

Validation
Local CI (mirror of every PR-blocking workflow)
End-to-end on kind
Cluster: `kb-test` (kind v0.31.0, Kubernetes 1.36.0).

Type of Change
Checklist
- Commits are signed off (`git commit -s`)

The "tests that prove my fix/feature works" box stays unticked because the repo has no unit-test framework for missions. The CI validators (schema, scan, quality, safety, content) ARE the tests; they're already covered by "All new and existing tests pass". The kind end-to-end run above is the practical equivalent.