Skip to content

feat: sysbox helm chart#71

Open
endor wants to merge 1 commit into
mainfrom
feat/sysbox-helm-chart
Open

feat: sysbox helm chart#71
endor wants to merge 1 commit into
mainfrom
feat/sysbox-helm-chart

Conversation

@endor
Copy link
Copy Markdown
Collaborator

@endor endor commented May 22, 2026

Summary

Added a helm chart to deploy the sandbox service. This helm chart is used as a dependency in another chart that deploys to our dev cluster. Customers would likely choose a similar setup, and create resources that are custom to their setup in the wrapper chart that uses this chart as a dependency.


Summary by cubic

Add a production-ready Helm chart to deploy the sandbox API with an optional in-cluster sysbox/DinD runner, plus CI lint/render checks and better operational logging. This makes it easy to run with mTLS, persistence, ingress, and either in-cluster or external runners.

  • New Features

    • New Helm chart charts/n8n-sandbox-service deploys the API and optional sysbox runner; supports dataPlane.mode sysbox or external.
    • mTLS via tls.mode=existingSecret or certManager (auto Certificates with issuerRef); mounts Secrets and wires TLS env vars.
    • Optional Ingress, PVC-backed API storage, headless runner Service + StatefulSet, and toggleable NetworkPolicy.
    • Sysbox scheduling knobs (RuntimeClass, hostUsers, node selectors/tolerations) and Docker storage driver override in start-runner.sh.
    • CI adds a Helm lint/template job and includes it in required checks.
    • New scripts/smoke-dev-sandbox.sh for quick API sanity; richer logs for runner registration and sandbox lifecycle.
  • Migration

    • Provide an auth Secret via auth.existingSecret, or set non-placeholder auth.generated.* (chart fails on placeholders).
    • Choose TLS mode: set tls.certManager.issuerRef for certManager, or create the required Secrets for existingSecret.
    • Keep api.replicaCount: 1; if enabling the sysbox runner, use a sysbox-ready node pool and adjust sysboxRunner.scheduling; use dataPlane.mode=external for external runners.
    • Merges are now gated by the new Helm job in CI.

Written for commit 09984d3. Summary will update on new commits. Review in cubic

@endor endor requested review from ahsanv and tomi May 22, 2026 13:38
@cla-bot cla-bot Bot added the cla-signed label May 22, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 issues found across 23 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="charts/n8n-sandbox-service/values.yaml">

<violation number="1" location="charts/n8n-sandbox-service/values.yaml:86">
P2: Image tag uses `latest`. Deploy not reproducible. Pin immutable version tag (or digest).</violation>
</file>

<file name="scripts/start-runner.sh">

<violation number="1" location="scripts/start-runner.sh:6">
P2: `DOCKERD_CONFIG_FILE` override not hooked to dockerd start. Config can be written to custom path, then ignored. Wire it into startup or remove override knob.</violation>
</file>

<file name="charts/n8n-sandbox-service/templates/networkpolicy.yaml">

<violation number="1" location="charts/n8n-sandbox-service/templates/networkpolicy.yaml:16">
P1: gRPC rule opens too wide when no peers set. No `from` means allow all sources on `grpc`. Wrap this ingress item in the condition so the port is denied unless runner/default peers exist.</violation>
</file>

<file name="scripts/smoke-dev-sandbox.sh">

<violation number="1" location="scripts/smoke-dev-sandbox.sh:43">
P1: Sandbox cleanup missing on error path. Fail after create leaks sandbox. Delete in trap too.</violation>

<violation number="2" location="scripts/smoke-dev-sandbox.sh:56">
P2: Secret key lookup fragile. Dot/dash key names can break jsonpath. Use indexed access for `.data`.</violation>
</file>

<file name="charts/n8n-sandbox-service/templates/sysbox-runner-statefulset.yaml">

<violation number="1" location="charts/n8n-sandbox-service/templates/sysbox-runner-statefulset.yaml:47">
P1: Container ports hardcoded. Service targetPort points here, so custom runner listen ports can break traffic. Wire ports from values consistently.</violation>

<violation number="2" location="charts/n8n-sandbox-service/templates/sysbox-runner-statefulset.yaml:104">
P1: Probe port hardcoded to 8080. Health checks can fail when runner listen port is changed. Use chart value, not fixed port.</violation>
</file>

Tip: instead of fixing issues one by one fix them all with cubic

Re-trigger cubic

Comment on lines +16 to +27
- ports:
- port: http
protocol: TCP
{{- with .Values.networkPolicy.api.httpIngressFrom }}
from:
{{- toYaml . | nindent 8 }}
{{- end }}
- ports:
- port: grpc
protocol: TCP
{{- if or (and .Values.sysboxRunner.enabled (eq .Values.dataPlane.mode "sysbox")) .Values.networkPolicy.api.grpcIngressFrom }}
from:
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: gRPC rule opens too wide when no peers set. No from means allow all sources on grpc. Wrap this ingress item in the condition so the port is denied unless runner/default peers exist.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At charts/n8n-sandbox-service/templates/networkpolicy.yaml, line 16:

<comment>gRPC rule opens too wide when no peers set. No `from` means allow all sources on `grpc`. Wrap this ingress item in the condition so the port is denied unless runner/default peers exist.</comment>

<file context>
@@ -0,0 +1,66 @@
+  policyTypes:
+    - Ingress
+  ingress:
+    - ports:
+        - port: http
+          protocol: TCP
</file context>
Suggested change
- ports:
- port: http
protocol: TCP
{{- with .Values.networkPolicy.api.httpIngressFrom }}
from:
{{- toYaml . | nindent 8 }}
{{- end }}
- ports:
- port: grpc
protocol: TCP
{{- if or (and .Values.sysboxRunner.enabled (eq .Values.dataPlane.mode "sysbox")) .Values.networkPolicy.api.grpcIngressFrom }}
from:
{{- if or (and .Values.sysboxRunner.enabled (eq .Values.dataPlane.mode "sysbox")) .Values.networkPolicy.api.grpcIngressFrom }}
- ports:
- port: grpc
protocol: TCP
from:
{{- if and .Values.sysboxRunner.enabled (eq .Values.dataPlane.mode "sysbox") }}
- podSelector:
matchLabels:
{{- include "n8n-sandbox-service.selectorLabels" (dict "Release" .Release "Chart" .Chart "Values" .Values "component" "sysbox-runner") | nindent 14 }}
{{- end }}
{{- with .Values.networkPolicy.api.grpcIngressFrom }}
{{- toYaml . | nindent 8 }}
{{- end }}
{{- end }}
Fix with Cubic

exit 1
fi

cleanup() {
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Sandbox cleanup missing on error path. Fail after create leaks sandbox. Delete in trap too.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/smoke-dev-sandbox.sh, line 43:

<comment>Sandbox cleanup missing on error path. Fail after create leaks sandbox. Delete in trap too.</comment>

<file context>
@@ -0,0 +1,98 @@
+	exit 1
+fi
+
+cleanup() {
+	if [ -n "${tmp_exec_out:-}" ]; then
+		rm -f "${tmp_exec_out}"
</file context>
Fix with Cubic

imagePullPolicy: {{ .Values.sysboxRunner.image.pullPolicy }}
ports:
- name: http
containerPort: 8080
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Container ports hardcoded. Service targetPort points here, so custom runner listen ports can break traffic. Wire ports from values consistently.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At charts/n8n-sandbox-service/templates/sysbox-runner-statefulset.yaml, line 47:

<comment>Container ports hardcoded. Service targetPort points here, so custom runner listen ports can break traffic. Wire ports from values consistently.</comment>

<file context>
@@ -0,0 +1,186 @@
+          imagePullPolicy: {{ .Values.sysboxRunner.image.pullPolicy }}
+          ports:
+            - name: http
+              containerPort: 8080
+            - name: control-grpc
+              containerPort: 9091
</file context>
Fix with Cubic

command:
- /bin/sh
- -ec
- 'wget -q -O /dev/null --header="X-Api-Key: ${SANDBOX_RUNNER_API_KEYS%%,*}" http://127.0.0.1:8080/healthz'
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Probe port hardcoded to 8080. Health checks can fail when runner listen port is changed. Use chart value, not fixed port.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At charts/n8n-sandbox-service/templates/sysbox-runner-statefulset.yaml, line 104:

<comment>Probe port hardcoded to 8080. Health checks can fail when runner listen port is changed. Use chart value, not fixed port.</comment>

<file context>
@@ -0,0 +1,186 @@
+              command:
+                - /bin/sh
+                - -ec
+                - 'wget -q -O /dev/null --header="X-Api-Key: ${SANDBOX_RUNNER_API_KEYS%%,*}" http://127.0.0.1:8080/healthz'
+            initialDelaySeconds: 20
+            periodSeconds: 10
</file context>
Fix with Cubic


image:
repository: n8nio/n8n-sandbox-api
tag: latest
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Image tag uses latest. Deploy not reproducible. Pin immutable version tag (or digest).

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At charts/n8n-sandbox-service/values.yaml, line 86:

<comment>Image tag uses `latest`. Deploy not reproducible. Pin immutable version tag (or digest).</comment>

<file context>
@@ -0,0 +1,192 @@
+
+  image:
+    repository: n8nio/n8n-sandbox-api
+    tag: latest
+    pullPolicy: IfNotPresent
+
</file context>
Fix with Cubic

Comment thread scripts/start-runner.sh

DOCKERD_LOG=${DOCKERD_LOG:-/tmp/dockerd.log}
DOCKERD_CONFIG_DIR=${DOCKERD_CONFIG_DIR:-/etc/docker}
DOCKERD_CONFIG_FILE=${DOCKERD_CONFIG_FILE:-${DOCKERD_CONFIG_DIR}/daemon.json}
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: DOCKERD_CONFIG_FILE override not hooked to dockerd start. Config can be written to custom path, then ignored. Wire it into startup or remove override knob.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/start-runner.sh, line 6:

<comment>`DOCKERD_CONFIG_FILE` override not hooked to dockerd start. Config can be written to custom path, then ignored. Wire it into startup or remove override knob.</comment>

<file context>
@@ -2,6 +2,8 @@
 
 DOCKERD_LOG=${DOCKERD_LOG:-/tmp/dockerd.log}
+DOCKERD_CONFIG_DIR=${DOCKERD_CONFIG_DIR:-/etc/docker}
+DOCKERD_CONFIG_FILE=${DOCKERD_CONFIG_FILE:-${DOCKERD_CONFIG_DIR}/daemon.json}
 
 INSECURE_ARGS=""
</file context>
Fix with Cubic

NAMESPACE="${KUBE_NAMESPACE:?KUBE_NAMESPACE is required when SANDBOX_API_KEY is not set}"
AUTH_SECRET="${KUBE_AUTH_SECRET:?KUBE_AUTH_SECRET is required when SANDBOX_API_KEY is not set}"
AUTH_SECRET_KEY="${KUBE_AUTH_SECRET_KEY:?KUBE_AUTH_SECRET_KEY is required when SANDBOX_API_KEY is not set}"
KEY="$(kubectl -n "${NAMESPACE}" get secret "${AUTH_SECRET}" -o "jsonpath={.data.${AUTH_SECRET_KEY}}" | base64 -d | cut -d, -f1)"
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Secret key lookup fragile. Dot/dash key names can break jsonpath. Use indexed access for .data.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At scripts/smoke-dev-sandbox.sh, line 56:

<comment>Secret key lookup fragile. Dot/dash key names can break jsonpath. Use indexed access for `.data`.</comment>

<file context>
@@ -0,0 +1,98 @@
+	NAMESPACE="${KUBE_NAMESPACE:?KUBE_NAMESPACE is required when SANDBOX_API_KEY is not set}"
+	AUTH_SECRET="${KUBE_AUTH_SECRET:?KUBE_AUTH_SECRET is required when SANDBOX_API_KEY is not set}"
+	AUTH_SECRET_KEY="${KUBE_AUTH_SECRET_KEY:?KUBE_AUTH_SECRET_KEY is required when SANDBOX_API_KEY is not set}"
+	KEY="$(kubectl -n "${NAMESPACE}" get secret "${AUTH_SECRET}" -o "jsonpath={.data.${AUTH_SECRET_KEY}}" | base64 -d | cut -d, -f1)"
+	if [ -z "${KEY}" ]; then
+		echo "could not read API key from secret/${AUTH_SECRET} key ${AUTH_SECRET_KEY}" >&2
</file context>
Fix with Cubic

Copy link
Copy Markdown
Contributor

@tomi tomi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff 👏

The image names should be updated, as they were recently changed. Also codex flagged these:

NetworkPolicy opens registration gRPC in external mode

With networkPolicy.enabled=true and dataPlane.mode=external, leaving networkPolicy.api.grpcIngressFrom empty renders the gRPC ingress rule without from, which allows all sources to the private registration port.

File: networkpolicy.yaml:23

API listen port overrides break Service/probes

api.config.listenAddr and grpcListenAddr are written to the ConfigMap, but the Deployment still names container ports 8080 and 9090. If either value is overridden, the Service targetPorts and health probes keep hitting the old ports.

File: api-deployment.yaml:36

Runner advertises Service ports for headless pod DNS

The runner advertises $(POD_NAME)....:. Headless Services resolve directly to pod IPs and do not translate service.port to targetPort, so overriding runner service ports or listen ports makes the API dial the wrong port.

File: sysbox-runner-statefulset.yaml:66

Smoke test leaks sandbox on failures after create

The cleanup trap only removes the temp file. With set -e, any failure after sandbox creation and before the final DELETE leaves the dev sandbox running.

File: smoke-dev-sandbox.sh:43

replicaCount: 1

image:
repository: n8nio/n8n-sandbox-api
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct image is n8nio/n8n-sandbox-service-api

replicaCount: 1

image:
repository: n8nio/n8n-sandbox-runner
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct image is n8nio/n8n-sandbox-service-runner-dind

pullPolicy: IfNotPresent

sandboxImage:
repository: n8nio/n8n-sandbox
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct image is n8nio/n8n-sandbox-service-sandbox

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants