Skip to content

Feat/fallback helm chart#8

Open
vivek-tfy wants to merge 30 commits intomainfrom
feat/fallback-helm-chart
Open

Feat/fallback helm chart#8
vivek-tfy wants to merge 30 commits intomainfrom
feat/fallback-helm-chart

Conversation

@vivek-tfy
Copy link
Copy Markdown
Collaborator

@vivek-tfy vivek-tfy commented Feb 10, 2026

Note

Medium Risk
Adds new CI publishing and a full Helm/Istio deployment surface (incl. OAuth/JWT enforcement), which can impact release/deploy behavior and cluster access controls if misconfigured.

Overview
Introduces a new GitHub Actions workflow (.github/workflows/ci.yml) to build the existing Dockerfile and push images to GHCR on releases, manual dispatch, and pushes to main/feat/fallback-helm-*.

Adds a complete Helm chart (helm/spark-job-fallback-management) to deploy the service with a Deployment, Service, PVC-backed /app/data, and optional Istio VirtualService exposure.

When auth.enabled is set, the chart also provisions Istio RequestAuthentication, AuthorizationPolicy, and an EnvoyFilter to enforce TrueFoundry OAuth2 + JWT auth; env.default is added to capture default runtime settings used for chart value generation.

Written by Cursor Bugbot for commit 918fa88. This will update automatically on new commits. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Free Tier Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: API token exposed as plain text in deployment
    • I moved TF_SERVICE_API_TOKEN to a dedicated Kubernetes Secret template and now inject it in the Deployment via valueFrom.secretKeyRef instead of plain-text value.
  • ✅ Fixed: ReadWriteOnce PVC incompatible with multiple replicas
    • I made PVC access mode configurable and added a Helm template validation that fails when replicaCount > 1 with ReadWriteOnce to prevent invalid multi-replica mounts.
  • ✅ Fixed: Unused serviceAccount helper and values configuration
    • I added a conditional serviceaccount.yaml template and wired serviceAccountName into the Deployment so the existing helper and values now take effect.

Create PR

Or push these changes by commenting:

@cursor push fdce861de0
Preview (fdce861de0)
diff --git a/helm/spark-job-fallback-management/templates/deployment.yaml b/helm/spark-job-fallback-management/templates/deployment.yaml
--- a/helm/spark-job-fallback-management/templates/deployment.yaml
+++ b/helm/spark-job-fallback-management/templates/deployment.yaml
@@ -25,6 +25,7 @@
       imagePullSecrets:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      serviceAccountName: {{ include "spark-job-fallback-management.serviceAccountName" . }}
       securityContext:
         {{- toYaml .Values.podSecurityContext | nindent 8 }}
       containers:
@@ -50,7 +51,10 @@
               value: {{ .Values.truefoundryApi.url | quote }}
             {{- if .Values.truefoundryApi.serviceApiToken }}
             - name: TF_SERVICE_API_TOKEN
-              value: {{ .Values.truefoundryApi.serviceApiToken | quote }}
+              valueFrom:
+                secretKeyRef:
+                  name: {{ include "spark-job-fallback-management.fullname" . }}-truefoundry-api
+                  key: service-api-token
             {{- end }}
             - name: JOB_FALLBACK_ENABLED
               value: {{ .Values.jobFallback.enabled | quote }}

diff --git a/helm/spark-job-fallback-management/templates/pvc.yaml b/helm/spark-job-fallback-management/templates/pvc.yaml
--- a/helm/spark-job-fallback-management/templates/pvc.yaml
+++ b/helm/spark-job-fallback-management/templates/pvc.yaml
@@ -1,3 +1,6 @@
+{{- if and (gt (int .Values.replicaCount) 1) (eq .Values.persistence.accessMode "ReadWriteOnce") -}}
+{{- fail "replicaCount > 1 requires a shared PVC access mode; set persistence.accessMode to ReadWriteMany or keep replicaCount=1" -}}
+{{- end }}
 apiVersion: v1
 kind: PersistentVolumeClaim
 metadata:
@@ -6,7 +9,7 @@
     {{- include "spark-job-fallback-management.labels" . | nindent 4 }}
 spec:
   accessModes:
-    - ReadWriteOnce
+    - {{ .Values.persistence.accessMode }}
   resources:
     requests:
       storage: {{ .Values.persistence.size }}

diff --git a/helm/spark-job-fallback-management/templates/secret.yaml b/helm/spark-job-fallback-management/templates/secret.yaml
new file mode 100644
--- /dev/null
+++ b/helm/spark-job-fallback-management/templates/secret.yaml
@@ -1,0 +1,11 @@
+{{- if .Values.truefoundryApi.serviceApiToken }}
+apiVersion: v1
+kind: Secret
+metadata:
+  name: {{ include "spark-job-fallback-management.fullname" . }}-truefoundry-api
+  labels:
+    {{- include "spark-job-fallback-management.labels" . | nindent 4 }}
+type: Opaque
+stringData:
+  service-api-token: {{ .Values.truefoundryApi.serviceApiToken | quote }}
+{{- end }}

diff --git a/helm/spark-job-fallback-management/templates/serviceaccount.yaml b/helm/spark-job-fallback-management/templates/serviceaccount.yaml
new file mode 100644
--- /dev/null
+++ b/helm/spark-job-fallback-management/templates/serviceaccount.yaml
@@ -1,0 +1,12 @@
+{{- if .Values.serviceAccount.create }}
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: {{ include "spark-job-fallback-management.serviceAccountName" . }}
+  labels:
+    {{- include "spark-job-fallback-management.labels" . | nindent 4 }}
+  {{- with .Values.serviceAccount.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+{{- end }}

diff --git a/helm/spark-job-fallback-management/values.yaml b/helm/spark-job-fallback-management/values.yaml
--- a/helm/spark-job-fallback-management/values.yaml
+++ b/helm/spark-job-fallback-management/values.yaml
@@ -74,6 +74,8 @@
 
 # Persistent storage (auto-created by Helm)
 persistence:
+  # PVC access mode (ReadWriteOnce requires replicaCount=1)
+  accessMode: ReadWriteOnce
   # Storage size for /app/data
   size: 1Gi
   # Storage class (leave empty for cluster default)
This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

{{- if .Values.truefoundryApi.serviceApiToken }}
- name: TF_SERVICE_API_TOKEN
value: {{ .Values.truefoundryApi.serviceApiToken | quote }}
{{- end }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API token exposed as plain text in deployment

High Severity

TF_SERVICE_API_TOKEN is injected as a plain-text value in the deployment spec rather than being stored in a Kubernetes Secret and referenced via secretKeyRef. This means the token is visible to anyone with read access to deployments (kubectl get deploy -o yaml), stored unencrypted in Helm release secrets, and potentially logged in CI/CD pipelines.

Fix in Cursor Fix in Web

{{- include "spark-job-fallback-management.labels" . | nindent 4 }}
spec:
accessModes:
- ReadWriteOnce
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadWriteOnce PVC incompatible with multiple replicas

Medium Severity

The PVC is created with ReadWriteOnce access mode, but the deployment's replicaCount is configurable via values. If replicaCount is set to more than 1 and pods land on different nodes, the additional pods will fail to start because a ReadWriteOnce volume can only be mounted by a single node.

Additional Locations (1)

Fix in Cursor Fix in Web

{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused serviceAccount helper and values configuration

Medium Severity

The spark-job-fallback-management.serviceAccountName helper is defined in _helpers.tpl and serviceAccount values are declared in values.yaml, but neither is referenced by any template. There is no serviceaccount.yaml template, and the deployment doesn't set serviceAccountName. The serviceAccount.create setting has no effect, which could mislead users into thinking a service account is being managed.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

{{- $tfyHost := .Values.auth.truefoundryInternalHost -}}
{{- $clientId := required "auth.clientId is required when auth is enabled" .Values.auth.clientId -}}
{{- $sdsUri := .Values.auth.sdsServerUri -}}
{{- $port := .Values.service.port -}}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnvoyFilter vhost uses service port instead of container port

Medium Severity

$port is derived from .Values.service.port, but Istio names inbound virtual hosts using the container port, not the service port. The deployment hardcodes containerPort: 8000, so the vhost name inbound|http|{{ $port }} only works when service.port happens to equal 8000. If service.port is overridden to any other value (e.g., 80), the EnvoyFilter won't match the actual Istio-generated vhost, silently disabling OAuth2 authentication.

Additional Locations (1)

Fix in Cursor Fix in Web

# Annotations to add to the service account
annotations: {}
# The name of the service account to use
name: ""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serviceAccount values defined but have no effect

Low Severity

The serviceAccount configuration block in values.yaml and the serviceAccountName helper in _helpers.tpl exist but are entirely unused — no serviceaccount.yaml template creates the resource, and the deployment template never sets serviceAccountName. Setting serviceAccount.create: true has no effect, which could confuse users.

Fix in Cursor Fix in Web

# Image pull policy
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion
tag: "latest"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutable tag with IfNotPresent prevents image updates

Medium Severity

The default image.tag is "latest" (a mutable tag) but pullPolicy is IfNotPresent. This combination means once a node caches the latest image, Kubernetes won't pull a newer version even when CI pushes an updated image with the same latest tag. Additionally, since the Deployment spec never changes (always referencing repo:latest), no automatic rollout is triggered on new image pushes. Together this silently prevents new code from being deployed.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant