
fix: force rollout on same-version app redeploy #18915

Open
martinothamar-agent wants to merge 1 commit into Altinn:main from martinothamar-agent:fix/same-version-redeploy-rollout


@martinothamar-agent
Contributor

@martinothamar-agent martinothamar-agent commented May 21, 2026

Description

  • Pass the persisted Designer deployment sequence number to the Azure DevOps deploy pipeline as DEPLOYMENT_ID.
  • Include DEPLOYMENT_ID in generated Helm values as podAnnotations.altinn.studio/deployment-id.
  • Require DEPLOYMENT_ID for app OCI deployments so same-version redeploys cannot silently miss the rollout marker.
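The first two bullets describe a small data flow: the sequence number goes in, a DEPLOYMENT_ID pipeline variable comes out, and that variable ends up as a pod annotation in the Helm values. The Python sketch below models this flow under stated assumptions. QueueBuildParametersSketch, build_queue_parameters, and pod_annotations_override are hypothetical stand-ins for the C# QueueBuildParameters model and the pipeline's override step, not the Designer's actual code.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass
class QueueBuildParametersSketch:
    """Hypothetical Python stand-in for the C# QueueBuildParameters model.

    Only the DeploymentId property added in this PR is modelled; the real
    class carries other pipeline parameters that are omitted here.
    """

    deployment_id: str

    def to_pipeline_variables(self) -> Dict[str, str]:
        # The C# property is serialised under the JSON name DEPLOYMENT_ID,
        # the variable name that deploy-app.yaml consumes.
        return {"DEPLOYMENT_ID": self.deployment_id}


def build_queue_parameters(sequence_no: int) -> QueueBuildParametersSketch:
    # The persisted deployment sequence number becomes the deployment id,
    # so every logical Designer deployment attempt gets a distinct value.
    return QueueBuildParametersSketch(deployment_id=str(sequence_no))


def pod_annotations_override(params: QueueBuildParametersSketch) -> Dict[str, Dict[str, str]]:
    # Hypothetical helper: the Helm values override the pipeline adds.
    return {"podAnnotations": {"altinn.studio/deployment-id": params.deployment_id}}


params = build_queue_parameters(42)
print(json.dumps(params.to_pipeline_variables()))  # {"DEPLOYMENT_ID": "42"}
print(pod_annotations_override(params))  # {'podAnnotations': {'altinn.studio/deployment-id': '42'}}
```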

Previously, retrying a deployment of the same release could leave the rendered runtime Deployment unchanged: the image tag stayed the same, and the Azure DevOps build id appeared only in the HelmRelease metadata. After a Helm/Flux rollback, refreshing the app config artifact was therefore not guaranteed to produce a new Kubernetes pod template. The deployment id annotation is part of spec.values, and the deployment chart renders podAnnotations into the pod template of the Kubernetes Deployment. As a result, every logical Designer deployment attempt produces a rollout-relevant change in desired state, even when the app version/tag is unchanged.
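The rollout argument rests on one Kubernetes behaviour: a Deployment rolls out new pods only when its spec.template changes. The Python sketch below models this with a hypothetical, heavily simplified render_pod_template. It uses a SHA-256 digest as a stand-in for the controller's own pod-template hash. The image name example/app and the version 1.0.0 are placeholders.

```python
import hashlib
import json
from typing import Optional


def render_pod_template(image_tag: str, deployment_id: Optional[str]) -> dict:
    """Hypothetical, heavily simplified pod template rendered by a chart."""
    annotations = {}
    if deployment_id is not None:
        annotations["altinn.studio/deployment-id"] = deployment_id
    return {
        "metadata": {"annotations": annotations},
        "spec": {"containers": [{"name": "app", "image": f"example/app:{image_tag}"}]},
    }


def template_fingerprint(template: dict) -> str:
    # Stand-in for the Deployment controller's pod-template hash: it is a
    # deterministic digest, so any change to the template changes the value.
    canonical = json.dumps(template, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Without the annotation, redeploying the same tag renders an identical template.
same_without_id = template_fingerprint(render_pod_template("1.0.0", None)) == template_fingerprint(
    render_pod_template("1.0.0", None)
)

# With the annotation, a new deployment id changes the template even for the same tag.
same_with_new_id = template_fingerprint(render_pod_template("1.0.0", "41")) == template_fingerprint(
    render_pod_template("1.0.0", "42")
)

print(same_without_id)  # True
print(same_with_new_id)  # False
```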

I also inspected the frontend deployment status logic. For normal deployments, it keeps PipelineSucceeded in the in-progress state for up to 15 minutes unless a final Flux event arrives. After that window, it can still show success based only on the pipeline event. I left that out of this PR because it is a UX/status-semantics change that is separate from forcing the actual rollout.

Verification

  • Related issues are connected (if applicable)
  • Your code builds clean without any errors or warnings
  • Manual testing done (required)
  • Relevant automated test added (if you find this hard, leave it and we'll help out)

Automated:

$ dotnet test tests/Designer.Tests/Designer.Tests.csproj --filter "FullyQualifiedName~DeploymentServiceTest"
Passed!  - Failed:     0, Passed:    19, Skipped:     0, Total:    19, Duration: 322 ms - Altinn.Studio.Designer.Tests.dll (net9.0)

Formatting/YAML checks:

$ dotnet tool restore
Restore was successful.

$ dotnet tool run csharpier check src/Designer/Services/Implementation/DeploymentService.cs src/Designer/TypedHttpClients/AzureDevOps/Models/QueueBuildParameters.cs tests/Designer.Tests/Services/DeploymentServiceTest.cs
Checked 3 files in 1117ms.

$ yq . src/App/azure-pipelines/deploy-app.yaml >/dev/null
# exit 0

$ git diff --check
# exit 0

Manual HelmRelease values verification:

I mirrored the pipeline merge with mikefarah/yq:4.45.1 using normalized values containing an existing pod annotation and overrides containing altinn.studio/deployment-id: "42". The resulting HelmRelease spec.values.podAnnotations was:

existing.example/annotation: keep-me
altinn.studio/deployment-id: "42"

This verifies the generated HelmRelease values include:

podAnnotations:
  altinn.studio/deployment-id: "<deployment sequence no>"
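The manual check above relies on the merge being recursive: override keys are added or replaced, while sibling keys under podAnnotations survive. The Python sketch below reproduces that outcome for the same inputs. deep_merge is a hypothetical helper that illustrates deep-merge semantics; it is not the yq expression that deploy-app.yaml actually runs. The id stays a string because Kubernetes annotation values must be strings.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with override merged into base recursively.

    Nested maps are merged key by key; for any other value the override wins.
    """
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


normalized_values = {"podAnnotations": {"existing.example/annotation": "keep-me"}}
# The id is a string: Kubernetes annotation values must be strings.
overrides = {"podAnnotations": {"altinn.studio/deployment-id": "42"}}

merged = deep_merge(normalized_values, overrides)
print(merged["podAnnotations"])
# {'existing.example/annotation': 'keep-me', 'altinn.studio/deployment-id': '42'}
```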

cc @martinothamar

Summary by CodeRabbit

Release Notes

New Features

  • Introduced deployment ID tracking throughout the deployment pipeline for improved identification and traceability of deployment attempts.
  • Deployment identifiers are now automatically included in pod annotations for enhanced observability and monitoring capabilities.
  • Pipeline validation now ensures deployment IDs are provided when pushing application container images.


@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 62b84abe-6158-49a0-807f-1c36279dc28a

📥 Commits

Reviewing files that changed from the base of the PR and between a1f80eb and ed181b5.

📒 Files selected for processing (4)
  • src/App/azure-pipelines/deploy-app.yaml
  • src/Designer/backend/src/Designer/Services/Implementation/DeploymentService.cs
  • src/Designer/backend/src/Designer/TypedHttpClients/AzureDevOps/Models/QueueBuildParameters.cs
  • src/Designer/backend/tests/Designer.Tests/Services/DeploymentServiceTest.cs

📝 Walkthrough


This change introduces a deployment ID identifier that flows from the database-assigned deployment sequence number through the Azure DevOps pipeline queueing parameters and into the Kubernetes pod annotations. The designer service captures the repository-assigned SequenceNo, passes it through build queue parameters, and the pipeline template declares, validates, and applies it as a pod annotation.

Changes

Deployment ID tracking and pipeline integration

  • QueueBuildParameters deployment ID property
    src/Designer/backend/src/Designer/TypedHttpClients/AzureDevOps/Models/QueueBuildParameters.cs
    Adds a DeploymentId string property, serialised to JSON as DEPLOYMENT_ID and documented as the identifier of the logical deployment attempt.
  • DeploymentService sequence number threading and build queueing
    src/Designer/backend/src/Designer/Services/Implementation/DeploymentService.cs
    Refactors CreateAsync to use deploymentEntity.SequenceNo directly for all DeployEvent records and to return deploymentEntity consistently. QueueDeploymentBuild sets queueBuildParameters.DeploymentId from deploymentEntity.SequenceNo when queuing the build.
  • Deployment service deployment ID test
    src/Designer/backend/tests/Designer.Tests/Services/DeploymentServiceTest.cs
    Adds the CreateAsync_QueuesPipelineWithDeploymentIdFromCreatedDeployment theory test, which verifies that the SequenceNo assigned by the mocked repository is passed as DeploymentId in the queued build parameters.
  • Pipeline template deployment ID declaration and injection
    src/App/azure-pipelines/deploy-app.yaml
    Declares a DEPLOYMENT_ID variable placeholder, validates that it is set when OCI image push is enabled, and injects it into the Helm/Flux overrides as podAnnotations.altinn.studio/deployment-id.
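The validation rule in the last entry can be sketched in Python as follows. This is a hypothetical mirror, not the actual pipeline step. The flag name push_oci_image and the helper validate_deployment_id are assumptions. Treating an unexpanded $(DEPLOYMENT_ID) macro as missing is also an assumption: Azure Pipelines leaves macro references to undefined variables unchanged, so this sketch rejects them defensively.

```python
from typing import Optional


class PipelineConfigError(ValueError):
    """Raised when a required pipeline variable is missing."""


def validate_deployment_id(push_oci_image: bool, deployment_id: Optional[str]) -> None:
    """Hypothetical mirror of the DEPLOYMENT_ID check described for deploy-app.yaml."""
    if not push_oci_image:
        # The id is only mandatory for app OCI deployments.
        return
    value = (deployment_id or "").strip()
    # Assumption: an undefined Azure Pipelines macro such as $(DEPLOYMENT_ID)
    # can reach a script unexpanded, so treat it as missing too.
    if not value or value.startswith("$("):
        raise PipelineConfigError("DEPLOYMENT_ID is required when pushing app OCI images")


validate_deployment_id(push_oci_image=True, deployment_id="42")  # passes
validate_deployment_id(push_oci_image=False, deployment_id=None)  # not required
try:
    validate_deployment_id(push_oci_image=True, deployment_id="")
except PipelineConfigError as err:
    print(err)  # DEPLOYMENT_ID is required when pushing app OCI images
```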

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

quality/testing, backend, solution/studio/designer

Poem

🐰 A sequence springs forth from the database deep,
Through services it hops, through queues it will leap,
Pod annotations now wear their ID with pride,
From creation to pipeline, the deployment shall ride!

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title 'fix: force rollout on same-version app redeploy' is partially related to the changeset, focusing on the rollout enforcement aspect but not capturing the core technical implementation of passing deployment IDs through the pipeline.
  • Description check: ✅ Passed. The description is comprehensive and complete, covering all required sections: detailed description of changes, clear motivation, verification checklist fully completed with automated test results and manual testing evidence.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


@github-actions github-actions Bot added area/app-deploy Area: Related to deploying apps from Altinn Studio to Altinn Apps. quality/testing Tests that are missing, needs to be created or could be improved. skip-releasenotes Issues that do not make sense to list in our release notes backend solution/studio/designer labels May 21, 2026
@martinothamar
Contributor

Have you manually tested the full path? Here is an example experiment that could be useful:

  • create kind cluster, install flux
  • deploy some helm chart through flux that works and starts OK
  • build new image that fails on startup due to some runtime config, for example a configmap value that you can patch later
  • deploy same helm chart with new image (similar to how we do with apps)
  • wait for failure and rollback
  • patch the runtime config so that next time it passes
  • deploy same helm chart with same (new) image as previous

ensure that we actually get the behavior we want:

  • it still rolls back on the failure path
  • it actually redeploys on same-version deploy

are you able to do that?

@martinothamar-agent
Contributor Author

Yes, I was able to run a local full-path experiment through Flux/Helm in a disposable kind cluster.

What I tested:

  • Created a kind cluster and installed Flux controllers.
  • Served a tiny Helm chart from a local nginx Helm repository reachable inside the kind network.
  • The chart deploys one busybox:1.36 Deployment and reads SHOULD_FAIL from a ConfigMap at runtime.
  • The Deployment pod template includes the altinn.studio/deployment-id pod annotation, which is the same mechanism this PR relies on.
  • The image tag stayed busybox:1.36 for every deploy. Only values changed.

Test sequence and observed result:

initial deploy: image busybox:1.36, deployment-id=1, config passes
initial annotation: 1
initial ready replicas: 1
initial pod(s): flux-redeploy-test-6c56d4bfbd-4rtvk

failed upgrade: same image tag, deployment-id=2, config fails
post-failure deployment annotation: 1
post-failure ready replicas: 1
post-failure pod(s): flux-redeploy-test-6c56d4bfbd-vf86w

Helm history after the failed same-tag upgrade showed each failed upgrade attempt followed by a rollback:

REVISION  STATUS      DESCRIPTION
1         superseded  Install complete
2         failed      Upgrade "flux-redeploy-test" failed: timeout waiting for: [Deployment/default/flux-redeploy-test status: 'InProgress']
3         superseded  Rollback to 1
4         failed      Upgrade "flux-redeploy-test" failed: timeout waiting for: [Deployment/default/flux-redeploy-test status: 'InProgress']
5         deployed    Rollback to 3
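The history above can be checked mechanically: each failed revision should be immediately followed by a revision whose description is a rollback to the last revision that did not fail. The Python sketch below encodes that check over the history shown above. parse_history and failures_rolled_back are illustrative helpers. The parser assumes the whitespace-separated layout of the helm history output as reproduced here.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, str]

# Helm history from the experiment above, copied verbatim.
HISTORY = """\
REVISION  STATUS      DESCRIPTION
1         superseded  Install complete
2         failed      Upgrade "flux-redeploy-test" failed: timeout waiting for: [Deployment/default/flux-redeploy-test status: 'InProgress']
3         superseded  Rollback to 1
4         failed      Upgrade "flux-redeploy-test" failed: timeout waiting for: [Deployment/default/flux-redeploy-test status: 'InProgress']
5         deployed    Rollback to 3
"""


def parse_history(text: str) -> List[Row]:
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        revision, status, description = line.split(maxsplit=2)
        rows.append((int(revision), status, description))
    return rows


def failures_rolled_back(rows: List[Row]) -> bool:
    """True if every failed revision is immediately followed by a rollback
    to the last revision that did not fail."""
    last_good: Optional[int] = None
    for index, (revision, status, _description) in enumerate(rows):
        if status == "failed":
            following = rows[index + 1] if index + 1 < len(rows) else None
            if following is None or following[2] != f"Rollback to {last_good}":
                return False
        else:
            last_good = revision
    return True


print(failures_rolled_back(parse_history(HISTORY)))  # True
```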

Then I fixed the runtime config and redeployed the same image tag with a new deployment id:

redeploy same image tag after fixing config: deployment-id=3
final annotation: 3
final ready replicas: 1
final image: busybox:1.36
final pod(s): flux-redeploy-test-c6488b746-djfc7

Final Helm history ended with:

6  deployed  Upgrade complete

So the experiment verifies both properties we care about:

  • Flux/Helm still rolled back the failed same-tag upgrade.
  • A later same-version redeploy produced a new pod template via altinn.studio/deployment-id, created a new pod, and became ready while keeping the image tag unchanged.

One useful detail from the first attempt: with the default RollingUpdate Deployment strategy, Helm considered the upgrade applied while the new pod was crashing, because the old pod was still available. For the final experiment I used strategy.type: Recreate, a short progressDeadlineSeconds, and explicit Flux wait settings, so that the startup failure deterministically produced a Helm failure and rollback. That does not change the conclusion for this PR: the rollout trigger is the pod-template annotation, and it works for the same image tag after a rollback.
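The old pod stayed available because of how Kubernetes resolves the default RollingUpdate parameters. maxSurge and maxUnavailable both default to 25%; maxSurge is rounded up and maxUnavailable is rounded down. For a single replica, that gives one surge pod and zero unavailable pods, so the old pod keeps serving while the new one crashes. Recreate, by contrast, terminates the old pod first. The Python sketch below shows the arithmetic. The helper names are hypothetical, and the sketch assumes at least one replica and that the old pods were ready.

```python
from typing import Tuple


def resolve_rolling_update(
    replicas: int, max_surge_pct: int = 25, max_unavailable_pct: int = 25
) -> Tuple[int, int]:
    """Resolve percentage-based maxSurge/maxUnavailable to pod counts.

    Kubernetes rounds maxSurge up and maxUnavailable down; both default to 25%.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    max_surge = -(-replicas * max_surge_pct // 100)  # ceiling division
    max_unavailable = replicas * max_unavailable_pct // 100  # floor division
    return max_surge, max_unavailable


def pods_kept_available(replicas: int, strategy: str) -> int:
    """Pods kept available while the new pods are not yet ready."""
    if strategy == "Recreate":
        # Recreate terminates all old pods before creating new ones.
        return 0
    _surge, max_unavailable = resolve_rolling_update(replicas)
    return replicas - max_unavailable


print(resolve_rolling_update(1))  # (1, 0)
print(pods_kept_available(1, "RollingUpdate"))  # 1
print(pods_kept_available(1, "Recreate"))  # 0
```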

@martinothamar
Contributor

"For the final experiment I used strategy.type: Recreate, progressDeadlineSeconds, and explicit Flux wait settings so the startup failure deterministically produced a Helm failure/rollback."

what do we use for the latest version of the altinn-studio-charts deployment helm chart which the apps use?

@martinothamar-agent
Contributor Author

Good catch. I checked the actual app path now.

For apps, deploy-app.yaml currently generates a HelmRelease pinned to chart deployment version 3.10.0. Upstream altinn-studio-charts main is also deployment-3.10.0 right now.

For that chart/version:

  • The Deployment template does not set spec.strategy, so Kubernetes uses the default RollingUpdate strategy.
  • The Deployment template does not set spec.progressDeadlineSeconds, so Kubernetes uses the default 600s.
  • The chart has no hook/job like the one I used in the deterministic harness.
  • The generated HelmRelease sets:

    spec:
      interval: 5m
      install:
        remediation:
          retries: 1
      upgrade:
        remediation:
          retries: 1

  • The generated HelmRelease does not set spec.timeout, install.timeout, upgrade.timeout, install.disableWait, upgrade.disableWait, or waitStrategy. With Flux defaults, this means the timeout is 5m, waiting stays enabled, and upgrade remediation uses the rollback strategy.
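Under these defaults, a rollout that never becomes ready reaches the 5m Flux timeout roughly before the Deployment's 600s progress deadline. The Python sketch below makes that comparison explicit. parse_duration is a hypothetical helper that handles only the integer h/m/s subset of Go-style duration strings such as "5m" or "5m0s". The comparison simplifies things by treating both clocks as starting together when the upgrade begins.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Parse a Go-style duration such as "5m" or "5m0s" into whole seconds.

    Only integer h/m/s components are supported, a deliberate simplification.
    """
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def first_limit_reached(flux_timeout: str, progress_deadline_seconds: int) -> str:
    # Simplification: both clocks start when the upgrade begins and the new
    # pods never become ready.
    if parse_duration(flux_timeout) < progress_deadline_seconds:
        return "Flux/Helm wait timeout"
    return "Deployment progress deadline"


# Defaults for the current app chart: Flux timeout 5m, progressDeadlineSeconds 600.
print(first_limit_reached("5m", 600))  # Flux/Helm wait timeout
```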

So my final kind experiment was stricter than the actual app chart on the failure/rollback part: I used Recreate, a short progressDeadlineSeconds, and explicit wait settings to make the rollback deterministic and fast in a local test. The exact current app chart uses default RollingUpdate/default progress deadline/default Flux wait behavior.

The part that is directly representative for this PR is the same-version rollout trigger: podAnnotations.altinn.studio/deployment-id is in Helm values and is rendered into the Deployment pod template. That guarantees a desired-state change for a new logical deployment even when image.tag is unchanged.

Given this, I would phrase the manual verification more narrowly: it verifies the pod-template rollout mechanism under Flux/Helm after a rollback, but it does not exactly reproduce the current chart's failure-detection settings. In the observed ttd/fiks-arkiv-test case, the rollback had already happened; this PR ensures that the next same-version deploy is a real pod-template change rather than a possible no-op.


2 participants