Conversation

@win5923
Collaborator

@win5923 win5923 commented Nov 10, 2025

Why are these changes needed?

Currently, when users update a RayCluster spec (e.g., updating the image), they must re-create the cluster or manually set spec.suspend to true, wait until all Pods are deleted, and then set it back to false. This is not convenient for users deploying with GitOps systems like ArgoCD.

Ref:

Design doc: https://docs.google.com/document/d/1xQLm0-WQWD-FkufxBJYklOJGvVn4RLk0_vPjLD5ax7o/edit?usp=sharing

Changes

  • Add spec.upgradeStrategy field to RayCluster CRD
  • Supports two values:
    • Recreate: During an upgrade, the Recreate strategy deletes all existing Pods before creating new ones.
    • None: No new Pods are created while the strategy is set to None.

Implementation

- Store a hash of HeadGroupSpec.Template on the head Pod and of workerGroup.Template on each worker Pod at creation time, using the ray.io/pod-template-hash annotation
- Compare the stored hash with the current head or worker Pod template hash to detect changes and recreate all Pods

I only compare HeadGroupSpec.Template and workerGroup.Template because these define the Pod-related configuration. RayCluster.Spec contains many dynamic and component-specific settings (e.g., autoscaler configs, rayStartParams).

Based on #4185 (comment), we now compute a hash from the entire RayCluster.Spec (excluding those fields) and store it as an annotation on the head Pod.
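For reference, a minimal sketch of how those annotations could be stamped onto the head Pod at creation time (the helper and annotation keys are the ones used later in this PR; the actual wiring in the controller may differ):

import (
	corev1 "k8s.io/api/core/v1"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
	"github.com/ray-project/kuberay/ray-operator/controllers/ray/utils"
)

// Sketch only: stamp the spec hash and the operator version on the head Pod so
// later reconciles can detect RayCluster.Spec changes without re-reading all Pods.
func annotateHeadPod(pod *corev1.Pod, spec rayv1.RayClusterSpec) error {
	hash, err := utils.GenerateHashWithoutReplicasAndWorkersToDelete(spec)
	if err != nil {
		return err
	}
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations[utils.HashWithoutReplicasAndWorkersToDeleteKey] = hash
	pod.Annotations[utils.KubeRayVersion] = utils.KUBERAY_VERSION
	return nil
}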

Example:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
spec:
  upgradeStrategy:
    type: Recreate
  rayVersion: '2.48.0'

Related issue number

Closes #2534 #3905

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@win5923 win5923 marked this pull request as draft November 10, 2025 16:24
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 6 times, most recently from 710166a to d261b0b Compare November 10, 2025 17:11
@win5923 win5923 changed the title [draft] Support recreate pods for RayCluster using RayClusterSpec [draft] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy Nov 10, 2025
@win5923
Collaborator Author

win5923 commented Nov 10, 2025

Hi @andrewsykim, I followed your previous comments and added a spec.upgradeStrategy API to RayCluster. However, I'm concerned this approach may introduce some issues:

  1. Confusion with the existing API: We already have upgradeStrategy for RayService. Adding another upgradeStrategy to RayCluster could confuse users and blur the separation of concerns.
  2. Breaking RayJob workflows: For RayJob, setting upgradeStrategy=Recreate on the RayCluster would cause pod recreation during job execution, leading to job interruption and loss of running jobs.

Maybe we could just add a feature gate to enable the recreate behavior instead of adding a spec.upgradeStrategy.type field to RayCluster. WDYT?

@andrewsykim
Member

Maybe we could just add a feature gate to enable the recreate behavior instead of adding a spec.upgradeStrategy.type field to RayCluster. WDYT?

Feature gates are used to gate features that are in early development and not ready for wider adoption; they shouldn't be used to change the behavior of RayCluster, because a gated feature will eventually be on by default (and forced on).

@andrewsykim
Copy link
Member

I think both of those concerns are valid, but I don't think this is a problem of separation of concerns, since RayCluster is a building block for both RayService and RayJob. For the cases you mentioned, we should have validation to ensure the RayCluster upgrade strategy cannot be set when the cluster is used with RayJob or RayService.
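For illustration only, such a guard could look roughly like the following sketch (a hypothetical validation helper, not part of this PR; the originated-from-crd label key and helpers are assumed to match KubeRay's existing utils):

import (
	"fmt"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
	"github.com/ray-project/kuberay/ray-operator/controllers/ray/utils"
)

// validateUpgradeStrategy is a hypothetical sketch: reject upgradeStrategy on
// RayClusters created by RayService or RayJob, identified here via the
// ray.io/originated-from-crd label that KubeRay sets on managed clusters.
func validateUpgradeStrategy(cluster *rayv1.RayCluster) error {
	if cluster.Spec.UpgradeStrategy == nil || cluster.Spec.UpgradeStrategy.Type == nil {
		return nil
	}
	origin := cluster.Labels[utils.RayOriginatedFromCRDLabelKey]
	if origin == utils.RayOriginatedFromCRDLabelValue(utils.RayServiceCRD) ||
		origin == utils.RayOriginatedFromCRDLabelValue(utils.RayJobCRD) {
		return fmt.Errorf("spec.upgradeStrategy must not be set on a RayCluster managed by RayService or RayJob")
	}
	return nil
}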

@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 7 times, most recently from 05b8108 to 7109cf1 Compare November 19, 2025 17:27
@win5923 win5923 changed the title [draft] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy [Feature] Support recreate pods for RayCluster using RayClusterSpec.upgradeStrategy Nov 19, 2025
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 2 times, most recently from 3d448e6 to 8bcce91 Compare November 19, 2025 18:26
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from 8bcce91 to bf87764 Compare November 19, 2025 18:28
@win5923 win5923 marked this pull request as ready for review November 19, 2025 18:30
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch 2 times, most recently from c9d35b2 to 8d4c813 Compare November 20, 2025 17:03
Future-Outlier previously approved these changes Dec 23, 2025
Member

@Future-Outlier Future-Outlier left a comment

  1. LGTM, but maybe we can add a follow-up to test the scenario where the Ray image version changes.

cc @rueian to merge, thank you!

  2. I tested this manually across Ray versions 2.47.0, 2.49.2, and 2.52.0:
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
# 2.47.0, 2.49.2, 2.52.0
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
spec:
  upgradeStrategy:
    type: Recreate
  rayVersion: '2.49.2' # should match the Ray version in the image of the containers
  # Ray head pod template
  headGroupSpec:
    # rayStartParams is optional with RayCluster CRD from KubeRay 1.4.0 or later but required in earlier versions.
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.49.2
          resources:
            limits:
              cpu: 2
              memory: 4G
            requests:
              cpu: 1
              memory: 1G
          ports:
          - containerPort: 6379
            name: gcs-server
          - containerPort: 8265 # Ray dashboard
            name: dashboard
          - containerPort: 10001
            name: client
  workerGroupSpecs:
  # the pod replicas in this group typed worker
  - replicas: 1
    minReplicas: 1
    maxReplicas: 5
    # logical group name, for this called small-group, also can be functional
    groupName: workergroup
    # rayStartParams is optional with RayCluster CRD from KubeRay 1.4.0 or later but required in earlier versions.
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
          image: rayproject/ray:2.49.2
          resources:
            limits:
              cpu: 1
              memory: 1G
            requests:
              cpu: 1
              memory: 1G

@Future-Outlier
Member

cc @win5923 to fix the test; maybe I accidentally made the CI fail.

@win5923
Collaborator Author

win5923 commented Dec 23, 2025

cc @win5923 to fix the test; maybe I accidentally made the CI fail.

I commented out the upgradeStrategy field in the sample YAML because the Test Sample YAMLs (latest release) CI uses KubeRay 1.5.1.

Member

@Future-Outlier Future-Outlier left a comment

cc @andrewsykim to take a look too if you have time

@Future-Outlier Future-Outlier moved this from In Progress to can be merged in @Future-Outlier's kuberay project Dec 24, 2025

func GeneratePodTemplateHash(template corev1.PodTemplateSpec) (string, error) {
Collaborator

Just curious, is it necessary to have this wrapper function, or is it redundant?

Member

I think this is for better readability.

Collaborator

Do we really consider only the Pod template? I feel that most fields should actually be taken into account, with only a few exceptions in the worker group spec, such as ScaleStrategy, Suspend, Replicas, MinReplicas, MaxReplicas, and IdleTimeoutSeconds.

Member

@Future-Outlier Future-Outlier Dec 26, 2025

I don’t have a strong preference since RayCluster is a custom resource.
For me, it’s a 51/49 decision, and I lean toward Rueian’s idea because those fields matter.

Also, in Kubernetes, both Deployments and StatefulSets primarily compare/check the Pod template.

source:

  1. deployment:
    https://github.com/kubernetes/kubernetes/blob/1e2817d5890ac5056e770cbdebdadfb7fc6ef54c/pkg/controller/deployment/util/deployment_util.go#L614-L642

  2. statefulset:
    https://github.com/kubernetes/kubernetes/blob/46cc610e6fe7e9a933a13d77538b0c220c5414a6/pkg/controller/statefulset/stateful_set_utils.go#L542-L608

Collaborator Author

@win5923 win5923 Dec 26, 2025

Yes, most of the requirements raised by users at the moment are primarily related to image updates, which is why my starting point is at the Pod level rather than the RayCluster level.

#4051
#3905
#2534

But your point makes sense: there are many configurations that should trigger a Pod recreate (like HeadGroupSpec.Resources or RayClusterSpec.AutoscalerOptions). I think we could compare the entire RayClusterSpec directly and exclude certain settings (like rayStartParams, WorkerGroupSpec.Min/Max/Replicas, etc.). WDYT?
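For illustration, that exclusion idea could look roughly like the sketch below (a hypothetical helper, not the final implementation: it hashes a deep copy of the spec with the scaling-related fields cleared):

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// hashSpecIgnoringScaling is a sketch of "hash the whole RayClusterSpec except a
// few fields": fields that should not trigger a recreate are cleared before hashing.
func hashSpecIgnoringScaling(spec rayv1.RayClusterSpec) (string, error) {
	s := *spec.DeepCopy()
	for i := range s.WorkerGroupSpecs {
		s.WorkerGroupSpecs[i].Replicas = nil
		s.WorkerGroupSpecs[i].MinReplicas = nil
		s.WorkerGroupSpecs[i].MaxReplicas = nil
		s.WorkerGroupSpecs[i].ScaleStrategy = rayv1.ScaleStrategy{}
	}
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}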

Member

@Future-Outlier Future-Outlier Dec 26, 2025

I think you can implement Rueian's advice; since RayCluster is a custom resource, having custom behavior makes sense.

Member

Also, whichever behavior we ship now, switching to a different one later would be a breaking change, so I would vote for Rueian's solution.

Member

@Future-Outlier Future-Outlier Dec 26, 2025

I just talked with @win5923; we can use RayService's solution to achieve this!

Collaborator Author

@win5923 win5923 Dec 27, 2025

Hi @Future-Outlier, @rueian, @CheyuWu, @machichima

I followed the RayService approach to implement the UpgradeStrategy for RayCluster.
Currently, we compute a hash from the entire RayCluster.Spec (excluding the fields discussed above) and store it as an annotation on the head Pod.

During reconciliation, we only need to compare the hash on the head Pod to determine whether an upgrade is required. This allows us to avoid re-comparing the spec across all head and worker Pods, simplifying the upgrade detection logic and reducing unnecessary overhead.
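A rough sketch of how that check might be consumed in the reconcile loop (hypothetical wiring, not the PR's actual code; the list helper is assumed to be KubeRay's existing RayClusterAllPodsAssociationOptions):

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
	"github.com/ray-project/kuberay/ray-operator/controllers/ray/common"
)

// recreateAllPods is a hypothetical sketch: delete every Pod owned by the RayCluster
// so they are recreated with the new template (and fresh hash annotations) on
// subsequent reconciles.
func recreateAllPods(ctx context.Context, c client.Client, instance *rayv1.RayCluster) error {
	allPods := corev1.PodList{}
	if err := c.List(ctx, &allPods, common.RayClusterAllPodsAssociationOptions(instance).ToListOptions()...); err != nil {
		return err
	}
	for i := range allPods.Items {
		if err := c.Delete(ctx, &allPods.Items[i]); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}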

@Future-Outlier
Member

cc @rueian to merge, thank you!

@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from 3ef2fc8 to 40376cd Compare December 27, 2025 12:58
@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from 40376cd to 4f7c460 Compare December 27, 2025 13:03
Comment on lines +1140 to +1148
// If the KubeRay version has changed, skip recreation to avoid unnecessary pod recreation
if len(headPods.Items) == 1 {
	headPod := headPods.Items[0]
	podVersion := headPod.Annotations[utils.KubeRayVersion]
	if podVersion != "" && podVersion != utils.KUBERAY_VERSION {
		logger.Info("KubeRay version has changed, skipping pod recreation", "rayCluster", instance.Name)
		return false
	}
}
Collaborator Author

Following #2320, this adds a new ray.io/kuberay-version annotation to the head Pod, which is used to detect KubeRay version changes.

Collaborator Author

@win5923 win5923 Dec 27, 2025

When the KUBERAY_VERSION annotation on the head Pod (e.g., 1.5.0) differs from the KubeRay operator's KUBERAY_VERSION (e.g., 1.6.0), should we follow the RayService steps?

  1. The ray.io/hash-without-replicas-and-workers-to-delete annotation is updated.
  2. The KUBERAY_VERSION annotation is updated.
  3. After these updates, we can use the new ray.io/hash-without-replicas-and-workers-to-delete annotation to determine whether to trigger a zero downtime upgrade.

I’m asking this because RayService performs an Update on the active RayCluster, which can implicitly trigger a RayCluster recreation to achieve a zero-downtime upgrade. From the user’s perspective, the good thing is they are not required to manually recreate the RayService; the upgrade is handled transparently by the controller.

if shouldUpdateCluster(rayServiceInstance, activeRayCluster, true) {
	// TODO(kevin85421): We should not reconstruct the cluster to update it. This will cause issues if autoscaler is enabled.
	logger.Info("Updating the active RayCluster instance", "clusterName", activeRayCluster.Name)
	goalCluster, err := constructRayClusterForRayService(rayServiceInstance, activeRayCluster.Name, r.Scheme)
	if err != nil {
		return nil, nil, err
	}
	modifyRayCluster(ctx, activeRayCluster, goalCluster)
	if err = r.Update(ctx, activeRayCluster); err != nil {
		r.Recorder.Eventf(rayServiceInstance, corev1.EventTypeWarning, string(utils.FailedToUpdateRayCluster), "Failed to update the active RayCluster %s/%s: %v", activeRayCluster.Namespace, activeRayCluster.Name, err)
		return activeRayCluster, pendingRayCluster, err
	}
	r.Recorder.Eventf(rayServiceInstance, corev1.EventTypeNormal, string(utils.UpdatedRayCluster), "Updated the active RayCluster %s/%s", activeRayCluster.Namespace, activeRayCluster.Name)
}

However, since RayCluster does not currently support zero-downtime upgrades, my approach is to avoid updating Pods when the KUBERAY_VERSION is different.

As a result, when the KubeRay operator is upgraded and a version mismatch is detected, the controller will not automatically trigger a RayCluster upgrade. Instead, users are required to manually delete and re-apply the RayCluster after upgrading the operator.

Collaborator Author

Maybe we can simply update the Pod annotations?

// If the KubeRay version has changed, skip recreation to avoid unnecessary pod recreation
if len(headPods.Items) == 1 {
	headPod := headPods.Items[0]
	podVersion := headPod.Annotations[utils.KubeRayVersion]
	if podVersion != "" && podVersion != utils.KUBERAY_VERSION {
		logger.Info("KubeRay version has changed, skipping pod recreation", "rayCluster", instance.Name)

		clusterHash, err := utils.GenerateHashWithoutReplicasAndWorkersToDelete(instance.Spec)
		if err != nil {
			logger.Error(err, "Failed to generate cluster spec hash for Recreate upgradeStrategy, skipping comparison")
			return false
		}

		headPod.Annotations[utils.HashWithoutReplicasAndWorkersToDeleteKey] = clusterHash
		headPod.Annotations[utils.KubeRayVersion] = utils.KUBERAY_VERSION
		if err := r.Update(ctx, &headPod); err != nil {
			logger.Error(err, "Failed to update head pod annotations after KubeRay version change", "pod", headPod.Name)
		}
		return false
	}
}

@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from 83c82bf to 643d7e7 Compare December 27, 2025 14:38
Member

@Future-Outlier Future-Outlier left a comment

Hi @win5923,
Since RayService stores the hash on the RayCluster CR, can we store the hash on the RayCluster CR instead of on the head Pod?

@win5923
Collaborator Author

win5923 commented Dec 28, 2025

Hi @Future-Outlier
The reason I'm concerned about storing the hash in RayCluster CR annotations is:

  1. We need to call r.Update() in multiple locations (after pod deletion, on version change, etc.). I think Controllers should primarily reconcile the desired state into the actual state, not constantly update their own CR.
  2. Every r.Update(ctx, instance) triggers a new reconciliation, wasting resources and creating potential reconciliation loops.

RayService manages RayCluster CRs and already needs to Create/Update them, so setting the hash is free. But RayCluster manages Pods directly and doesn't normally update its own CR. If we store the hash in the RayCluster, this will make the hash update an "extra expensive operation".

In contrast, storing the hash in Pod annotations is simpler: we write it once during pod creation, and it doesn't trigger reconciliation loops.

@win5923 win5923 force-pushed the raycluster-upgradeStrategy branch from 643d7e7 to fe87a41 Compare December 28, 2025 13:44
@Future-Outlier
Member

Hi @Future-Outlier The reason I'm concerned about storing the hash in RayCluster CR annotations is:

  1. We need to call r.Update() in multiple locations (after pod deletion, on version change, etc.). I think Controllers should primarily reconcile the desired state into the actual state, not constantly update their own CR.
  2. Every r.Update(ctx, instance) triggers a new reconciliation, wasting resources and creating potential reconciliation loops.

RayService manages RayCluster CRs and already needs to Create/Update them, so setting the hash is free. But RayCluster manages Pods directly and doesn't normally update its own CR. If we store the hash in the RayCluster, this will make the hash update an "extra expensive operation".

In contrast, storing the hash in Pod annotations is simpler: we write it once during pod creation, and it doesn't trigger reconciliation loops.

Makes sense to me, thank you!
cc @rueian to take a look.

Comment on lines +1126 to +1166
// shouldRecreatePodsForUpgrade checks if any pods need to be recreated based on RayClusterSpec changes
func (r *RayClusterReconciler) shouldRecreatePodsForUpgrade(ctx context.Context, instance *rayv1.RayCluster) bool {
	logger := ctrl.LoggerFrom(ctx)

	if instance.Spec.UpgradeStrategy == nil || instance.Spec.UpgradeStrategy.Type == nil || *instance.Spec.UpgradeStrategy.Type != rayv1.RayClusterRecreate {
		return false
	}

	headPods := corev1.PodList{}
	if err := r.List(ctx, &headPods, common.RayClusterHeadPodsAssociationOptions(instance).ToListOptions()...); err != nil {
		logger.Error(err, "Failed to list head pods for upgrade check")
		return false
	}

	// If the KubeRay version has changed, skip recreation to avoid unnecessary pod recreation
	if len(headPods.Items) == 1 {
		headPod := headPods.Items[0]
		podVersion := headPod.Annotations[utils.KubeRayVersion]
		if podVersion != "" && podVersion != utils.KUBERAY_VERSION {
			logger.Info("KubeRay version has changed, skipping pod recreation", "rayCluster", instance.Name)
			return false
		}
	}

	expectedClusterHash, err := utils.GenerateHashWithoutReplicasAndWorkersToDelete(instance.Spec)
	if err != nil {
		logger.Error(err, "Failed to generate cluster spec hash for Recreate upgradeStrategy, skipping comparison")
		return false
	}

	if len(headPods.Items) == 1 {
		headPod := headPods.Items[0]
		actualHash := headPod.Annotations[utils.HashWithoutReplicasAndWorkersToDeleteKey]
		if actualHash != "" && actualHash != expectedClusterHash {
			logger.Info("RayCluster spec has changed, will recreate all pods", "rayCluster", instance.Name)
			return true
		}
	}

	return false
}
Member

Suggested change

Before:

// shouldRecreatePodsForUpgrade checks if any pods need to be recreated based on RayClusterSpec changes
func (r *RayClusterReconciler) shouldRecreatePodsForUpgrade(ctx context.Context, instance *rayv1.RayCluster) bool {
	logger := ctrl.LoggerFrom(ctx)
	if instance.Spec.UpgradeStrategy == nil || instance.Spec.UpgradeStrategy.Type == nil || *instance.Spec.UpgradeStrategy.Type != rayv1.RayClusterRecreate {
		return false
	}
	headPods := corev1.PodList{}
	if err := r.List(ctx, &headPods, common.RayClusterHeadPodsAssociationOptions(instance).ToListOptions()...); err != nil {
		logger.Error(err, "Failed to list head pods for upgrade check")
		return false
	}
	// If the KubeRay version has changed, skip recreation to avoid unnecessary pod recreation
	if len(headPods.Items) == 1 {
		headPod := headPods.Items[0]
		podVersion := headPod.Annotations[utils.KubeRayVersion]
		if podVersion != "" && podVersion != utils.KUBERAY_VERSION {
			logger.Info("KubeRay version has changed, skipping pod recreation", "rayCluster", instance.Name)
			return false
		}
	}
	expectedClusterHash, err := utils.GenerateHashWithoutReplicasAndWorkersToDelete(instance.Spec)
	if err != nil {
		logger.Error(err, "Failed to generate cluster spec hash for Recreate upgradeStrategy, skipping comparison")
		return false
	}
	if len(headPods.Items) == 1 {
		headPod := headPods.Items[0]
		actualHash := headPod.Annotations[utils.HashWithoutReplicasAndWorkersToDeleteKey]
		if actualHash != "" && actualHash != expectedClusterHash {
			logger.Info("RayCluster spec has changed, will recreate all pods", "rayCluster", instance.Name)
			return true
		}
	}
	return false
}

After:

// shouldRecreatePodsForUpgrade checks if any pods need to be recreated based on RayClusterSpec changes
func (r *RayClusterReconciler) shouldRecreatePodsForUpgrade(ctx context.Context, instance *rayv1.RayCluster) bool {
	logger := ctrl.LoggerFrom(ctx)
	if instance.Spec.UpgradeStrategy == nil || instance.Spec.UpgradeStrategy.Type == nil || *instance.Spec.UpgradeStrategy.Type != rayv1.RayClusterRecreate {
		return false
	}
	expectedClusterHash, err := utils.GenerateHashWithoutReplicasAndWorkersToDelete(instance.Spec)
	if err != nil {
		logger.Error(err, "Failed to generate cluster spec hash for Recreate upgradeStrategy, skipping comparison")
		return false
	}
	headPods := corev1.PodList{}
	if err := r.List(ctx, &headPods, common.RayClusterHeadPodsAssociationOptions(instance).ToListOptions()...); err != nil {
		logger.Error(err, "Failed to list head pods for upgrade check")
		return false
	}
	// If the KubeRay version has changed, skip recreation to avoid unnecessary pod recreation
	if len(headPods.Items) == 1 {
		headPod := headPods.Items[0]
		podVersion := headPod.Annotations[utils.KubeRayVersion]
		if podVersion != "" && podVersion != utils.KUBERAY_VERSION {
			logger.Info("KubeRay version has changed, skipping pod recreation", "rayCluster", instance.Name)
			return false
		}
		actualHash := headPod.Annotations[utils.HashWithoutReplicasAndWorkersToDeleteKey]
		if actualHash != "" && actualHash != expectedClusterHash {
			logger.Info("RayCluster spec has changed, will recreate all pods", "rayCluster", instance.Name)
			return true
		}
	}
	return false
}

Member

maybe len(headPods.Items) > 0 is better than len(headPods.Items) == 1

@Future-Outlier
Member

Hi @Future-Outlier The reason I'm concerned about storing the hash in RayCluster CR annotations is:

  1. We need to call r.Update() in multiple locations (after pod deletion, on version change, etc.). I think Controllers should primarily reconcile the desired state into the actual state, not constantly update their own CR.
  2. Every r.Update(ctx, instance) triggers a new reconciliation, wasting resources and creating potential reconciliation loops.

RayService manages RayCluster CRs and already needs to Create/Update them, so setting the hash is free. But RayCluster manages Pods directly and doesn't normally update its own CR. If we store the hash in the RayCluster, this will make the hash update an "extra expensive operation".

In contrast, storing the hash in Pod annotations is simpler: we write it once during pod creation, and it doesn't trigger reconciliation loops.

cc @andrewsykim to take a look, thank you!

Collaborator

@CheyuWu CheyuWu left a comment

LGTM

Development

Successfully merging this pull request may close these issues.

[Feature] Identify and apply changes on ray-cluster

8 participants