[Serve] log deployment config in controller logs #59222

Merged
abrarsheikh merged 4 commits into ray-project:master from KeeProMise:59167
Dec 16, 2025

Conversation

@KeeProMise
Contributor

@KeeProMise KeeProMise commented Dec 6, 2025

Description

[Serve] log deployment config in controller logs

Related issues

#59167

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

@KeeProMise KeeProMise requested a review from a team as a code owner December 6, 2025 15:16
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces logging for deployment configurations in Ray Serve, which is a valuable addition for debugging. The new _log_deployment_configs function logs declarative, imperative, and merged configurations. My review focuses on improving the implementation of this new function for better maintainability and readability by reducing code duplication.

Comment on lines 1770 to 1837
def _log_deployment_configs(
    app_name: str,
    declarative_config: ServeApplicationSchema,
    imperative_configs: Dict[str, DeploymentInfo],
    merged_configs: Dict[str, DeploymentInfo],
) -> None:
    """Log the declarative, imperative, and merged deployment configs.

    This function logs the deployment configuration after each update to help
    with debugging and understanding how declarative (from YAML/config) and
    imperative (from code) configurations are merged.

    Args:
        app_name: Name of the application.
        declarative_config: The declarative config from the YAML/config file.
        imperative_configs: The imperative configs from the application code.
        merged_configs: The final merged configs after applying overrides.
    """
    try:
        # Format declarative config
        declarative_dict = {}
        if declarative_config.deployments:
            for dep in declarative_config.deployments:
                dep_dict = dep.dict(exclude_unset=True) if hasattr(dep, "dict") else {}
                declarative_dict[dep.name] = dep_dict

        # Format imperative configs
        imperative_dict = {}
        for dep_name, dep_info in imperative_configs.items():
            imperative_dict[dep_name] = {
                "deployment_config": dep_info.deployment_config.dict(),
                "replica_config": {
                    "ray_actor_options": dep_info.replica_config.ray_actor_options,
                    "placement_group_bundles": dep_info.replica_config.placement_group_bundles,
                    "placement_group_strategy": dep_info.replica_config.placement_group_strategy,
                    "max_replicas_per_node": dep_info.replica_config.max_replicas_per_node,
                },
                "route_prefix": dep_info.route_prefix,
                "version": dep_info.version,
            }

        # Format merged configs
        merged_dict = {}
        for dep_name, dep_info in merged_configs.items():
            merged_dict[dep_name] = {
                "deployment_config": dep_info.deployment_config.dict(),
                "replica_config": {
                    "ray_actor_options": dep_info.replica_config.ray_actor_options,
                    "placement_group_bundles": dep_info.replica_config.placement_group_bundles,
                    "placement_group_strategy": dep_info.replica_config.placement_group_strategy,
                    "max_replicas_per_node": dep_info.replica_config.max_replicas_per_node,
                },
                "route_prefix": dep_info.route_prefix,
                "version": dep_info.version,
            }

        # Log the configurations
        logger.info(
            f"Application '{app_name}' deployment configs after update:\n"
            f"Declarative config (from YAML/config): {json.dumps(declarative_dict, indent=2, default=str)}\n"
            f"Imperative config (from code): {json.dumps(imperative_dict, indent=2, default=str)}\n"
            f"Merged config (final): {json.dumps(merged_dict, indent=2, default=str)}"
        )
    except Exception as e:
        # Don't fail the deployment if logging fails
        logger.warning(
            f"Failed to log deployment configs for application '{app_name}': {e}"
        )
Contributor


medium

The _log_deployment_configs function can be refactored for better readability and to reduce code duplication.

  1. The logic for formatting imperative_configs and merged_configs is identical. This can be extracted into a nested helper function, _format_deployment_infos, to avoid repetition.
  2. The creation of declarative_dict can be simplified using a dictionary comprehension. The hasattr(dep, "dict") check is also redundant because declarative_config.deployments is a list of DeploymentSchema pydantic models, which are guaranteed to have a .dict() method.

Here is the suggested refactoring:

def _log_deployment_configs(
    app_name: str,
    declarative_config: ServeApplicationSchema,
    imperative_configs: Dict[str, DeploymentInfo],
    merged_configs: Dict[str, DeploymentInfo],
) -> None:
    """Log the declarative, imperative, and merged deployment configs.
    
    This function logs the deployment configuration after each update to help
    with debugging and understanding how declarative (from YAML/config) and
    imperative (from code) configurations are merged.
    
    Args:
        app_name: Name of the application.
        declarative_config: The declarative config from the YAML/config file.
        imperative_configs: The imperative configs from the application code.
        merged_configs: The final merged configs after applying overrides.
    """
    try:
        def _format_deployment_infos(infos: Dict[str, DeploymentInfo]) -> Dict[str, Dict]:
            """Helper to format DeploymentInfo objects for logging."""
            return {
                dep_name: {
                    "deployment_config": dep_info.deployment_config.dict(),
                    "replica_config": {
                        "ray_actor_options": dep_info.replica_config.ray_actor_options,
                        "placement_group_bundles": dep_info.replica_config.placement_group_bundles,
                        "placement_group_strategy": dep_info.replica_config.placement_group_strategy,
                        "max_replicas_per_node": dep_info.replica_config.max_replicas_per_node,
                    },
                    "route_prefix": dep_info.route_prefix,
                    "version": dep_info.version,
                }
                for dep_name, dep_info in infos.items()
            }

        # Format declarative config
        declarative_dict = {
            dep.name: dep.dict(exclude_unset=True)
            for dep in declarative_config.deployments
        } if declarative_config.deployments else {}
        
        # Format imperative and merged configs
        imperative_dict = _format_deployment_infos(imperative_configs)
        merged_dict = _format_deployment_infos(merged_configs)
        
        # Log the configurations
        logger.info(
            f"Application '{app_name}' deployment configs after update:\n"
            f"Declarative config (from YAML/config): {json.dumps(declarative_dict, indent=2, default=str)}\n"
            f"Imperative config (from code): {json.dumps(imperative_dict, indent=2, default=str)}\n"
            f"Merged config (final): {json.dumps(merged_dict, indent=2, default=str)}"
        )
    except Exception as e:
        # Don't fail the deployment if logging fails
        logger.warning(
            f"Failed to log deployment configs for application '{app_name}': {e}"
        )

@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue), observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling), and community-contribution (Contributed by the community) labels Dec 6, 2025
Contributor

@abrarsheikh abrarsheikh left a comment


These changes seem overly complex for a simple task like logging config. Is there an architectural reason why logging just the deployment config is difficult?

@KeeProMise
Copy link
Contributor Author

> These changes seem overly complex for a simple task like logging config. Is there an architectural reason why logging just the deployment config is difficult?

Thanks for the feedback. It does look complex, but that is caused by architectural limitations.

Architectural reasons: ReplicaConfig contains non-JSON-serializable byte fields:
  - serialized_deployment_def: bytes (serialized deployment code)
  - serialized_init_args: bytes (serialized initialization arguments)
  - serialized_init_kwargs: bytes (serialized initialization keyword arguments)
These fields are used for transmitting code in Ray clusters and cannot be serialized to JSON for logging.

Why DeploymentInfo cannot be serialized directly: calling dict() or a similar method on DeploymentInfo would make JSON serialization fail on the byte fields in ReplicaConfig, or produce large amounts of meaningless binary data.

Current approach: we extract and log only the serializable configuration fields:
  - deployment_config (a Pydantic model, directly serializable)
  - the serializable fields in replica_config (ray_actor_options, placement_group_bundles, etc.)
  - route_prefix and version
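[Editor's note] The serialization constraint described above can be demonstrated with a small, self-contained sketch. The dict below is a hypothetical stand-in for ReplicaConfig, not the real Ray class: json.dumps fails on the raw bytes fields, while the extracted subset of fields serializes cleanly.

```python
import json

# Hypothetical stand-in for Ray Serve's ReplicaConfig: the serialized_*
# fields hold pickled bytes and are not JSON-serializable.
replica_config = {
    "serialized_deployment_def": b"\x80\x05",  # pickled deployment code
    "ray_actor_options": {"num_cpus": 1},
    "placement_group_strategy": "PACK",
    "max_replicas_per_node": None,
}

# Dumping the raw config fails on the bytes fields.
try:
    json.dumps(replica_config)
except TypeError as e:
    print(f"cannot serialize raw config: {e}")

# The approach in this PR: extract only the JSON-safe fields for logging.
LOGGABLE_FIELDS = (
    "ray_actor_options",
    "placement_group_strategy",
    "max_replicas_per_node",
)
loggable = {k: replica_config[k] for k in LOGGABLE_FIELDS}
print(json.dumps(loggable, indent=2, default=str))
```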

@abrarsheikh
Copy link
Contributor

> These changes seem overly complex for a simple task like logging config. Is there an architectural reason why logging just the deployment config is difficult?

> Thanks for the feedback. Indeed it looks complex, but this is caused by architectural limitations. [...]

Right, I think you are capturing the important detail in your comment. I am wondering why the change is not as simple as:

diff --git a/python/ray/serve/_private/config.py b/python/ray/serve/_private/config.py
index 22546f6188..efc8e6c468 100644
--- a/python/ray/serve/_private/config.py
+++ b/python/ray/serve/_private/config.py
@@ -806,7 +806,16 @@ class ReplicaConfig:

     def to_proto_bytes(self):
         return self.to_proto().SerializeToString()
-
+
+    def to_dict(self):
+        # only use for logging purposes
+        return {
+            "deployment_def_name": self.deployment_def_name,
+            "ray_actor_options": json.loads(self.ray_actor_options),
+            "placement_group_bundles": json.loads(self.placement_group_bundles),
+            "placement_group_strategy": self.placement_group_strategy,
+            "max_replicas_per_node": self.max_replicas_per_node,
+        }

 def prepare_imperative_http_options(
     proxy_location: Union[None, str, ProxyLocation],
diff --git a/python/ray/serve/_private/deployment_info.py b/python/ray/serve/_private/deployment_info.py
index 5413c7878a..1e4e824368 100644
--- a/python/ray/serve/_private/deployment_info.py
+++ b/python/ray/serve/_private/deployment_info.py
@@ -167,3 +167,14 @@ class DeploymentInfo:
         else:
             data["target_capacity_direction"] = self.target_capacity_direction.name
         return DeploymentInfoProto(**data)
+
+    def to_dict(self):
+        # only use for logging purposes
+        return {
+            "deployment_config": self.deployment_config.model_dump(),
+            "replica_config": self.replica_config.to_dict(),
+            "start_time_ms": self.start_time_ms,
+            "actor_name": self.actor_name,
+            "version": self.version,
+            "end_time_ms": self.end_time_ms,
+        }
\ No newline at end of file
diff --git a/python/ray/serve/_private/deployment_state.py b/python/ray/serve/_private/deployment_state.py
index d1e29e9f08..d507799987 100644
--- a/python/ray/serve/_private/deployment_state.py
+++ b/python/ray/serve/_private/deployment_state.py
@@ -2340,6 +2340,9 @@ class DeploymentState:
             bool: Whether the target state has changed.
         """

+        logger.info(f"Deploying deployment {deployment_info.to_dict()}")
+        logger.info(f"Current target state: {self._target_state.info.to_dict() if self._target_state.info is not None else None}")
+
         curr_deployment_info = self._target_state.info
         if curr_deployment_info is not None:
             # Redeploying should not reset the deployment's start time.
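[Editor's note] The to_dict pattern from the diff can be exercised with a hypothetical stand-in class (not the real Ray Serve ReplicaConfig). One assumption worth flagging: if ray_actor_options is already stored as a plain dict, the json.loads round-trips in the diff above are unnecessary, so this sketch passes the dict through directly.

```python
import json

# Hypothetical stand-in for Ray Serve's ReplicaConfig, illustrating the
# to_dict() logging pattern proposed in the diff above.
class ReplicaConfig:
    def __init__(self, ray_actor_options, placement_group_strategy=None):
        self.deployment_def_name = "MyDeployment"
        self.ray_actor_options = ray_actor_options
        self.placement_group_bundles = None
        self.placement_group_strategy = placement_group_strategy
        self.max_replicas_per_node = None
        # Bytes payload deliberately excluded from to_dict() below.
        self.serialized_deployment_def = b"\x80\x05"

    def to_dict(self):
        # only use for logging purposes
        return {
            "deployment_def_name": self.deployment_def_name,
            "ray_actor_options": self.ray_actor_options,
            "placement_group_bundles": self.placement_group_bundles,
            "placement_group_strategy": self.placement_group_strategy,
            "max_replicas_per_node": self.max_replicas_per_node,
        }

rc = ReplicaConfig({"num_cpus": 2}, "PACK")
# The bytes field never reaches the log line, so json.dumps succeeds.
print(json.dumps(rc.to_dict(), indent=2, default=str))
```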

Signed-off-by: JianZhang <keepromise@apache.org>
@KeeProMise
Copy link
Contributor Author

> right, i think you are capturing the important detail in your comment. I am wondering why the change is not as simple as [...]

Hi @abrarsheikh Thank you for taking the time to review. PTAL.

@abrarsheikh
Contributor

tests are failing.

Signed-off-by: JianZhang <keepromise@apache.org>
Signed-off-by: JianZhang <keepromise@apache.org>
Comment on lines 419 to 428
# Log the resulting declarative + imperative version of the deployment config
# after each update to the application
if deployment_infos is not None and not deleting:
deployment_configs = {
name: info.to_dict() for name, info in deployment_infos.items()
}
logger.info(
f"Application '{self._name}' updated. Deployment configs (declarative + imperative): "
f"{json.dumps(deployment_configs, indent=2, default=str)}"
)
Contributor


i think the log in deploy is sufficient. this one is going to be too noisy since it gets called on every autoscaling decision.

Signed-off-by: JianZhang <keepromise@apache.org>
@harshit-anyscale harshit-anyscale added the go add ONLY when ready to merge, run all tests label Dec 15, 2025
Contributor

@harshit-anyscale harshit-anyscale left a comment


LGTM

@abrarsheikh
Copy link
Contributor

please add a screenshot of a manual test after this change.

@KeeProMise
Copy link
Contributor Author

> please add a screenshot of a manual test after this change.

[screenshot attached]

@abrarsheikh abrarsheikh merged commit ee57dea into ray-project:master Dec 16, 2025
7 checks passed
cszhu pushed a commit that referenced this pull request Dec 17, 2025
## Description

[Serve] log deployment config in controller logs
## Related issues
#59167
## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: JianZhang <keepromise@apache.org>
@KeeProMise
Copy link
Contributor Author

@abrarsheikh Thank you for reviewing and merging this PR.

@abrarsheikh
Copy link
Contributor

@KeeProMise message me on slack if you are interested in making other contributions :)

zzchun pushed a commit to zzchun/ray that referenced this pull request Dec 18, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026