[Serve] log deployment config in controller logs #59222

Merged
abrarsheikh merged 4 commits into ray-project:master from KeeProMise:59167
Dec 16, 2025

Conversation

@KeeProMise
Contributor

@KeeProMise KeeProMise commented Dec 6, 2025

Description

[Serve] log deployment config in controller logs

Related issues

#59167

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

@KeeProMise KeeProMise requested a review from a team as a code owner December 6, 2025 15:16
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces logging for deployment configurations in Ray Serve, which is a valuable addition for debugging. The new _log_deployment_configs function logs declarative, imperative, and merged configurations. My review focuses on improving the implementation of this new function for better maintainability and readability by reducing code duplication.

Comment on lines 1770 to 1837
def _log_deployment_configs(
    app_name: str,
    declarative_config: ServeApplicationSchema,
    imperative_configs: Dict[str, DeploymentInfo],
    merged_configs: Dict[str, DeploymentInfo],
) -> None:
    """Log the declarative, imperative, and merged deployment configs.

    This function logs the deployment configuration after each update to help
    with debugging and understanding how declarative (from YAML/config) and
    imperative (from code) configurations are merged.

    Args:
        app_name: Name of the application.
        declarative_config: The declarative config from the YAML/config file.
        imperative_configs: The imperative configs from the application code.
        merged_configs: The final merged configs after applying overrides.
    """
    try:
        # Format declarative config
        declarative_dict = {}
        if declarative_config.deployments:
            for dep in declarative_config.deployments:
                dep_dict = dep.dict(exclude_unset=True) if hasattr(dep, "dict") else {}
                declarative_dict[dep.name] = dep_dict

        # Format imperative configs
        imperative_dict = {}
        for dep_name, dep_info in imperative_configs.items():
            imperative_dict[dep_name] = {
                "deployment_config": dep_info.deployment_config.dict(),
                "replica_config": {
                    "ray_actor_options": dep_info.replica_config.ray_actor_options,
                    "placement_group_bundles": dep_info.replica_config.placement_group_bundles,
                    "placement_group_strategy": dep_info.replica_config.placement_group_strategy,
                    "max_replicas_per_node": dep_info.replica_config.max_replicas_per_node,
                },
                "route_prefix": dep_info.route_prefix,
                "version": dep_info.version,
            }

        # Format merged configs
        merged_dict = {}
        for dep_name, dep_info in merged_configs.items():
            merged_dict[dep_name] = {
                "deployment_config": dep_info.deployment_config.dict(),
                "replica_config": {
                    "ray_actor_options": dep_info.replica_config.ray_actor_options,
                    "placement_group_bundles": dep_info.replica_config.placement_group_bundles,
                    "placement_group_strategy": dep_info.replica_config.placement_group_strategy,
                    "max_replicas_per_node": dep_info.replica_config.max_replicas_per_node,
                },
                "route_prefix": dep_info.route_prefix,
                "version": dep_info.version,
            }

        # Log the configurations
        logger.info(
            f"Application '{app_name}' deployment configs after update:\n"
            f"Declarative config (from YAML/config): {json.dumps(declarative_dict, indent=2, default=str)}\n"
            f"Imperative config (from code): {json.dumps(imperative_dict, indent=2, default=str)}\n"
            f"Merged config (final): {json.dumps(merged_dict, indent=2, default=str)}"
        )
    except Exception as e:
        # Don't fail the deployment if logging fails
        logger.warning(
            f"Failed to log deployment configs for application '{app_name}': {e}"
        )
Contributor


medium

The _log_deployment_configs function can be refactored for better readability and to reduce code duplication.

  1. The logic for formatting imperative_configs and merged_configs is identical. This can be extracted into a nested helper function, _format_deployment_infos, to avoid repetition.
  2. The creation of declarative_dict can be simplified using a dictionary comprehension. The hasattr(dep, "dict") check is also redundant because declarative_config.deployments is a list of DeploymentSchema pydantic models, which are guaranteed to have a .dict() method.

Here is the suggested refactoring:

def _log_deployment_configs(
    app_name: str,
    declarative_config: ServeApplicationSchema,
    imperative_configs: Dict[str, DeploymentInfo],
    merged_configs: Dict[str, DeploymentInfo],
) -> None:
    """Log the declarative, imperative, and merged deployment configs.
    
    This function logs the deployment configuration after each update to help
    with debugging and understanding how declarative (from YAML/config) and
    imperative (from code) configurations are merged.
    
    Args:
        app_name: Name of the application.
        declarative_config: The declarative config from the YAML/config file.
        imperative_configs: The imperative configs from the application code.
        merged_configs: The final merged configs after applying overrides.
    """
    try:
        def _format_deployment_infos(infos: Dict[str, DeploymentInfo]) -> Dict[str, Dict]:
            """Helper to format DeploymentInfo objects for logging."""
            return {
                dep_name: {
                    "deployment_config": dep_info.deployment_config.dict(),
                    "replica_config": {
                        "ray_actor_options": dep_info.replica_config.ray_actor_options,
                        "placement_group_bundles": dep_info.replica_config.placement_group_bundles,
                        "placement_group_strategy": dep_info.replica_config.placement_group_strategy,
                        "max_replicas_per_node": dep_info.replica_config.max_replicas_per_node,
                    },
                    "route_prefix": dep_info.route_prefix,
                    "version": dep_info.version,
                }
                for dep_name, dep_info in infos.items()
            }

        # Format declarative config
        declarative_dict = {
            dep.name: dep.dict(exclude_unset=True)
            for dep in declarative_config.deployments
        } if declarative_config.deployments else {}
        
        # Format imperative and merged configs
        imperative_dict = _format_deployment_infos(imperative_configs)
        merged_dict = _format_deployment_infos(merged_configs)
        
        # Log the configurations
        logger.info(
            f"Application '{app_name}' deployment configs after update:\n"
            f"Declarative config (from YAML/config): {json.dumps(declarative_dict, indent=2, default=str)}\n"
            f"Imperative config (from code): {json.dumps(imperative_dict, indent=2, default=str)}\n"
            f"Merged config (final): {json.dumps(merged_dict, indent=2, default=str)}"
        )
    except Exception as e:
        # Don't fail the deployment if logging fails
        logger.warning(
            f"Failed to log deployment configs for application '{app_name}': {e}"
        )

@ray-gardener ray-gardener bot added the serve (Ray Serve Related Issue), observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling), and community-contribution (Contributed by the community) labels Dec 6, 2025
Contributor

@abrarsheikh abrarsheikh left a comment


These changes seem overly complex for a simple task like logging config. Is there an architectural reason why logging just the deployment config is difficult?

@KeeProMise
Copy link
Contributor Author

> These changes seem overly complex for a simple task like logging config. Is there an architectural reason why logging just the deployment config is difficult?

Thanks for the feedback. It does look complex, but that is caused by architectural limitations.

Architectural reasons: ReplicaConfig contains non-JSON-serializable byte fields:
  - serialized_deployment_def: bytes (serialized deployment code)
  - serialized_init_args: bytes (serialized initialization arguments)
  - serialized_init_kwargs: bytes (serialized initialization keyword arguments)
These fields are used for transmitting code in Ray clusters and cannot be serialized to JSON for logging.

Why DeploymentInfo cannot be serialized directly: calling dict() or a similar method on DeploymentInfo would make JSON serialization fail on the byte fields in ReplicaConfig, or produce large amounts of meaningless binary data.

Current approach: we extract and log only the serializable configuration fields:
  - deployment_config (a Pydantic model, directly serializable)
  - the serializable fields in replica_config (ray_actor_options, placement_group_bundles, etc.)
  - route_prefix and version
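[Editor's note] The serialization constraint described above can be demonstrated with a small, self-contained sketch. The dict below is a hypothetical stand-in for ReplicaConfig, not the real Ray class: json.dumps fails on the raw bytes fields, while the extracted subset of fields serializes cleanly.

```python
import json

# Hypothetical stand-in for Ray Serve's ReplicaConfig: the serialized_*
# fields hold pickled bytes and are not JSON-serializable.
replica_config = {
    "serialized_deployment_def": b"\x80\x05",  # pickled deployment code
    "ray_actor_options": {"num_cpus": 1},
    "placement_group_strategy": "PACK",
    "max_replicas_per_node": None,
}

# Dumping the raw config fails on the bytes fields.
try:
    json.dumps(replica_config)
except TypeError as e:
    print(f"cannot serialize raw config: {e}")

# The approach in this PR: extract only the JSON-safe fields for logging.
LOGGABLE_FIELDS = (
    "ray_actor_options",
    "placement_group_strategy",
    "max_replicas_per_node",
)
loggable = {k: replica_config[k] for k in LOGGABLE_FIELDS}
print(json.dumps(loggable, indent=2, default=str))
```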

@abrarsheikh
Copy link
Contributor

> These changes seem overly complex for a simple task like logging config. Is there an architectural reason why logging just the deployment config is difficult?

> Thanks for the feedback. Indeed it looks complex, but this is caused by architectural limitations. [...]

Right, I think you are capturing the important detail in your comment. I am wondering why the change is not as simple as:

diff --git a/python/ray/serve/_private/config.py b/python/ray/serve/_private/config.py
index 22546f6188..efc8e6c468 100644
--- a/python/ray/serve/_private/config.py
+++ b/python/ray/serve/_private/config.py
@@ -806,7 +806,16 @@ class ReplicaConfig:

     def to_proto_bytes(self):
         return self.to_proto().SerializeToString()
-
+
+    def to_dict(self):
+        # only use for logging purposes
+        return {
+            "deployment_def_name": self.deployment_def_name,
+            "ray_actor_options": json.loads(self.ray_actor_options),
+            "placement_group_bundles": json.loads(self.placement_group_bundles),
+            "placement_group_strategy": self.placement_group_strategy,
+            "max_replicas_per_node": self.max_replicas_per_node,
+        }

 def prepare_imperative_http_options(
     proxy_location: Union[None, str, ProxyLocation],
diff --git a/python/ray/serve/_private/deployment_info.py b/python/ray/serve/_private/deployment_info.py
index 5413c7878a..1e4e824368 100644
--- a/python/ray/serve/_private/deployment_info.py
+++ b/python/ray/serve/_private/deployment_info.py
@@ -167,3 +167,14 @@ class DeploymentInfo:
         else:
             data["target_capacity_direction"] = self.target_capacity_direction.name
         return DeploymentInfoProto(**data)
+
+    def to_dict(self):
+        # only use for logging purposes
+        return {
+            "deployment_config": self.deployment_config.model_dump(),
+            "replica_config": self.replica_config.to_dict(),
+            "start_time_ms": self.start_time_ms,
+            "actor_name": self.actor_name,
+            "version": self.version,
+            "end_time_ms": self.end_time_ms,
+        }
\ No newline at end of file
diff --git a/python/ray/serve/_private/deployment_state.py b/python/ray/serve/_private/deployment_state.py
index d1e29e9f08..d507799987 100644
--- a/python/ray/serve/_private/deployment_state.py
+++ b/python/ray/serve/_private/deployment_state.py
@@ -2340,6 +2340,9 @@ class DeploymentState:
             bool: Whether the target state has changed.
         """

+        logger.info(f"Deploying deployment {deployment_info.to_dict()}")
+        logger.info(f"Current target state: {self._target_state.info.to_dict() if self._target_state.info is not None else None}")
+
         curr_deployment_info = self._target_state.info
         if curr_deployment_info is not None:
             # Redeploying should not reset the deployment's start time.
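[Editor's note] The to_dict pattern from the diff can be exercised with a hypothetical stand-in class (not the real Ray Serve ReplicaConfig). One assumption worth flagging: if ray_actor_options is already stored as a plain dict, the json.loads round-trips in the diff above are unnecessary, so this sketch passes the dict through directly.

```python
import json

# Hypothetical stand-in for Ray Serve's ReplicaConfig, illustrating the
# to_dict() logging pattern proposed in the diff above.
class ReplicaConfig:
    def __init__(self, ray_actor_options, placement_group_strategy=None):
        self.deployment_def_name = "MyDeployment"
        self.ray_actor_options = ray_actor_options
        self.placement_group_bundles = None
        self.placement_group_strategy = placement_group_strategy
        self.max_replicas_per_node = None
        # Bytes payload deliberately excluded from to_dict() below.
        self.serialized_deployment_def = b"\x80\x05"

    def to_dict(self):
        # only use for logging purposes
        return {
            "deployment_def_name": self.deployment_def_name,
            "ray_actor_options": self.ray_actor_options,
            "placement_group_bundles": self.placement_group_bundles,
            "placement_group_strategy": self.placement_group_strategy,
            "max_replicas_per_node": self.max_replicas_per_node,
        }

rc = ReplicaConfig({"num_cpus": 2}, "PACK")
# The bytes field never reaches the log line, so json.dumps succeeds.
print(json.dumps(rc.to_dict(), indent=2, default=str))
```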

Signed-off-by: JianZhang <keepromise@apache.org>
@KeeProMise
Copy link
Contributor Author

> right, i think you are capturing the important detail in your comment. I am wondering why the change is not as simple as [...]

Hi @abrarsheikh Thank you for taking the time to review. PTAL.

@abrarsheikh
Contributor

tests are failing.

Signed-off-by: JianZhang <keepromise@apache.org>
Signed-off-by: JianZhang <keepromise@apache.org>
Comment on lines 419 to 428
# Log the resulting declarative + imperative version of the deployment config
# after each update to the application
if deployment_infos is not None and not deleting:
deployment_configs = {
name: info.to_dict() for name, info in deployment_infos.items()
}
logger.info(
f"Application '{self._name}' updated. Deployment configs (declarative + imperative): "
f"{json.dumps(deployment_configs, indent=2, default=str)}"
)
Contributor


i think the log in deploy is sufficient. this one is going to be too noisy since it gets called on every autoscaling decision.

Signed-off-by: JianZhang <keepromise@apache.org>
@harshit-anyscale harshit-anyscale added the go add ONLY when ready to merge, run all tests label Dec 15, 2025
Contributor

@harshit-anyscale harshit-anyscale left a comment


LGTM

@abrarsheikh
Copy link
Contributor

please add a screenshot of a manual test after this change.

@KeeProMise
Copy link
Contributor Author

> please add a screenshot of a manual test after this change.

[screenshot attached]

@abrarsheikh abrarsheikh merged commit ee57dea into ray-project:master Dec 16, 2025
7 checks passed
cszhu pushed a commit that referenced this pull request Dec 17, 2025
## Description

[Serve] log deployment config in controller logs
## Related issues
#59167
## Additional information
> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: JianZhang <keepromise@apache.org>
@KeeProMise
Copy link
Contributor Author

@abrarsheikh Thank you for reviewing and merging this PR.

@abrarsheikh
Copy link
Contributor

@KeeProMise message me on slack if you are interested in making other contributions :)

zzchun pushed a commit to zzchun/ray that referenced this pull request Dec 18, 2025
Yicheng-Lu-llll pushed a commit to Yicheng-Lu-llll/ray that referenced this pull request Dec 22, 2025
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026