@danobi06
Contributor

Description

Add a check to install the spark cli only if one of the following conditions is true:

  1. workflows blueprint is enabled in DZ project profile
  2. workflows blueprint exists (fallback)

Add || logic to spark cli installation to gracefully fail and continue post startup execution.
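
For reviewers, the two-condition gate described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `should_install_spark_cli` and the two callables passed to it are hypothetical names, and the real checks live in the workflow scripts.

```python
from typing import Callable


def should_install_spark_cli(
    enabled_in_profile: Callable[[], bool],
    blueprint_exists: Callable[[], bool],
) -> bool:
    """Return True if either condition holds; the blueprint lookup is the fallback.

    Both callables are hypothetical stand-ins for the real DataZone checks.
    """
    try:
        if enabled_in_profile():
            return True
    except Exception as err:
        # A failed profile lookup should not block the fallback check.
        print(f"Profile check failed ({err}); falling back to blueprint lookup")
    try:
        return bool(blueprint_exists())
    except Exception as err:
        # If both checks fail, skip the install rather than abort startup.
        print(f"Blueprint lookup failed ({err}); skipping sm-spark-cli install")
        return False
```

Returning `False` when both lookups raise mirrors the intent of the `||` change: a failed check or install should produce a warning and let post startup execution continue.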

Type of Change

  • [✅] Image update - Bug fix
  • Image update - New feature
  • Image update - Breaking change
  • SMD image build tool update
  • Documentation update

Release Information

Does this change need to be included in patch version releases? By default, pull requests are only added to the next SMD image minor version release once they are merged into the template folder. Only critical bug fixes or security updates should be applied to new patch versions of existing image minor versions.

  • [✅] Yes (Critical bug fix or security update)
  • No (New feature or non-critical change)
  • N/A (Not an image update)

If yes, please explain why:
This change is needed to continue sagemaker_ui_post_startup.sh execution in the event sm-spark-cli fails to install. In addition, sm-spark-cli will only be installed if sagemaker workflows blueprint exists.

How Has This Been Tested?

Tested post startup script in an SMD environment by

  1. Modifying the below files
/etc/sagemaker-ui/sagemaker_ui_post_startup.sh
/etc/sagemaker-ui/workflows/sm-spark-cli-install.sh
/etc/sagemaker-ui/workflows/workflow_client.py
/etc/sagemaker-ui/workflows/start-workflows-container.sh
  2. Executing bash /etc/sagemaker-ui/sagemaker_ui_post_startup.sh.
  3. Observing that the post startup script continues executing when sm-spark-cli-install fails to install
Screenshot 2025-09-12 at 00 19 39

Checklist:

  • [✅] My code follows the style guidelines of this project
  • [✅] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

Test Screenshots (if applicable):

Related Issues

[Link any related issues here]

Additional Notes

[Any additional information that might be helpful for reviewers]

@danobi06 danobi06 requested a review from a team as a code owner September 12, 2025 20:21
@danobi06 danobi06 requested review from claytonparnell and reganbaum and removed request for a team September 12, 2025 20:21

# Install sm-spark-cli
- bash /etc/sagemaker-ui/workflows/sm-spark-cli-install.sh
+ bash /etc/sagemaker-ui/workflows/sm-spark-cli-install.sh || echo "Warning: sm-spark-cli installation failed, continuing..."
Contributor

nit: can we put this print statement in the install script to keep the post startup script clean?

Contributor Author

Extending this one line is about as clean as we can get for installing the script safely.

# fallback to checking if only workflows blueprint exists
try:
    blueprint_id = DZ_CLIENT.list_environment_blueprints(
        managed=True, domainIdentifier=domain_id, name="Workflows"
Contributor

IIRC users can modify blueprint names, and we've had issues in the past relying on the blueprint name. Can we get the environment blueprint by looking at the type or another field instead of the name?

Contributor Author

The Workflows blueprint is a managed blueprint provided by SageMaker, so customers won't be able to edit it. Unfortunately, list_environment_blueprints only accepts searching by managed, domainIdentifier, and name. ref
If a user adds their own custom blueprint based on Workflows, then I agree this check won't suffice.

Contributor

Got it, thanks for clarifying. The benefit is that at least we don't hang anymore, so while not 100% foolproof, it does still fix this edge case.
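
To make the name-based fallback discussed in this thread concrete, here is a minimal sketch. It assumes a boto3 DataZone client whose `list_environment_blueprints` response carries an `items` list of entries with `id` and `name` keys; the helper name `find_managed_blueprint_id` is hypothetical and this is not the code merged in the PR. Pagination is ignored for brevity.

```python
from typing import Any, Optional


def find_managed_blueprint_id(
    dz_client: Any, domain_id: str, name: str = "Workflows"
) -> Optional[str]:
    """Return the ID of the managed blueprint with this exact name, or None.

    dz_client is assumed to behave like a boto3 DataZone client.
    """
    response = dz_client.list_environment_blueprints(
        domainIdentifier=domain_id, managed=True, name=name
    )
    # Re-check the name on the client side instead of trusting the filter alone.
    for item in response.get("items", []):
        if item.get("name") == name:
            return item.get("id")
    return None
```

Because the lookup is restricted to `managed=True`, a custom blueprint derived from Workflows would not match, which is the limitation acknowledged above.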

@claytonparnell claytonparnell merged commit 08b9a74 into aws:main Sep 29, 2025
0 of 2 checks passed

4 participants