feat(runtimes): add validation for reserved MPI environment variables #3491

Merged
google-oss-prow[bot] merged 1 commit into kubeflow:master from adiprathapa:feat/mpi-env-validation
May 9, 2026

Conversation

@adiprathapa
Contributor

Picking up #3145 per @andreyvelich's ask. The original PR was approved in approach but went stale waiting on a rebase, so this is that rebase.

The MPI plugin's Validate function already catches a few things; this adds a check that rejects any TrainJob.spec.trainer.env entry whose name is in a new MPIReservedEnvNames set. The set lives in pkg/constants/constants.go and covers the four OpenMPI variables the operator manages internally (OMPI_MCA_orte_default_hostfile, OMPI_MCA_plm_rsh_args, OMPI_MCA_orte_keep_fqdn_hostnames, OMPI_MCA_orte_set_default_slots). Approach mirrors the Torch plugin's TorchRunReservedEnvNames check.

Tests in mpi_test.go cover one reserved var, two reserved vars, and a custom var that still passes.
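
For readers who have not opened the diff, the check and the three test cases described above can be sketched roughly as follows. This is a standalone illustration, not the code in mpi.go: `envVar`, `findReservedEnv`, and `validateTrainerEnv` are hypothetical names, a plain map stands in for whatever set type `MPIReservedEnvNames` actually uses, and the error wording is invented. Only the four variable names come from this description.

```go
package main

import (
	"fmt"
	"sort"
)

// envVar is a simplified, hypothetical stand-in for a Kubernetes env entry;
// only the name matters for this check.
type envVar struct {
	Name  string
	Value string
}

// mpiReservedEnvNames lists the four OpenMPI variables named in the PR
// description. The real set lives in pkg/constants/constants.go and may
// use a different type.
var mpiReservedEnvNames = map[string]struct{}{
	"OMPI_MCA_orte_default_hostfile":    {},
	"OMPI_MCA_plm_rsh_args":             {},
	"OMPI_MCA_orte_keep_fqdn_hostnames": {},
	"OMPI_MCA_orte_set_default_slots":   {},
}

// findReservedEnv returns, in sorted order, the names in env that are reserved.
func findReservedEnv(env []envVar) []string {
	var hits []string
	for _, e := range env {
		if _, reserved := mpiReservedEnvNames[e.Name]; reserved {
			hits = append(hits, e.Name)
		}
	}
	sort.Strings(hits)
	return hits
}

// validateTrainerEnv rejects an env list that sets any reserved name.
// The error text is illustrative, not the message the plugin returns.
func validateTrainerEnv(env []envVar) error {
	if hits := findReservedEnv(env); len(hits) > 0 {
		return fmt.Errorf("spec.trainer.env must not set reserved MPI variables: %v", hits)
	}
	return nil
}

func main() {
	// The three cases mirror the test description: one reserved variable,
	// two reserved variables, and a custom variable that passes.
	cases := map[string][]envVar{
		"one reserved": {{Name: "OMPI_MCA_plm_rsh_args", Value: "-v"}},
		"two reserved": {
			{Name: "OMPI_MCA_orte_default_hostfile", Value: "/tmp/hosts"},
			{Name: "OMPI_MCA_orte_keep_fqdn_hostnames", Value: "true"},
		},
		"custom only": {{Name: "MY_CUSTOM_VAR", Value: "1"}},
	}
	for name, env := range cases {
		fmt.Printf("%s: rejected=%v\n", name, validateTrainerEnv(env) != nil)
	}
}
```

Collecting every offending name before returning, rather than stopping at the first, lets a user fix all reserved entries in one pass; whether the plugin reports them together or one error per entry is not stated here.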

Rebase notes: minor conflicts in constants.go (master added XGBoostReservedEnvNames at the same insertion point) and mpi.go (master dropped an unused intstr import). go vet, gofmt, and the package tests all pass locally.

Fixes #3126

Co-authored-by: Vishal Painjane <painjanevishal2204@gmail.com>

Signed-off-by: Vishal Painjane <painjanevishal2204@gmail.com>
Copilot AI review requested due to automatic review settings May 9, 2026 14:39
@github-actions
github-actions Bot commented May 9, 2026

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
  • Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

Contributor

Copilot AI left a comment

Pull request overview

Adds server-side validation in the MPI runtime plugin to prevent users from setting OpenMPI environment variables that the operator manages internally, avoiding misconfiguration that can break MPI worker discovery.

Changes:

  • Introduces constants.MPIReservedEnvNames (set of OpenMPI-reserved env var names).
  • Extends MPI plugin Validate to reject TrainJob.spec.trainer.env entries that use any reserved name.
  • Adds unit tests covering single reserved env, multiple reserved envs, and a non-reserved env case.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pkg/runtime/framework/plugins/mpi/mpi.go | Rejects TrainJobs that set operator-managed OpenMPI env vars in spec.trainer.env. |
| pkg/runtime/framework/plugins/mpi/mpi_test.go | Adds validation test cases for reserved and non-reserved MPI env vars. |
| pkg/constants/constants.go | Defines the MPIReservedEnvNames set used by validation. |

@adiprathapa adiprathapa changed the title feat(mpi): add validation for reserved MPI environment variables feat(runtimes): add validation for reserved MPI environment variables May 9, 2026
Member

@andreyvelich andreyvelich left a comment

Thanks for this @adiprathapa!
/ok-to-test
/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow Bot merged commit 85bbe58 into kubeflow:master May 9, 2026
33 of 36 checks passed
@google-oss-prow google-oss-prow Bot added this to the v2.3 milestone May 9, 2026
Development

Successfully merging this pull request may close these issues.

[Feature]: Implement MPI Environment Variable Validation

4 participants