Skip to content

feat(backend/sdk): add backend support for toleration lists. #11860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 25, 2025

Conversation

HumairAK
Copy link
Collaborator

It is a very reasonable usecase the user may want to provide as input a list of tolerations instead of a single toleration object. In fact, it is probably more preferable that we don't foce the user to add a new input for each new toleration entry for one of more tasks. This PR makes it so the input tolerations_json call will accept both, either a single or list of toleration input parameters. All tolerations, whether hardcoded, input list, input dicts, are appeneded into a flat list by driver.

Checklist:

@HumairAK HumairAK force-pushed the support_toleration_list branch from 8a0e290 to 296f580 Compare April 25, 2025 14:13
@HumairAK HumairAK force-pushed the support_toleration_list branch from 296f580 to f41dd51 Compare April 25, 2025 14:14
Returns:
Task object with added toleration.
"""
if not isinstance(toleration_json, pipeline_channel.PipelineParameterChannel):
raise TypeError("toleration_json must be a Pipeline Input Parameter.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required? It seems reasonable to allow static values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one we can discuss but currently the support is only for pipelienparameter channels and the check is protecting the user from making a mistake only recognizable during runtime (without this), if you are provided hardcoded values then add_toleration() is largely sufficient for most needs even list of dicts, adding support for static list/dicts is out of scope, but I'd be happy to add this in a follow up if we want to go that route

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added support for static list/dicts. One caveat here is that the pipeline spec yaml only will utilize the toleration_json for input parameter under the hood, since this isn't used for add_toleration.

clarify toleration json docs

Signed-off-by: Humair Khan <[email protected]>
@HumairAK HumairAK force-pushed the support_toleration_list branch from f41dd51 to b083e6b Compare April 25, 2025 14:48
Copy link
Contributor

@mprahl mprahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mprahl
Once this PR has been reviewed and has the lgtm label, please ask for approval from humairak. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@HumairAK
Copy link
Collaborator Author

the e2e and frontend failures are unrelated and are being addressed in other prs, going to bypass merge this for now, we're actively working on a fix for the CI else where, see #11864

@HumairAK HumairAK merged commit fb18235 into kubeflow:master Apr 25, 2025
28 of 44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants