feat(api): Add PodTemplateOverrides API into TrainJob#2785

Merged
google-oss-prow[bot] merged 1 commit into kubeflow:master from xigang:override_label_and_annotation
Oct 9, 2025

Conversation

@xigang
Contributor

@xigang xigang commented Aug 9, 2025

What this PR does / why we need it:

Follow-up: #2784

@xigang xigang changed the title from "Add Labels and Annotations to PodSpecOverrides" to "feat: add labels and annotations to PodSpecOverrides" Aug 9, 2025
@xigang
Contributor Author

xigang commented Aug 9, 2025

/cc @andreyvelich @tenzen-y @mimowo

@google-oss-prow

@xigang: GitHub didn't allow me to request PR reviews from the following users: mimowo.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @andreyvelich @tenzen-y @mimowo


Comment thread pkg/runtime/core/trainingruntime.go Outdated
@astefanutti
Contributor

/ok-to-test

@coveralls

coveralls commented Aug 11, 2025

Pull Request Test Coverage Report for Build 18348353117

Details

  • 48 of 48 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.3%) to 54.488%

Totals (Coverage Status)
  • Change from base Build 18314017966: 0.3%
  • Covered Lines: 1214
  • Relevant Lines: 2228

💛 - Coveralls

@xigang
Contributor Author

xigang commented Aug 11, 2025

@andreyvelich @tenzen-y PTAL. thanks!

Member

@andreyvelich andreyvelich left a comment

Thank you for this @xigang!

Comment thread pkg/runtime/core/trainingruntime.go Outdated
Comment thread pkg/runtime/core/trainingruntime.go
@xigang
Contributor Author

xigang commented Aug 12, 2025

Alright, I’ll go ahead and update the code.

@xigang xigang force-pushed the override_label_and_annotation branch from e1f8c2e to b5cec0f Compare August 12, 2025 15:23
@astefanutti
Contributor

/ok-to-test

@xigang
Contributor Author

xigang commented Aug 13, 2025

@andreyvelich @astefanutti The code has been updated. Please take another look, thanks.

Member

@andreyvelich andreyvelich left a comment

/ok-to-test

Comment thread pkg/apis/trainer/v1alpha1/trainjob_types.go Outdated
Comment thread pkg/runtime/core/trainingruntime.go Outdated
@xigang xigang force-pushed the override_label_and_annotation branch 2 times, most recently from 18651dd to 41371bb Compare August 14, 2025 00:49
Comment thread pkg/runtime/core/trainingruntime.go Outdated
@xigang xigang force-pushed the override_label_and_annotation branch 2 times, most recently from e09ce3f to bdb0e55 Compare August 14, 2025 09:26
Comment thread pkg/runtime/core/trainingruntime.go Outdated
@xigang xigang force-pushed the override_label_and_annotation branch from bdb0e55 to 1f50738 Compare August 14, 2025 09:35
Comment thread pkg/runtime/core/trainingruntime.go Outdated
@xigang xigang force-pushed the override_label_and_annotation branch from 1f50738 to aebc0d3 Compare August 14, 2025 09:55
@andreyvelich
Member

Awesome, appreciate your time @xigang!

@xigang xigang force-pushed the override_label_and_annotation branch from 448de4f to 888dd64 Compare October 8, 2025 02:44
@xigang
Contributor Author

xigang commented Oct 8, 2025

@andreyvelich @tenzen-y @mimowo PTAL.

Contributor

@astefanutti astefanutti left a comment

Do we want to update the section about PodSpecOverrides in docs/proposals/2170-kubeflow-trainer-v2/README.md accordingly?

Comment thread test/integration/webhooks/trainjob_test.go Outdated

@mimowo mimowo left a comment

LGTM, thank you so much @xigang and maintainers, happy to see it will address kubernetes-sigs/kueue#7156 in 2.1 for Kubeflow and 0.15 in Kueue 👍

@xigang
Contributor Author

xigang commented Oct 8, 2025

Do we want to update the section about PodSpecOverrides in docs/proposals/2170-kubeflow-trainer-v2/README.md accordingly?


@astefanutti Thanks for the reminder. I’ll submit a separate PR for the PodSpecOverrides section in the 2170 proposal.

@xigang xigang force-pushed the override_label_and_annotation branch from 888dd64 to 546331a Compare October 8, 2025 07:43
@xigang
Contributor Author

xigang commented Oct 8, 2025

/retest

@astefanutti
Contributor

/lgtm

Thanks @xigang!

Comment on lines +208 to +210
    if len(metadata) > 0 {
        podTemplatePatch["metadata"] = metadata
    }
Member

Do we need this check?

Contributor Author

The check should be kept as it prevents adding empty metadata objects to the strategic merge patch. It's a defensive programming approach that ensures we only modify the patch when there's actual metadata to apply.

{
Name: "NEW_VALUE",
Value: "from_overrides",
TargetJobs: []trainer.PodTemplateOverrideTargetJob{{Name: constants.Node}},
Member

Please also add labels/annotations to this test case, to verify that it works.

Contributor Author

Done.

Signed-off-by: xigang <wangxigang2014@gmail.com>
@xigang xigang force-pushed the override_label_and_annotation branch from 546331a to cbf2266 Compare October 8, 2025 14:37
@google-oss-prow google-oss-prow Bot removed the lgtm label Oct 8, 2025
Member

@andreyvelich andreyvelich left a comment

Thanks for this awesome contribution @xigang!
/lgtm
/approve
/hold for @tenzen-y to lgtm

@astefanutti
Contributor

/lgtm

Member

@tenzen-y tenzen-y left a comment

Thank you for this great effort!
/lgtm
/approve

/hold cancel

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [andreyvelich,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow Bot merged commit d56715c into kubeflow:master Oct 9, 2025
32 of 34 checks passed
alexxfan pushed a commit to red-hat-data-services/trainer that referenced this pull request Nov 24, 2025
Signed-off-by: xigang <wangxigang2014@gmail.com>
mahdikhashan pushed a commit to mahdikhashan/trainer that referenced this pull request Dec 29, 2025
Signed-off-by: xigang <wangxigang2014@gmail.com>

8 participants