fix(backend): parallelFor resolve upstream inputs. Fixes #11520 #11627
Conversation
Hi @zazulam. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test

/retest
Adding some details here related to the fix. The example used was from the comment on the linked issue and was also added as a test case:

```python
from kfp import dsl


@dsl.component
def print_op(message: str) -> str:
    print(message)
    return message


@dsl.component
def reduce_op(message: str) -> str:
    print(message)
    return message[0]


@dsl.pipeline()
def my_pipeline():
    with dsl.ParallelFor([1, 2, 3]):
        one = print_op(message='foo')
        two = print_op(message='bar').after(one)
```
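For reference, a minimal sketch of compiling this pipeline locally to inspect the generated spec; `my_pipeline.yaml` is an arbitrary output path, not something from this PR:

```python
from kfp import compiler

# Compile to a local YAML file so the generated IR (the parallelFor
# sub-DAG and the `.after()` dependency between the two tasks) can be
# inspected directly.
compiler.Compiler().compile(
    pipeline_func=my_pipeline,
    package_path='my_pipeline.yaml',
)
```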
Force-pushed from 3adfc00 to bb48a9a.
/lgtm

Thanks for the quick turnaround on this, folks!
@zazulam can you rebase? There are conflicts.
Force-pushed from bb48a9a to c8a49fc.
I'm going to save the `dsl.Collected` implementation for a separate PR.
@zazulam I think a separate PR makes sense. If we can keep this one lightweight, I might be able to cherry-pick it for the 2.4.1 patch release I'll make next week, so we can address the regression for Kubeflow 1.10. Feel free to hit me up on Slack once the PR is ready, or if you get hit with flaky tests.
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: droctothorpe, HumairAK

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
/lgtm
Description of your changes:

Updated `resolveUpstreamArtifacts` to use `getDAGTasks`, allowing it to parse through parallelFor DAG contexts and retrieve the appropriate `producerTask`. This fixes #11520 ([backend] dsl.ParallelFor loop: cannot resolve the upstream artifact output of a previous pod). It seems the call with the filter to `GetExecutionsInDAG` was not reverted to match the one in `resolveUpstreamParameters` from #11196 (feat(backend): implement subdag output resolution). A rough sketch of the traversal idea follows below.
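The actual fix lives in the Go driver; purely as an illustration of the traversal idea (all names and structures here are hypothetical, not the driver's API), a Python sketch:

```python
def collect_dag_tasks(dag_execution, get_child_executions):
    """Hypothetical sketch: flatten a DAG's tasks, descending into
    sub-DAG executions (e.g. the implicit parallelFor DAG) so a
    producer task nested in a loop context is still found by name."""
    tasks = {}
    for execution in get_child_executions(dag_execution):
        tasks[execution['task_name']] = execution
        if execution['type'] == 'system.DAGExecution':
            # Recurse into the nested DAG (a parallelFor iteration
            # group) instead of stopping at the loop boundary.
            tasks.update(collect_dag_tasks(execution, get_child_executions))
    return tasks
```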
Updated the argocompiler for the `iteratorTask` section. The additional "-loop" suffix added to the task associated with the parallelFor DAG breaks the dependency validation that the Argo Workflows API performs when the pipeline is submitted (see the sketch after this description). This partially resolves #10050 ([backend/sdk] Support dsl.collected() in KFP) for the `.after()` usage of a parallel task. The implementation for `dsl.Collected` in the backend will come in a follow-up PR.

cc: @droctothorpe
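To make the "-loop" naming issue concrete, a hypothetical Python sketch (the suffix constant and the function are illustrative, not the compiler's API):

```python
LOOP_SUFFIX = '-loop'

def argo_task_name(task_name: str, is_parallel_for: bool) -> str:
    # Hypothetical sketch: the compiler emits the parallelFor DAG task
    # with a "-loop" suffix, so a `.after()` dependency that references
    # the bare task name fails Argo's submission-time dependency
    # validation; producer and consumer must agree on one name.
    return task_name + LOOP_SUFFIX if is_parallel_for else task_name

# e.g. a task placed `.after()` a loop group must depend on:
print(argo_task_name('for-loop-1', is_parallel_for=True))  # for-loop-1-loop
```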
Checklist: