[backend] dsl.ParallelFor loop: cannot resolve the upstream artifact output of a previous pod #11520

@zeidsolh

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
  • KFP version: master (which I assume is a predecessor to 2.4.0)
  • KFP SDK version: 2.11.0

Steps to reproduce

Here is a simple pipeline that produces this error:

from kfp import compiler, dsl
from kfp.dsl import Artifact, Input, Output


@dsl.component(base_image="python:3.10")
def split_model_ids(model_ids: str) -> list:
    # Turn a comma-separated string into the list that ParallelFor iterates over.
    return model_ids.split(",")


@dsl.component(base_image="python:3.10")
def create_file(file: Output[Artifact], content: str):
    # Write the loop item to an output artifact.
    with open(file.path, "w") as f:
        f.write(content)


@dsl.component(base_image="python:3.10")
def read_file(file: Input[Artifact]) -> str:
    # Read the artifact produced by create_file in the same iteration.
    with open(file.path, "r") as f:
        content = f.read()
    print(content)
    return content


@dsl.pipeline(name="Pipeline", description="Pipeline")
def export_model(
    model_ids: str = "",
):
    model_ids_split_op = split_model_ids(model_ids=model_ids)
    with dsl.ParallelFor(model_ids_split_op.output) as model_id:
        create_file_op = create_file(content=model_id)
        read_file_op = read_file(file=create_file_op.outputs["file"])
        # The artifact input already implies this ordering; .after() makes it explicit.
        read_file_op.after(create_file_op)


if __name__ == "__main__":
    compiler.Compiler().compile(export_model, "simple_pipeline.yaml")
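
To check where the artifact dependency ends up in the compiled spec, one option is to walk the compiled pipeline and list every input that is wired to an upstream task's output artifact. The sketch below is a minimal, standard-library-only illustration. The `sample_spec` dict is hand-written to mirror the shape I would expect `simple_pipeline.yaml` to have; the component name `comp-for-loop-1` and the helper `find_artifact_edges` are assumptions, not output from a real compile, so compare against your own compiled file.

```python
from typing import Iterator, Tuple


def find_artifact_edges(spec: dict) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (dag_owner, consumer_task, producer_task, output_key) tuples."""
    # Collect the root DAG plus every component that is itself a DAG
    # (for example, the body of a ParallelFor loop).
    dags = {"root": spec.get("root", {}).get("dag", {})}
    for name, comp in spec.get("components", {}).items():
        if "dag" in comp:
            dags[name] = comp["dag"]

    for owner, dag in dags.items():
        for task_name, task in dag.get("tasks", {}).items():
            artifacts = task.get("inputs", {}).get("artifacts", {})
            for art in artifacts.values():
                ref = art.get("taskOutputArtifact")
                if ref:
                    yield (owner, task_name,
                           ref["producerTask"], ref["outputArtifactKey"])


# Hand-written stand-in for the compiled spec (assumed shape, heavily trimmed).
sample_spec = {
    "root": {"dag": {"tasks": {"split-model-ids": {}, "for-loop-1": {}}}},
    "components": {
        "comp-for-loop-1": {
            "dag": {
                "tasks": {
                    "create-file": {},
                    "read-file": {
                        "inputs": {
                            "artifacts": {
                                "file": {
                                    "taskOutputArtifact": {
                                        "producerTask": "create-file",
                                        "outputArtifactKey": "file",
                                    }
                                }
                            }
                        }
                    },
                }
            }
        }
    },
}

edges = list(find_artifact_edges(sample_spec))
for edge in edges:
    print(edge)
```

With a real compile, load the YAML into a dict (for example with `yaml.safe_load`, which needs PyYAML) and pass it in place of `sample_spec`. An edge whose owner is the loop component rather than `root` means the producer and consumer both live inside the loop body.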

Expected result

The handling of dsl.ParallelFor loops changed recently on the master branch. When I run this pipeline against master, the new KFP driver panics with a nil pointer dereference: inside the dsl.ParallelFor loop, it cannot resolve the upstream artifact output of the previous pod (create_file) when preparing the inputs for read_file.

Materials and Reference

Gives this error:

main I0107 16:08:18.259327      22 driver.go:984] parent DAG input parameters: map[pipelinechannel--split-model-ids-Output-loop-item:string_value:"zs1"], artifacts: map[]
main panic: runtime error: invalid memory address or nil pointer dereference
main [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x1f1cd39]
main
main goroutine 1 [running]:
main github.com/kubeflow/pipelines/backend/src/v2/driver.resolveUpstreamArtifacts({{0x2c074f0, 0x4148aa0}, 0x0, 0xc000809e40, 0xc000c71200, 0xc00012df00, 0xc000a718f0, 0xc000befbc0, {0xc00088dde0, 0x9}, ...})
main     /go/src/github.com/kubeflow/pipelines/backend/src/v2/driver/driver.go:1450 +0x4b9
main github.com/kubeflow/pipelines/backend/src/v2/driver.resolveInputs({0x2c074f0, 0x4148aa0}, 0xc000c71200, 0x0, 0xc00012df00, 0xc000301b90, 0xc000809840, 0xc000a718f0, 0xc000ba77f0)
main     /go/src/github.com/kubeflow/pipelines/backend/src/v2/driver/driver.go:1221 +0x1b58
main github.com/kubeflow/pipelines/backend/src/v2/driver.Container({0x2c074f0, 0x4148aa0}, {{0x7ffe76d8c3f0, 0x16}, {0x7ffe76d8c410, 0x24}, 0xc0009bf310, 0xffffffffffffffff, 0x0, {0xc0009a22a0, ...}, ...}, ...)
main     /go/src/github.com/kubeflow/pipelines/backend/src/v2/driver/driver.go:263 +0x3c9
main main.drive()
main     /go/src/github.com/kubeflow/pipelines/backend/src/v2/cmd/driver/main.go:174 +0xb1c
main main.main()
main     /go/src/github.com/kubeflow/pipelines/backend/src/v2/cmd/driver/main.go:77 +0x65
main time="2025-01-07T16:08:18.862Z" level=info msg="sub-process exited" argo=true error="<nil>"
main time="2025-01-07T16:08:18.862Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
main time="2025-01-07T16:08:18.862Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
main time="2025-01-07T16:08:18.862Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
main Error: exit status 2
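
When triaging a trace like this, it can help to reduce it to its Go frames. The following is a minimal sketch, assuming the log lines have already been unwrapped and stripped of the terminal's box-drawing borders. The helper name `extract_frames` is mine, and the sample repeats only three frames from the log above with their argument lists trimmed.

```python
import re
from typing import List, Tuple

# A Go stack frame is a function line followed by an indented
# "<path>.go:<line> +0x<offset>" line. The optional "main" prefix is the
# container name that the log viewer prepends to every line.
LOCATION = re.compile(r"^\s*(?:main\s+)?(/\S+\.go):(\d+)\s+\+0x[0-9a-f]+\s*$")


def extract_frames(lines: List[str]) -> List[Tuple[str, str, int]]:
    """Return (function, file, line) for each frame, innermost first."""
    frames = []
    for prev, cur in zip(lines, lines[1:]):
        match = LOCATION.match(cur)
        if not match:
            continue
        func = prev.strip()
        # Drop the container-name prefix, then the argument list.
        if func.startswith("main "):
            func = func[len("main "):]
        func = func.split("(", 1)[0]
        frames.append((func, match.group(1), int(match.group(2))))
    return frames


# Three frames copied from the trace, arguments trimmed for brevity.
sample = [
    "main goroutine 1 [running]:",
    "main github.com/kubeflow/pipelines/backend/src/v2/driver.resolveUpstreamArtifacts({{0x2c074f0, 0x4148aa0}, 0x0, ...})",
    "main     /go/src/github.com/kubeflow/pipelines/backend/src/v2/driver/driver.go:1450 +0x4b9",
    "main github.com/kubeflow/pipelines/backend/src/v2/driver.resolveInputs({0x2c074f0, 0x4148aa0}, ...)",
    "main     /go/src/github.com/kubeflow/pipelines/backend/src/v2/driver/driver.go:1221 +0x1b58",
    "main main.drive()",
    "main     /go/src/github.com/kubeflow/pipelines/backend/src/v2/cmd/driver/main.go:174 +0xb1c",
]

frames = extract_frames(sample)
for func, path, line in frames:
    print(f"{func.rsplit('/', 1)[-1]}  {path.rsplit('/', 1)[-1]}:{line}")
```

The innermost frame, `resolveUpstreamArtifacts` at `driver.go:1450`, is where the trace reports the nil pointer dereference.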

Thank you so much!

Impacted by this bug? Give it a 👍.

Status: Done
