
PipelineParameter doesn't work in Azure Machine Learning Pipeline with DatabricksStep #1454

Open

Description

@Sofia-tesi

Hi,
I created a pipeline with several steps (azureml-defaults==1.23.0). When I run the published pipeline from the studio, the Databricks steps always take the default value of the PipelineParameter, no matter what value I choose when I submit the pipeline.

parser.add_argument('--import_date',         type=str, default = "2021-04-23")   
....
....

import_date      = PipelineParameter(name="import_date"       , default_value = params.import_date)
cluster_id       = PipelineParameter(name="cluster_id"        , default_value = params.cluster_id)
step_type        = PipelineParameter(name="step_type"         , default_value = params.step_type)
churn_months     = PipelineParameter(name="churnMonths"       , default_value = params.churnMonths)



data_import_step = DatabricksStep(name="Databricks Data Import Step",
                                  existing_cluster_id=str(cluster_id.default_value),
                                  notebook_path=import_notebook_path,
                                  notebook_params={'ChurnMonthsWidget': churn_months, 
                                                   'startDateWidget'  : port_start_date,
                                                   'ImportDateWidget' : import_date,
                                                   'StepTypeWidget'   : step_type},
                                  run_name='Job_Data_Import',
                                  compute_target=databricks_compute,
                                  allow_reuse=False)
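One detail worth noting in the step above: `existing_cluster_id=str(cluster_id.default_value)` reads the parameter's default eagerly, while the pipeline graph is being built, so that particular value can never be overridden at submission time. Here is a minimal, Azure-free stand-in (the class below only mimics `PipelineParameter` for illustration) showing the difference between freezing a value at authoring time and passing the parameter object for late binding, as `notebook_params` does:

```python
class PipelineParameter:
    """Illustrative stand-in for azureml.pipeline.core.PipelineParameter."""
    def __init__(self, name, default_value):
        self.name = name
        self.default_value = default_value

cluster_id = PipelineParameter("cluster_id", default_value="0123-456789-abc")

# Evaluated immediately, while the graph is being built: the resulting
# string is baked into the step and cannot change at submission time.
frozen = str(cluster_id.default_value)

# Passing the parameter object itself leaves resolution to the backend.
late_bound = cluster_id

print(frozen)           # the default, frozen at authoring time
print(late_bound.name)  # still a parameter, resolvable at run time
```

This does not by itself explain the `notebook_params` behavior (those entries do receive the parameter objects), but it means `existing_cluster_id` in particular will always use the default.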
.....
.....
.....
pipeline_steps = StepSequence(steps=[data_import_step                  #Step 1
                                    ,data_manipulation_step            #Step 2
                                    ,data_extraction_step              #Step 3
                                    ,training_data_preparation_step    #Step 4
                                    ,model_training_step               #Step 5
                                    ,prediction_data_preparation_step  #Step 6
                                    ,prediction_step                   #Step 7
                                    ])

pipeline = Pipeline(workspace = ws, steps=pipeline_steps)

published_pipeline = pipeline.publish(name        = params.pipeline_name,
                                      description = params.pipeline_description)
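One way to narrow this down (not part of the original report) is to submit the published pipeline from the SDK with an explicit `pipeline_parameters` override instead of using the studio form; if the notebook still receives the default, the studio UI can be ruled out. The experiment name is illustrative, and the workspace calls are commented out below because they require live Azure credentials:

```python
# Value the notebook should receive instead of the 2021-04-23 default.
overrides = {"import_date": "2021-04-22"}

# from azureml.core import Workspace
# from azureml.pipeline.core import PublishedPipeline
#
# ws = Workspace.from_config()
# published = PublishedPipeline.get(workspace=ws, id=published_pipeline.id)
# run = published.submit(ws, experiment_name="param-debug",
#                        pipeline_parameters=overrides)
# run.wait_for_completion()
```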

Here the default value is 2021-04-23 and I set the import_date parameter to 2021-04-22:
[Screenshots: ParamProblem3, ParamProblem — submission form with import_date set to 2021-04-22]

but the Databricks notebook takes 2021-04-23 as import_date:

[Screenshot: ParamProblem2 — notebook run showing import_date = 2021-04-23]

The Databricks notebook defines the following widgets:

from datetime import date

today = str(date.today())
dbutils.widgets.text("ImportDateWidget", today, label="ImportDate")

startDate = "2019-01-01"
dbutils.widgets.text("startDateWidget", startDate, label="startDate")

ChurnMonths = 3
dbutils.widgets.text("ChurnMonthsWidget", str(ChurnMonths), label="ChurnMonths")

step_type = "Training"
dbutils.widgets.text("StepTypeWidget", step_type, label="StepType")

# Read each widget value once
import_date    = dbutils.widgets.get("ImportDateWidget")
start_date     = dbutils.widgets.get("startDateWidget")
N_MONTHS_CHURN = int(dbutils.widgets.get("ChurnMonthsWidget"))
step_type      = dbutils.widgets.get("StepTypeWidget")
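To reason about where the override is being dropped, here is a `dbutils`-free sketch of the precedence that widget reads are expected to follow when a notebook runs as a job: `widgets.text` registers a default, and a value supplied by the job (the step's `notebook_params`) should win over it. The `FakeWidgets` class is purely a stand-in for illustration, not a real Databricks API:

```python
class FakeWidgets:
    """Simulates dbutils.widgets precedence for a notebook run as a job."""
    def __init__(self, job_params=None):
        self.defaults = {}
        self.job_params = job_params or {}  # values passed by the job

    def text(self, name, default, label=None):
        # Registers the widget's default; does not overwrite a job value.
        self.defaults[name] = default

    def get(self, name):
        # A job-supplied value takes precedence over the registered default.
        return self.job_params.get(name, self.defaults[name])

# Simulate the step passing an override, as notebook_params should:
w = FakeWidgets(job_params={"ImportDateWidget": "2021-04-22"})
w.text("ImportDateWidget", "2021-04-23")
print(w.get("ImportDateWidget"))  # the override wins over the default
```

In the runs shown above, the notebook behaves as if `job_params` were empty, i.e. as if the DatabricksStep never forwarded the overridden PipelineParameter values.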
