TrainingStep fails when Estimator object has 'debugger_hook_config' as False #81

Closed
jennakwon06 opened this issue Aug 31, 2020 · 3 comments

@jennakwon06

Hello,

When debugger_hook_config=False is specified in an estimator object, TrainingStep will fail with the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-ce223140d492> in <module>
     20     state_id="samplestate",
     21     estimator=estimator,
---> 22     job_name="samplejob"
     23 )

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/stepfunctions/steps/sagemaker.py in __init__(self, state_id, estimator, job_name, data, hyperparameters, mini_batch_size, experiment_config, wait_for_completion, tags, **kwargs)
     69 
     70         if estimator.debugger_hook_config != None:
---> 71             parameters['DebugHookConfig'] = estimator.debugger_hook_config._to_request_dict()
     72 
     73         if estimator.rules != None:

AttributeError: 'bool' object has no attribute '_to_request_dict'

According to the SageMaker docs (https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_debugger.html#default-behavior-and-opting-out), debugger_hook_config=False is the documented way to disable hook initialization.

Code sample:

from sagemaker.tensorflow import TensorFlow
from stepfunctions.steps import TrainingStep

estimator = TensorFlow(
    entry_point="training.py",
    role="arn:aws:iam::747911269416:role/service-role/AmazonSageMaker-ExecutionRole-20190313T101302",
    train_instance_count=1,
    train_instance_type="ml.m5d.large",
    output_path="s3://jkkwon-miami-us-west-2",
    code_location="s3://jkkwon-miami-us-west-2",
    train_volume_size=1024,
    metric_definitions=[
        {"Name": "train:loss", "Regex": "Train Loss: (.*?);"},
        {"Name": "test:loss", "Regex": "Test Average loss: (.*?),"},
        {"Name": "test:accuracy", "Regex": "Test Accuracy: (.*?)%;"},
    ],
    enable_sagemaker_metrics=True,
    debugger_hook_config=False
)


training_step = TrainingStep(
    state_id="samplestate",
    estimator=estimator,
    job_name="samplejob"
)
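
The traceback points at the `!= None` check: `False != None` evaluates to True, so the boolean reaches `_to_request_dict()`. Below is a minimal sketch of a guard that treats both None and False as "debugger disabled". It is an illustration only, not necessarily what the fix in #83 does; `FakeHookConfig` and `build_debug_parameters` are hypothetical names standing in for the SDK's hook config object and the relevant part of `TrainingStep.__init__`.

```python
class FakeHookConfig:
    """Hypothetical stand-in for a debugger hook config object."""

    def __init__(self, s3_output_path):
        self.s3_output_path = s3_output_path

    def _to_request_dict(self):
        return {"S3OutputPath": self.s3_output_path}


def build_debug_parameters(debugger_hook_config):
    """Return the DebugHookConfig part of the request, or {} when disabled."""
    parameters = {}
    # A plain `!= None` check lets False through, because False != None is True.
    # A truthiness check skips both None and False.
    if debugger_hook_config:
        parameters["DebugHookConfig"] = debugger_hook_config._to_request_dict()
    return parameters


print(build_debug_parameters(False))
print(build_debug_parameters(FakeHookConfig("s3://example-bucket/debug")))
```

With this guard, passing False simply omits the DebugHookConfig key instead of raising AttributeError.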
@vaib-amz
Contributor

vaib-amz commented Sep 2, 2020

Hi @jennakwon06,

Thank you for raising this. We have merged fix #83 for this. This should be available with the next release of the SDK.

Thanks,
Vaib

@vaib-amz vaib-amz closed this as completed Sep 2, 2020
@vaib-amz vaib-amz reopened this Sep 2, 2020
@jennakwon06
Author

Hello,

When is the next release containing this fix scheduled?
We need the fix to be installable from PyPI.

@shivlaks
Contributor

#83 was included in v1.1.2, which was released almost a year ago (Sept 23, 2020)
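
To confirm that an environment already has the fix, the installed version can be compared against 1.1.2. The following is a rough sketch: `includes_fix` is a hypothetical helper, it assumes plain numeric versions such as "1.1.2" (no pre-release suffixes), and it assumes the SDK is installed under the PyPI distribution name `stepfunctions`.

```python
from importlib import metadata  # Python 3.8+


def includes_fix(version, fixed_in="1.1.2"):
    """Return True if `version` is at least `fixed_in` (numeric versions only)."""

    def as_tuple(v):
        return tuple(int(part) for part in v.split("."))

    return as_tuple(version) >= as_tuple(fixed_in)


def installed_sdk_has_fix(dist_name="stepfunctions"):
    """Look up the installed version; the distribution name is an assumption."""
    try:
        return includes_fix(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return False


print(includes_fix("1.1.1"))
print(includes_fix("1.1.2"))
```

If the check reports an older version, upgrading the package from PyPI should pick up the fix.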
