Merge branch 'master' into zwei #1633


Closed. Wanted to merge 27 commits into ``zwei`` from ``master``.
Commits
- ``bd27b39`` prepare release v1.64.1 (Jun 16, 2020)
- ``1c1a29a`` update development version to v1.64.2.dev0 (Jun 16, 2020)
- ``b37b4dc`` fix: workflow passing spot training param to training job (#1599) (chuyang-deng, Jun 16, 2020)
- ``875744a`` fix: set logs to False if wait is False in AutoML (#1585) (chuyang-deng, Jun 17, 2020)
- ``af7f75a`` change: update distributed GPU utilization warning message (#1587) (chuyang-deng, Jun 17, 2020)
- ``ddd05a8`` feature: support for describing hyperparameter tuning job (#1594) (chuyang-deng, Jun 17, 2020)
- ``1c01a2f`` prepare release v1.65.0 (Jun 17, 2020)
- ``25dc97e`` update development version to v1.65.1.dev0 (Jun 17, 2020)
- ``4d948ac`` doc: add some clarification to Processing docs (#1600) (laurenyu, Jun 17, 2020)
- ``fe6853c`` change: remove include_package_data=True from setup.py (#1602) (laurenyu, Jun 17, 2020)
- ``eeb71ae`` infra: specify what kinds of clients in PR template (#1604) (laurenyu, Jun 17, 2020)
- ``b670598`` prepare release v1.65.1 (Jun 18, 2020)
- ``edebb8a`` update development version to v1.65.2.dev0 (Jun 18, 2020)
- ``1ff8bd6`` doc: document that Local Mode + local code doesn't support dependenci… (laurenyu, Jun 18, 2020)
- ``875abe1`` infra: upgrade Sphinx to 3.1.1 (#1605) (laurenyu, Jun 18, 2020)
- ``5d74516`` prepare release v1.65.1.post0 (Jun 22, 2020)
- ``62a791e`` update development version to v1.65.2.dev0 (Jun 22, 2020)
- ``c919830`` infra: add py38 to buildspecs (#1615) (metrizable, Jun 23, 2020)
- ``40a2720`` prepare release v1.65.1.post1 (Jun 24, 2020)
- ``4df8e51`` update development version to v1.65.2.dev0 (Jun 24, 2020)
- ``ae467e1`` infra: update feature request issue template (#1625) (ajaykarpur, Jun 24, 2020)
- ``60bffc2`` feature: add 3.8 as supported python version (#1626) (metrizable, Jun 24, 2020)
- ``bd80070`` infra: upgrade airflow to latest stable version (#1628) (metrizable, Jun 25, 2020)
- ``c002dcd`` prepare release v1.66.0 (Jun 25, 2020)
- ``0ed415f`` update development version to v1.66.1.dev0 (Jun 25, 2020)
- ``9d8b1a5`` Merge branch 'master' into zwei (laurenyu, Jun 25, 2020)
- ``7569fb9`` test (laurenyu, Jun 25, 2020)
10 changes: 5 additions & 5 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -1,17 +1,17 @@
 ---
 name: Feature request
-about: Suggest an improvement for this library
+about: Suggest new functionality for this library
 title: ''
 labels: ''
 assignees: ''
 
 ---
 
-**Is your feature request related to a problem? Please describe.**
-A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+**Describe the feature you'd like**
+A clear and concise description of the functionality you want.
 
-**Describe the solution you'd like**
-A clear and concise description of what you want to happen.
+**How would this feature be used? Please describe.**
+A clear and concise description of the use case for this feature. Please provide an example, if possible.
 
 **Describe alternatives you've considered**
 A clear and concise description of any alternative solutions or features you've considered.
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -12,7 +12,7 @@ _Put an `x` in the boxes that apply. You can also fill these out after creating
 
 - [ ] I have read the [CONTRIBUTING](https://github.com/aws/sagemaker-python-sdk/blob/master/CONTRIBUTING.md) doc
 - [ ] I used the commit message format described in [CONTRIBUTING](https://github.com/aws/sagemaker-python-sdk/blob/master/CONTRIBUTING.md#committing-your-change)
-- [ ] I have passed the region in to any/all clients that I've initialized as part of this change.
+- [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
 - [ ] I have updated any necessary documentation, including [READMEs](https://github.com/aws/sagemaker-python-sdk/blob/master/README.rst) and [API docs](https://github.com/aws/sagemaker-python-sdk/tree/master/doc) (if appropriate)
 
 #### Tests
61 changes: 60 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,59 @@
 # Changelog
 
-## v2.0.0.rc0
+## v1.66.0 (2020-06-25)
+
+### Features
+
+* add 3.8 as supported python version
+
+### Testing and Release Infrastructure
+
+* upgrade airflow to latest stable version
+* update feature request issue template
+
+## v1.65.1.post1 (2020-06-24)
+
+### Testing and Release Infrastructure
+
+* add py38 to buildspecs
+
+## v1.65.1.post0 (2020-06-22)
+
+### Documentation Changes
+
+* document that Local Mode + local code doesn't support dependencies arg
+
+### Testing and Release Infrastructure
+
+* upgrade Sphinx to 3.1.1
+
+## v1.65.1 (2020-06-18)
+
+### Bug Fixes and Other Changes
+
+* remove include_package_data=True from setup.py
+
+### Documentation Changes
+
+* add some clarification to Processing docs
+
+### Testing and Release Infrastructure
+
+* specify what kinds of clients in PR template
+
+## v1.65.0 (2020-06-17)
+
+### Features
+
+* support for describing hyperparameter tuning job
+
+### Bug Fixes and Other Changes
+
+* update distributed GPU utilization warning message
+* set logs to False if wait is False in AutoML
+* workflow passing spot training param to training job
+
+## v2.0.0.rc0 (2020-06-17)
 
 ### Breaking Changes
 
@@ -25,6 +78,12 @@
 * remove scipy from dependencies
 * remove TF from optional dependencies
 
+## v1.64.1 (2020-06-16)
+
+### Bug Fixes and Other Changes
+
+* include py38 tox env and some dependency upgrades
+
 ## v1.64.0 (2020-06-15)
 
 ### Features
18 changes: 11 additions & 7 deletions README.rst
@@ -95,6 +95,7 @@ SageMaker Python SDK is tested on:
 
 - Python 3.6
 - Python 3.7
+- Python 3.8
 
 AWS Permissions
 ~~~~~~~~~~~~~~~
@@ -162,23 +163,26 @@ You can also run them in parallel:
Building Sphinx docs
~~~~~~~~~~~~~~~~~~~~

Setup a Python environment with ``sphinx`` and ``sagemaker``:
Setup a Python environment, and install the dependencies listed in ``doc/requirements.txt``:

::

# conda
conda create -n sagemaker python=3.7
conda activate sagemaker
conda install sphinx==2.2.2
pip install sagemaker --user
conda install --file doc/requirements.txt

Install the Read The Docs theme:
# pip
pip install -r doc/requirements.txt

::

pip install sphinx_rtd_theme --user
Clone/fork the repo, and install your local version:

::

pip install --upgrade .

Clone/fork the repo, ``cd`` into the ``sagemaker-python-sdk/doc`` directory and run:
Then ``cd`` into the ``sagemaker-python-sdk/doc`` directory and run:

::

4 changes: 2 additions & 2 deletions buildspec-localmodetests.yml
@@ -11,5 +11,5 @@ phases:
 
   # local mode tests
   - start_time=`date +%s`
-  - execute-command-if-has-matching-changes "tox -e py37 -- tests/integ -m local_mode --durations 50" "tests/integ" "tests/data" "tests/conftest.py" "tests/__init__.py" "src/*.py" "setup.py" "setup.cfg" "buildspec-localmodetests.yml"
-  - ./ci-scripts/displaytime.sh 'py37 local mode' $start_time
+  - execute-command-if-has-matching-changes "tox -e py38 -- tests/integ -m local_mode --durations 50" "tests/integ" "tests/data" "tests/conftest.py" "tests/__init__.py" "src/*.py" "setup.py" "setup.cfg" "buildspec-localmodetests.yml"
+  - ./ci-scripts/displaytime.sh 'py38 local mode' $start_time
2 changes: 1 addition & 1 deletion buildspec-release.yml
@@ -18,7 +18,7 @@ phases:
   # run unit tests
   - AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY= AWS_SESSION_TOKEN=
     AWS_CONTAINER_CREDENTIALS_RELATIVE_URI= AWS_DEFAULT_REGION=
-    tox -e py36,py37 -- tests/unit
+    tox -e py36,py37,py38 -- tests/unit
 
   # run a subset of the integration tests
   - IGNORE_COVERAGE=- tox -e py36 -- tests/integ -m canary_quick -n 64 --boxed --reruns 2
4 changes: 2 additions & 2 deletions buildspec-unittests.yml
@@ -18,5 +18,5 @@ phases:
   - start_time=`date +%s`
   - AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY= AWS_SESSION_TOKEN=
     AWS_CONTAINER_CREDENTIALS_RELATIVE_URI= AWS_DEFAULT_REGION=
-    tox -e py36,py37 --parallel all -- tests/unit
-  - ./ci-scripts/displaytime.sh 'py36,py37 unit' $start_time
+    tox -e py36,py37,py38 --parallel all -- tests/unit
+  - ./ci-scripts/displaytime.sh 'py36,py37,py38 unit' $start_time
6 changes: 3 additions & 3 deletions buildspec.yml
@@ -11,13 +11,13 @@ phases:
 
   # run integration tests
   - start_time=`date +%s`
-  - execute-command-if-has-matching-changes "python3.7 -u ci-scripts/queue_build.py" "tests/integ" "tests/scripts" "tests/data" "tests/conftest.py" "tests/__init__.py" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml"
+  - execute-command-if-has-matching-changes "python3.8 -u ci-scripts/queue_build.py" "tests/integ" "tests/scripts" "tests/data" "tests/conftest.py" "tests/__init__.py" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml"
   - ./ci-scripts/displaytime.sh 'build queue' $start_time
 
   - start_time=`date +%s`
   - |
-    execute-command-if-has-matching-changes "env -u AWS_DEFAULT_REGION tox -e py37 -- tests/integ -m \"not local_mode\" -n 512 --reruns 3 --reruns-delay 5 --durations 50 --boto-config '{\"region_name\": \"us-east-2\"}'" "tests/integ" "tests/scripts" "tests/data" "tests/conftest.py" "tests/__init__.py" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml"
-  - ./ci-scripts/displaytime.sh 'py37 tests/integ' $start_time
+    execute-command-if-has-matching-changes "env -u AWS_DEFAULT_REGION tox -e py38 -- tests/integ -m \"not local_mode\" -n 512 --reruns 3 --reruns-delay 5 --durations 50 --boto-config '{\"region_name\": \"us-east-2\"}'" "tests/integ" "tests/scripts" "tests/data" "tests/conftest.py" "tests/__init__.py" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml"
+  - ./ci-scripts/displaytime.sh 'py38 tests/integ' $start_time
 
 post_build:
   finally:
115 changes: 62 additions & 53 deletions doc/amazon_sagemaker_processing.rst
@@ -1,6 +1,6 @@
##############################
###########################
Amazon SageMaker Processing
##############################
###########################


Amazon SageMaker Processing allows you to run steps for data pre- or post-processing, feature engineering, data validation, or model evaluation workloads on Amazon SageMaker.
@@ -24,76 +24,85 @@ The fastest way to get started with Amazon SageMaker Processing is by running
You can run notebooks on Amazon SageMaker that demonstrate end-to-end examples of using processing jobs to perform data pre-processing, feature engineering and model evaluation steps. See `Learn More`_ at the bottom of this page for more in-depth information.


Data Pre-Processing and Model Evaluation with Scikit-Learn
==================================================================
Data Pre-Processing and Model Evaluation with scikit-learn
==========================================================

You can run a Scikit-Learn script to do data processing on SageMaker using the `SKLearnProcessor`_ class.

.. _SKLearnProcessor: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
You can run a scikit-learn script to do data processing on SageMaker using the :class:`sagemaker.sklearn.processing.SKLearnProcessor` class.

You first create a ``SKLearnProcessor``

.. code:: python

from sagemaker.sklearn.processing import SKLearnProcessor

sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
role='[Your SageMaker-compatible IAM role]',
instance_type='ml.m5.xlarge',
instance_count=1)
sklearn_processor = SKLearnProcessor(
framework_version="0.20.0",
role="[Your SageMaker-compatible IAM role]",
instance_type="ml.m5.xlarge",
instance_count=1,
)

Then you can run a Scikit-Learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrieve the output from S3.
Then you can run a scikit-learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrieve the output from S3.

.. code:: python

from sagemaker.processing import ProcessingInput, ProcessingOutput

sklearn_processor.run(code='preprocessing.py',
inputs=[ProcessingInput(
source='s3://your-bucket/path/to/your/data,
destination='/opt/ml/processing/input')],
outputs=[ProcessingOutput(output_name='train_data',
source='/opt/ml/processing/train'),
ProcessingOutput(output_name='test_data',
source='/opt/ml/processing/test')],
arguments=['--train-test-split-ratio', '0.2']
)
sklearn_processor.run(
code="preprocessing.py",
inputs=[
ProcessingInput(source="s3://your-bucket/path/to/your/data", destination="/opt/ml/processing/input"),
],
outputs=[
ProcessingOutput(output_name="train_data", source="/opt/ml/processing/train"),
ProcessingOutput(output_name="test_data", source="/opt/ml/processing/test"),
],
arguments=["--train-test-split-ratio", "0.2"],
)

preprocessing_job_description = sklearn_processor.jobs[-1].describe()
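The ``describe()`` call above returns the raw ``DescribeProcessingJob`` response as a plain ``dict``, so the S3 locations of the two output datasets can be read straight out of it. The helper below is only a sketch (the stubbed response stands in for a real job description, and the bucket is hypothetical), but the field names follow the ``DescribeProcessingJob`` API shape:

```python
def output_uris(job_description):
    """Map each ProcessingOutput name to its S3 URI.

    Field names follow the DescribeProcessingJob response shape:
    ProcessingOutputConfig -> Outputs -> [{OutputName, S3Output: {S3Uri}}].
    """
    outputs = job_description.get("ProcessingOutputConfig", {}).get("Outputs", [])
    return {output["OutputName"]: output["S3Output"]["S3Uri"] for output in outputs}


# Stubbed response standing in for sklearn_processor.jobs[-1].describe();
# the bucket and prefixes are hypothetical.
description = {
    "ProcessingOutputConfig": {
        "Outputs": [
            {"OutputName": "train_data", "S3Output": {"S3Uri": "s3://your-bucket/train"}},
            {"OutputName": "test_data", "S3Output": {"S3Uri": "s3://your-bucket/test"}},
        ]
    }
}
uris = output_uris(description)
```

From there, each dataset can be downloaded with any S3 client.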

For an in-depth look, please see the `Scikit-Learn Data Processing and Model Evaluation`_ example notebook.
For an in-depth look, please see the `Scikit-learn Data Processing and Model Evaluation`_ example notebook.

.. _Scikit-Learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
.. _Scikit-learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb


Data Pre-Processing with Spark
==============================

You can use the `ScriptProcessor`_ class to run a script in a processing container, including your own container.

.. _ScriptProcessor: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
You can use the :class:`sagemaker.processing.ScriptProcessor` class to run a script in a processing container, including your own container.

This example shows how you can run a processing job inside of a container that can run a Spark script called ``preprocess.py`` by invoking a command ``/opt/program/submit`` inside the container.

.. code:: python

from sagemaker.processing import ScriptProcessor, ProcessingInput

spark_processor = ScriptProcessor(base_job_name='spark-preprocessor',
image_uri='<ECR repository URI to your Spark processing image>',
command=['/opt/program/submit'],
role=role,
instance_count=2,
instance_type='ml.r5.xlarge',
max_runtime_in_seconds=1200,
env={'mode': 'python'})

spark_processor.run(code='preprocess.py',
arguments=['s3_input_bucket', bucket,
's3_input_key_prefix', input_prefix,
's3_output_bucket', bucket,
's3_output_key_prefix', input_preprocessed_prefix],
logs=False)
spark_processor = ScriptProcessor(
base_job_name="spark-preprocessor",
image_uri="<ECR repository URI to your Spark processing image>",
command=["/opt/program/submit"],
role=role,
instance_count=2,
instance_type="ml.r5.xlarge",
max_runtime_in_seconds=1200,
env={"mode": "python"},
)

spark_processor.run(
code="preprocess.py",
arguments=[
"s3_input_bucket",
bucket,
"s3_input_key_prefix",
input_prefix,
"s3_output_bucket",
bucket,
"s3_output_key_prefix",
input_preprocessed_prefix,
],
logs=False,
)
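Note that ``run()`` receives the script's parameters as one flat list of alternating names and values rather than as a mapping. When several jobs share the same parameters, a small helper can build that list from a ``dict``; this helper is an illustration, not part of the SDK, and the bucket and prefixes below are hypothetical:

```python
def flatten_arguments(params):
    """Turn {"s3_input_bucket": "my-bucket", ...} into the flat
    alternating ["s3_input_bucket", "my-bucket", ...] list form."""
    args = []
    for name, value in params.items():
        # dicts preserve insertion order in Python 3.7+, so the
        # name/value pairs come out in the order they were declared
        args.extend([name, str(value)])
    return args


spark_args = flatten_arguments({
    "s3_input_bucket": "my-bucket",
    "s3_input_key_prefix": "raw/",
    "s3_output_bucket": "my-bucket",
    "s3_output_key_prefix": "preprocessed/",
})
```

``spark_args`` can then be passed as the ``arguments`` keyword of the ``run()`` call above.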

For an in-depth look, please see the `Feature Transformation with Spark`_ example notebook.

@@ -106,19 +115,19 @@ Learn More
Processing class documentation
------------------------------

- ``Processor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.Processor
- ``ScriptProcessor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
- ``SKLearnProcessor``: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
- ``ProcessingInput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingInput
- ``ProcessingOutput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingOutput
- ``ProcessingJob``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingJob
- :class:`sagemaker.processing.Processor`
- :class:`sagemaker.processing.ScriptProcessor`
- :class:`sagemaker.sklearn.processing.SKLearnProcessor`
- :class:`sagemaker.processing.ProcessingInput`
- :class:`sagemaker.processing.ProcessingOutput`
- :class:`sagemaker.processing.ProcessingJob`


Further documentation
---------------------

- Processing class documentation: https://sagemaker.readthedocs.io/en/stable/processing.html
- ​​AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html
- AWS Notebook examples: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing
- Processing API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html
- Processing container specification: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html
- `Processing class documentation <https://sagemaker.readthedocs.io/en/stable/processing.html>`_
- `AWS Documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html>`_
- `AWS Notebook examples <https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing>`_
- `Processing API documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html>`_
- `Processing container specification <https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html>`_
3 changes: 3 additions & 0 deletions doc/overview.rst
@@ -752,6 +752,9 @@ If you want to keep everything local, and not use Amazon S3 either, you can enable
 
     # pass sagemaker_session to your estimator or model
 
+.. note::
+    If you enable "local code," then you cannot use the ``dependencies`` parameter in your estimator or model.
+
 We can take the example in `Using Estimators <#using-estimators>`__ , and use either ``local`` or ``local_gpu`` as the instance type.
 
 .. code:: python
5 changes: 2 additions & 3 deletions doc/requirements.txt
@@ -1,3 +1,2 @@
-sphinx==2.2.2
-numpy
-requests==2.20
+sphinx==3.1.1
+sphinx-rtd-theme==0.5.0
4 changes: 2 additions & 2 deletions setup.py
@@ -70,7 +70,7 @@ def read_version():
         "awslogs",
         "black==19.10b0 ; python_version >= '3.6'",
         "stopit==1.1.2",
-        "apache-airflow==1.10.5",
+        "apache-airflow==1.10.9",
         "fabric>=2.0",
         "requests>=2.20.0, <3",
     ],
@@ -96,6 +96,7 @@ def read_version():
         "Programming Language :: Python",
         "Programming Language :: Python :: 3.6",
         "Programming Language :: Python :: 3.7",
+        "Programming Language :: Python :: 3.8",
     ],
     install_requires=required_packages,
     extras_require=extras,
@@ -105,5 +106,4 @@
             "sagemaker-upgrade-v2=sagemaker.cli.compatibility.v2.sagemaker_upgrade_v2:main",
         ]
     },
-    include_package_data=True,  # TODO-reinvent-2019 [knakad]: Remove after rule_configs is in PyPI
 )