-
Notifications
You must be signed in to change notification settings - Fork 1.2k
documentation: add documentation for XGBoost #1350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
doc/using_xgboost.rst
Outdated
which enables it to scale out to more instances and reduce out-of-memory errors. | ||
* Exensibility - Because the open source XGBoost container is open source, | ||
you can extend the container to install additional libraries and change the version of XGBoost that the container uses. | ||
For more information, see `SageMaker XGBoost Container <https://github.com/aws/sagemaker-xgboost-container>`__. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a notebook that shows extending the image? if so, I think that would be a better link because the README of the repo launches into how to build and test the image, which isn't particularly relevant for people who just want to use it as a base image in their Dockerfile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find any such notebook. I get that this link isn't ideal, but do you think nothing would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there documentation for finding the image URI? that's probably the most relevant thing for customers looking to extend the image.
my hesitancy with linking to the instructions for building the image from scratch is that we've had previous GitHub issues where people thought they needed to build the image because that's the README they found, and they hadn't realized that they could just use the pre-built version. to be fair, that might just mean we need to overhaul the framework repository READMEs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find anything other than the use of the get_image_uri function itself in code examples.
I changed the link to point to the example notebook that extends the pytorch container (https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/pytorch_extending_our_containers/pytorch_extending_our_containers.ipynb). That is the only example that I can find on extending containers.
doc/using_xgboost.rst
Outdated
model_location = args.model_dir + '/xgboost-model' | ||
pkl.dump(bst, open(model_location, 'wb')) | ||
logging.info("Stored trained model at {}".format(model_location)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth calling out separately that the script needs to save the model and where it has to be saved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant to ask about that in email. So is the extra /xgboost-model subdir within model_dir necessary here? Or can it be saved anywhere within model_dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bumped on the email thread - I honestly don't know in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intro section says you have to save the model to model_dir, and I added a comment to the part of the script where it saves the model (and an add_argument line for SM_MODEL_DIR). I'll add a section for save model when I get more specific information.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
doc/using_xgboost.rst
Outdated
|
||
The XGBoost open source algorithm provides the following benefits over the built-in algorithm: | ||
|
||
* Latest version - The open source XGBoost algorithm supports XGBoost version 1.0, which has better performance scaling on multi-core instances and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be better to remove the version here and just link to https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/xgboost#xgboost-sagemaker-estimators-and-models, so that fewer places have to be updated with each version upgrade.
although it seems like that page has already gone out of date...
doc/using_xgboost.rst
Outdated
which enables it to scale out to more instances and reduce out-of-memory errors. | ||
* Exensibility - Because the open source XGBoost container is open source, | ||
you can extend the container to install additional libraries and change the version of XGBoost that the container uses. | ||
For more information, see `SageMaker XGBoost Container <https://github.com/aws/sagemaker-xgboost-container>`__. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there documentation for finding the image URI? that's probably the most relevant thing for customers looking to extend the image.
my hesitancy with linking to the instructions for building the image from scratch is that we've had previous GitHub issues where people thought they needed to build the image because that's the README they found, and they hadn't realized that they could just use the pre-built version. to be fair, that might just mean we need to overhaul the framework repository READMEs...
doc/using_xgboost.rst
Outdated
# Hyperparameters are described here | ||
parser.add_argument('--num_round', type=int) | ||
parser.add_argument('--max_depth', type=int, default=5) | ||
parser.add_argument('--eta', type=float, default=0.2) | ||
parser.add_argument('--objective', type=str, default='reg:squarederror') | ||
|
||
# SageMaker specific arguments. Defaults are set in the environment variables. | ||
parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN']) | ||
parser.add_argument('--validation', type=str, default=os.environ['SM_CHANNEL_VALIDATION']) | ||
|
||
args = parser.parse_args() | ||
|
||
train_hp = { | ||
'max_depth': args.max_depth, | ||
'eta': args.eta, | ||
'gamma': args.gamma, | ||
'min_child_weight': args.min_child_weight, | ||
'subsample': args.subsample, | ||
'silent': args.silent, | ||
'objective': args.objective | ||
} | ||
|
||
dtrain = xgb.DMatrix(args.train) | ||
dval = xgb.DMatrix(args.validation) | ||
watchlist = [(dtrain, 'train'), (dval, 'validation')] if dval is not None else [(dtrain, 'train')] | ||
|
||
callbacks = [] | ||
prev_checkpoint, n_iterations_prev_run = add_checkpointing(callbacks) | ||
# If checkpoint is found then we reduce num_boost_round by previously run number of iterations | ||
|
||
bst = xgb.train( | ||
params=train_hp, | ||
dtrain=dtrain, | ||
evals=watchlist, | ||
num_boost_round=(args.num_round - n_iterations_prev_run), | ||
xgb_model=prev_checkpoint, | ||
callbacks=callbacks | ||
) | ||
|
||
model_location = args.model_dir + '/xgboost-model' | ||
pkl.dump(bst, open(model_location, 'wb')) | ||
logging.info("Stored trained model at {}".format(model_location)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of this should be indented to match l. 95
doc/using_xgboost.rst
Outdated
model_location = args.model_dir + '/xgboost-model' | ||
pkl.dump(bst, open(model_location, 'wb')) | ||
logging.info("Stored trained model at {}".format(model_location)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bumped on the email thread - I honestly don't know in this case
doc/using_xgboost.rst
Outdated
|
||
Create an Estimator | ||
------------------- | ||
After you create your training script, create an instance of the :class:`sagemaker.xgboost.XGBoost` estimator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think :class:
sagemaker.xgboost.XGBoost` links to anything automatically here
doc/using_xgboost.rst
Outdated
|
||
.. code:: | ||
|
||
from sagemaker.session import s3_input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can delete the s3_input
import
doc/using_xgboost.rst
Outdated
from sagemaker.session import s3_input | ||
from sagemaker.xgboost.estimator import XGBoost | ||
|
||
xgb_script_mode_estimator = XGBoost( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just call the variable xgb
or xgb_estimator
doc/using_xgboost.rst
Outdated
Customize inference | ||
------------------- | ||
|
||
In the script that you provide, you can customize the inference behavior by implementing the follwing functions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
follwing --> following
doc/using_xgboost.rst
Outdated
|
||
In the script that you provide, you can customize the inference behavior by implementing the follwing functions: | ||
* ``input_fn`` - how input data is handled. | ||
* ``predict_fn`` - how the model is invokedfunction, and how the response is returned ). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
messed up copy/paste?
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
error from the Sphinx check:
|
I don't see in the xgboost estimator.py file where this indentation is. I didn't change this file. |
I think it's because you added |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
I think this is https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/xgboost/model.py#L112. Also not entirely sure about the indentation at https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/xgboost/model.py#L119 |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
src/sagemaker/xgboost/model.py
Outdated
@@ -108,16 +108,15 @@ def __init__( | |||
self.model_server_workers = model_server_workers | |||
|
|||
def prepare_container_def(self, instance_type, accelerator_type=None): | |||
"""Return a container definition with framework configuration set in model environment | |||
variables. | |||
"""Return a container definition with framework configuration set in model environment variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the failed build:
�[7;33m************* Module sagemaker.xgboost.model�[0m
src/sagemaker/xgboost/model.py:111:0: C0301: �[1mLine too long (105/100)�[0m (�[1mline-too-long�[0m)
can you split this line up? (and the second line should be indented at the same level as """
)
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Description of changes:
Added rst files to generate XGBoost API reference content and explain usage of open source XGBoost in SageMaker.
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.