documentation: add documentation for XGBoost #1350

Merged
merged 12 commits into from Apr 3, 2020

Changes from 3 commits
16 changes: 16 additions & 0 deletions doc/index.rst
@@ -84,6 +84,22 @@ A managed environment for TensorFlow training and hosting on Amazon SageMaker

sagemaker.tensorflow

*******
XGBoost
*******
A managed environment for XGBoost training and hosting on Amazon SageMaker

.. toctree::
    :maxdepth: 1

    using_xgboost

.. toctree::
    :maxdepth: 2

    xgboost


************
Scikit-Learn
************
200 changes: 200 additions & 0 deletions doc/using_xgboost.rst
@@ -0,0 +1,200 @@
###########################################
Using XGBoost with the SageMaker Python SDK
###########################################

.. contents::

eXtreme Gradient Boosting (XGBoost) is a popular and efficient machine learning algorithm used for regression and classification tasks on tabular datasets.
It implements a technique known as gradient boosting on trees, which performs remarkably well in machine learning competitions.

Amazon SageMaker supports two ways to use the XGBoost algorithm:

* XGBoost built-in algorithm
* XGBoost open source algorithm

The XGBoost open source algorithm provides the following benefits over the built-in algorithm:

* Latest version - The open source XGBoost algorithm supports XGBoost version 1.0, which has better performance scaling on multi-core instances and
  improved stability for distributed training.
* Flexibility - Because you write a custom training script when you use the open source XGBoost algorithm, you can add custom pre- and post-processing logic,
  run additional code after training, and take advantage of the full range of XGBoost functions, such as cross-validation support.
* Scalability - The XGBoost open source algorithm has a more efficient implementation of distributed training,
  which enables it to scale out to more instances and reduce out-of-memory errors.
* Extensibility - Because the XGBoost container is open source,
  you can extend it to install additional libraries and change the version of XGBoost that the container uses.
  For more information, see `SageMaker XGBoost Container <https://github.com/aws/sagemaker-xgboost-container>`__.
Contributor

is there a notebook that shows extending the image? if so, I think that would be a better link because the README of the repo launches into how to build and test the image, which isn't particularly relevant for people who just want to use it as a base image in their Dockerfile

Contributor Author

I can't find any such notebook. I get that this link isn't ideal, but do you think nothing would be better?

Contributor

is there documentation for finding the image URI? that's probably the most relevant thing for customers looking to extend the image.

my hesitancy with linking to the instructions for building the image from scratch is that we've had previous GitHub issues where people thought they needed to build the image because that's the README they found, and they hadn't realized that they could just use the pre-built version. to be fair, that might just mean we need to overhaul the framework repository READMEs...

Contributor Author

I can't find anything other than the use of the get_image_uri function itself in code examples.

I changed the link to point to the example notebook that extends the pytorch container (https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/pytorch_extending_our_containers/pytorch_extending_our_containers.ipynb). That is the only example that I can find on extending containers.



***********************************
Use XGBoost as a Built-in Algorithm
***********************************

Amazon SageMaker provides XGBoost as a built-in algorithm that you can use like other built-in algorithms.
Using the built-in algorithm version of XGBoost is simpler than using the open source version, because you don't have to write a training script.
If you don't need the features and flexibility of open source XGBoost, consider using the built-in version.
For information about using the Amazon SageMaker XGBoost built-in algorithm, see `XGBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html>`__
in the *Amazon SageMaker Developer Guide*.
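
As a quick sketch of what the built-in algorithm looks like in practice, you can retrieve the algorithm image with ``get_image_uri``
and train with the generic ``Estimator`` class. The bucket, channel, and hyperparameter values below are illustrative:

.. code:: python

    import boto3
    import sagemaker
    from sagemaker.amazon.amazon_estimator import get_image_uri
    from sagemaker.estimator import Estimator

    # Resolve the built-in XGBoost algorithm image for the current region.
    region = boto3.Session().region_name
    container = get_image_uri(region, "xgboost", repo_version="0.90-1")

    xgb_builtin = Estimator(
        image_name=container,
        role=sagemaker.get_execution_role(),
        train_instance_count=1,
        train_instance_type="ml.m5.2xlarge",
        output_path="s3://my-bucket/xgboost-builtin/output",  # illustrative bucket
    )
    xgb_builtin.set_hyperparameters(num_round=50, objective="reg:squarederror")

    # The built-in algorithm reads CSV or libsvm data directly from S3.
    xgb_builtin.fit({"train": "s3://my-bucket/xgboost-builtin/train"})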

*************************************
Use the Open Source XGBoost Algorithm
*************************************

If you want the flexibility and additional features of open source XGBoost, use the SageMaker open source XGBoost algorithm.

For a complete example of using the open source XGBoost algorithm, see the sample notebook at
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone_dist_script_mode.ipynb.


Train a Model with Open Source XGBoost
======================================

To train a model by using the Amazon SageMaker open source XGBoost algorithm:

.. |create xgboost estimator| replace:: Create a ``sagemaker.xgboost.XGBoost`` estimator
.. _create xgboost estimator: #create-an-estimator

.. |call fit| replace:: Call the estimator's ``fit`` method
.. _call fit: #call-the-fit-method

1. `Prepare a training script <#prepare-a-training-script>`_
2. |create xgboost estimator|_
3. |call fit|_

Prepare a Training Script
-------------------------

A typical training script loads data from the input channels, configures training with hyperparameters, trains a model,
and saves the model to ``model_dir`` so that it can be hosted later.
Hyperparameters are passed to your script as arguments and can be retrieved with an ``argparse.ArgumentParser`` instance.

For a complete example of an XGBoost training script, see https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/abalone.py.

Let's look at the main elements of the script. Starting with the ``__main__`` guard,
use a parser to read the hyperparameters that were passed to the estimator when the training job was created.
These hyperparameters are made available as arguments to the training script.
The script also parses a number of Amazon SageMaker-specific environment variables to get information about the training environment,
such as the location of the input data and the location where the model should be saved.

.. code:: python

    import argparse
    import logging
    import os
    import pickle as pkl

    import xgboost as xgb

    if __name__ == '__main__':
        parser = argparse.ArgumentParser()

        # Hyperparameters are described here. Defaults shown are example values.
        parser.add_argument('--num_round', type=int)
        parser.add_argument('--max_depth', type=int, default=5)
        parser.add_argument('--eta', type=float, default=0.2)
        parser.add_argument('--gamma', type=float, default=4)
        parser.add_argument('--min_child_weight', type=float, default=6)
        parser.add_argument('--subsample', type=float, default=0.7)
        parser.add_argument('--silent', type=int, default=0)
        parser.add_argument('--objective', type=str, default='reg:squarederror')

        # SageMaker-specific arguments. Defaults are set in the environment variables.
        parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])
        parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
        parser.add_argument('--validation', type=str, default=os.environ.get('SM_CHANNEL_VALIDATION'))

        args = parser.parse_args()

        train_hp = {
            'max_depth': args.max_depth,
            'eta': args.eta,
            'gamma': args.gamma,
            'min_child_weight': args.min_child_weight,
            'subsample': args.subsample,
            'silent': args.silent,
            'objective': args.objective
        }

        dtrain = xgb.DMatrix(args.train)
        dval = xgb.DMatrix(args.validation) if args.validation else None
        watchlist = [(dtrain, 'train'), (dval, 'validation')] if dval is not None else [(dtrain, 'train')]

        # add_checkpointing is a helper defined elsewhere in the example script.
        # If a checkpoint is found, num_boost_round is reduced by the number of
        # iterations already run.
        callbacks = []
        prev_checkpoint, n_iterations_prev_run = add_checkpointing(callbacks)

        bst = xgb.train(
            params=train_hp,
            dtrain=dtrain,
            evals=watchlist,
            num_boost_round=(args.num_round - n_iterations_prev_run),
            xgb_model=prev_checkpoint,
            callbacks=callbacks
        )

        # Save the model to the location specified by the SM_MODEL_DIR environment
        # variable so that SageMaker uploads it to S3 and can host it later.
        model_location = args.model_dir + '/xgboost-model'
        pkl.dump(bst, open(model_location, 'wb'))
        logging.info("Stored trained model at {}".format(model_location))
Contributor

might be worth calling out separately that the script needs to save the model and where it has to be saved

Contributor Author

Meant to ask about that in email. So is the extra /xgboost-model subdir within model_dir necessary here? Or can it be saved anywhere within model_dir?

Contributor

bumped on the email thread - I honestly don't know in this case

Contributor Author

The intro section says you have to save the model to model_dir, and I added a comment to the part of the script where it saves the model (and an add_argument line for SM_MODEL_DIR). I'll add a section for save model when I get more specific information.


In the training script, you can customize the inference behavior by implementing the following functions:

* ``input_fn`` - how input data is handled
* ``predict_fn`` - how the model is invoked and how predictions are made
* ``output_fn`` - how the response data is handled

These functions are optional. If you want to use the default implementations, do not implement them in your training script.
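
A minimal sketch of these handlers is shown below. The signatures follow the standard SageMaker framework container conventions
(``input_fn(request_body, request_content_type)``, ``predict_fn(input_data, model)``, and ``output_fn(prediction, accept)``);
the function bodies here are illustrative, not the container's default implementations.

.. code:: python

    import numpy as np
    import xgboost as xgb

    def input_fn(request_body, request_content_type):
        """Deserialize the request payload into something predict_fn can use."""
        if request_content_type == "text/csv":
            values = [float(v) for v in request_body.split(",")]
            return xgb.DMatrix(np.array([values]))
        raise ValueError("Unsupported content type: {}".format(request_content_type))

    def predict_fn(input_data, model):
        """Invoke the loaded XGBoost model on the deserialized input."""
        return model.predict(input_data)

    def output_fn(prediction, accept):
        """Serialize the prediction into the response payload."""
        return ",".join(str(p) for p in prediction)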

Create an Estimator
-------------------
After you create your training script, create an instance of the :class:`sagemaker.xgboost.XGBoost` estimator.
Contributor

I don't think :class:`sagemaker.xgboost.XGBoost` links to anything automatically here

Pass an IAM role that has the permissions necessary to run an Amazon SageMaker training job,
the type and number of instances to use for the training job,
and a dictionary of the hyperparameters to pass to the training script.

.. code:: python

    from sagemaker.session import s3_input
Contributor

you can delete the s3_input import

    from sagemaker.xgboost.estimator import XGBoost

    xgb_script_mode_estimator = XGBoost(
Contributor

I'd just call the variable xgb or xgb_estimator

        entry_point="abalone.py",
        hyperparameters=hyperparameters,
        image_name=container,
        role=role,
        train_instance_count=1,
        train_instance_type="ml.m5.2xlarge",
        framework_version="0.90-1",
        output_path="s3://{}/{}/{}/output".format(bucket, prefix, "xgboost-script-mode"),
        train_use_spot_instances=train_use_spot_instances,
        train_max_run=train_max_run,
        train_max_wait=train_max_wait,
        checkpoint_s3_uri=checkpoint_s3_uri
    )
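
The snippet above assumes that ``hyperparameters``, ``container``, ``role``, ``bucket``, ``prefix``, and the spot-training
settings were defined earlier, as in the sample notebook. A minimal sketch of that setup, with illustrative values:

.. code:: python

    import sagemaker

    session = sagemaker.Session()
    bucket = session.default_bucket()
    prefix = "xgboost-script-mode"

    # IAM role with SageMaker permissions; get_execution_role() works inside a notebook.
    role = sagemaker.get_execution_role()

    # Custom ECR image URI if you extended the container; can be omitted otherwise.
    container = None

    hyperparameters = {
        "num_round": 50,
        "max_depth": 5,
        "eta": 0.2,
        "objective": "reg:squarederror",
    }

    # Managed spot training settings (illustrative values).
    train_use_spot_instances = True
    train_max_run = 3600   # maximum training time, in seconds
    train_max_wait = 7200  # maximum time to wait for spot capacity, in seconds
    checkpoint_s3_uri = "s3://{}/{}/checkpoints".format(bucket, prefix)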


Call the fit Method
-------------------

After you create an estimator, call the ``fit`` method to run the training job.

.. code:: python

    xgb_script_mode_estimator.fit({"train": train_input})
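
Here ``train_input`` identifies the training channel. It can be an S3 URI string or an ``s3_input`` object;
a minimal sketch, assuming the training data was already uploaded to S3:

.. code:: python

    from sagemaker.session import s3_input

    # The S3 path shown is illustrative.
    train_input = s3_input(
        "s3://{}/{}/train".format(bucket, prefix),
        content_type="libsvm",
    )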



Deploy Open Source XGBoost Models
=================================

After the training job finishes, call the ``deploy`` method of the estimator to create a predictor that you can use to get inferences from your trained model.

.. code:: python

    import xgboost

    predictor = xgb_script_mode_estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    test_data = xgboost.DMatrix('/path/to/data')
    predictor.predict(test_data)

*************************
SageMaker XGBoost Classes
*************************

For information about the SageMaker Python SDK XGBoost classes, see the following topics:

* :class:`sagemaker.xgboost.estimator.XGBoost`
* :class:`sagemaker.xgboost.model.XGBoostModel`
* :class:`sagemaker.xgboost.model.XGBoostPredictor`
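
If you already have model artifacts in Amazon S3, you can create an endpoint without retraining by using
``XGBoostModel`` directly. A minimal sketch, with an illustrative artifact path and entry point:

.. code:: python

    from sagemaker.xgboost.model import XGBoostModel

    # model_data points to a model.tar.gz produced by a previous training job.
    xgb_model = XGBoostModel(
        model_data="s3://my-bucket/xgboost-script-mode/output/model.tar.gz",
        role=role,
        entry_point="abalone.py",
        framework_version="0.90-1",
    )

    predictor = xgb_model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
    )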

***********************************
SageMaker XGBoost Docker Containers
***********************************

For information about the SageMaker XGBoost Docker container and its dependencies, see `SageMaker XGBoost Container <https://github.com/aws/sagemaker-xgboost-container>`_.



22 changes: 22 additions & 0 deletions doc/xgboost.rst
@@ -0,0 +1,22 @@
XGBoost
-------

The Amazon SageMaker XGBoost open source framework algorithm.

.. autoclass:: sagemaker.xgboost.estimator.XGBoost
    :members:
    :undoc-members:
    :show-inheritance:
    :inherited-members:
    :exclude-members: image, num_factors, predictor_type, epochs, clip_gradient, mini_batch_size, feature_dim, eps, rescale_grad, bias_lr, linear_lr, factors_lr, bias_wd, linear_wd, factors_wd, bias_init_method, bias_init_scale, bias_init_sigma, bias_init_value, linear_init_method, linear_init_scale, linear_init_sigma, linear_init_value, factors_init_method, factors_init_scale, factors_init_sigma, factors_init_value


.. autoclass:: sagemaker.xgboost.model.XGBoostModel
    :members:
    :undoc-members:
    :show-inheritance:

.. autoclass:: sagemaker.xgboost.model.XGBoostPredictor
    :members:
    :undoc-members:
    :show-inheritance: