fix: workflow passing spot training param to training job #1599
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
* Base model trainer (#1521) * Base model trainer * flake8 * add testing notebook * add param validation & set defaults * Implement simple train method * feature: support script mode with local train.sh (#1523) * feature: support script mode with local train.sh * Stop tracking train.sh and add it to .gitignore * update message * make dir if not exist * fix docs * fix: docstyle * Address comments * fix hyperparams * Revert pydantic custom error * pylint * Image Spec refactoring and updates (#1525) * Image Spec refactoring and updates * Unit tests and update function for Image Spec * Fix hugging face test * Fix Tests * Add unit tests for ModelTrainer (#1527) * Add unit tests for ModelTrainer * Flake8 * format * Add example notebook (#1528) * Add testing notebook * format * use smaller data * remove large dataset * update * pylint * flake8 * ignore docstyle in directories with test * format * format * Add enviornment variable bootstrapping script (#1530) * Add enviornment variables scripts * format * fix comment * add docstrings * fix comment * feature: add utility function to capture local snapshot (#1524) * local snapshot * Update pip list command * Remove function calls * Address comments * Address comments * Support intelligent parameters (#1540) * Support intelligent parameters * fix codestyle * Revert Image Spec (#1541) * Cleanup ModelTrainer (#1542) * General image builder (#1546) * General image builder * General image builder * Fix codestyle * Fix codestyle * Move location * Add warnings * Add integ tests * Fix integ test * Fix integ test * Fix region error * Add region * Latest Container Image (#1545) * Latest Container Image * Test Fixes * Parameterized tests and some logic updates * Test fixes * Move to Image URI * Fixes for unit test * Fixes for unit test * Fix codestyle error checks * Cleanup ModelTrainer code (#1552) * feat: add pre-processing and post-processing logic to inference_spec (#1560) * add pre-processing and post-processing logic to 
inference_spec * fix format * make accept_type and content_type optional * remove accept_type and content_type from pre/post processing * correct typo * Add Distributed Training Support Model Trainer (#1536) * Add path to set Additional Settings in ModelTrainer (#1555) * feature: support HuggingFace models with JumpStart configs * Update bucket name for the model mapping * Mask Sensitive Env Logs in Container (#1568) * Fix unit test * Fix bug in script mode setup ModelTrainer (#1575) * Save mapping as attribute * Fix style issues * Fix style issues * Fix: bypass jumpstart mapping when not in endpoint mode * Skip JS model mapping with env vars or image URI provided * Revert "Merge branch 'aws:master' into dev-morpheus" This reverts commit 26a0b0bb37e0343b3287f5c5c484df22726fc858, reversing changes made to d19d4e178442be4b6e1d07d55498dd76dfac50f0. * Merge branch 'aws:master' into dev-morpheus This reverts commit 076442bd83e5ca977bf5b6ce1b716474d2794feb. * Rebase on master-morpheus * Fix unit test description * Fix TEI integ test * Fix style issue * Fix style issues * Fix schema builder integ tests * Fix TEI integ test * Fix code style issue --------- Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> Co-authored-by: pintaoz-aws <[email protected]> Co-authored-by: Pravali Uppugunduri <[email protected]> Co-authored-by: Xiong Zeng <[email protected]> Co-authored-by: Gary Wang <[email protected]>
Issue #, if available:
The `train_use_spot_instances` param is ignored by the workflow's `training_config` method.
Description of changes:
Check `train_use_spot_instances` and put `EnableManagedSpotTraining` into train_config if it is present.
Testing done:
unit test
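For reviewers, a minimal sketch of the behavior described above, not the actual diff. The helper name `apply_spot_training` and the `SimpleNamespace` stand-in for an estimator are hypothetical; only `train_use_spot_instances` and `EnableManagedSpotTraining` come from this PR.

```python
from types import SimpleNamespace


def apply_spot_training(train_config, estimator):
    """Copy the estimator's spot setting into a training-job config dict.

    Hypothetical helper for illustration; the real change lives inside the
    workflow's training_config method.
    """
    # Only add the key when the flag is set, so configs for on-demand
    # training are left unchanged.
    if getattr(estimator, "train_use_spot_instances", False):
        train_config["EnableManagedSpotTraining"] = True
    return train_config


# Stand-ins for an estimator; only the one attribute used here is modelled.
spot_estimator = SimpleNamespace(train_use_spot_instances=True)
on_demand_estimator = SimpleNamespace(train_use_spot_instances=False)

spot_config = apply_spot_training({"TrainingJobName": "example-job"}, spot_estimator)
on_demand_config = apply_spot_training({"TrainingJobName": "example-job"}, on_demand_estimator)
```

The key is left out entirely when the flag is false or missing, so existing on-demand configs stay unchanged.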
Merge Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
`unique_name_from_base` to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.