deepspeed #288
Merged
Changes from 47 commits
67 commits
d1ea793
deepspeed
haqishen 5ef4792
shard
haqishen b5fac57
full param deepspeed works by this commit
haqishen 0f7086b
offload optimizer & documentation
haqishen 687c456
format & fix save deepspeed weight
haqishen 3b7ff0d
format & update save_checkpoint
haqishen 105a849
update pipfile
haqishen f583395
update pipfile
haqishen cbc50fb
zero init for transformers
haqishen ffee1c0
add some new config
haqishen f40ef52
fix bug
haqishen 9cd37ab
min 1e6
haqishen 69e9eb1
update deepspeed config
haqishen 1415cdc
Merge main to deepspeed
haqishen 9db3fbd
Merge branch 'main' into deepspeed
haqishen b0df016
Update requirements.txt
haqishen d30b51c
remove duplicate code
haqishen a4b76c3
Merge branch 'deepspeed' of github.com:h2oai/h2o-llmstudio into deeps…
haqishen 67629ee
throw warning when compile w/ deepspeed
haqishen 48d7f71
black
haqishen d1efef5
integrate deepspeed into wrap_model_distributed
haqishen d6b0748
remove unuse code
haqishen 3f89359
style
haqishen 5c253f2
fix bug
haqishen 9ff717f
fix bug
haqishen 405b207
Merge branch 'main' into deepspeed
haqishen b3495d4
max token len to 16k
haqishen 7b78538
deepspeed save lora
haqishen 892f47c
update get optimizer
haqishen f2dfb89
fix check disk
haqishen efe77bb
Merge branch 'main' into deepspeed
haqishen d297ec9
comment out offload CPU
haqishen a6781f1
Merge branch 'deepspeed' of github.com:h2oai/h2o-llmstudio into deeps…
haqishen e6e46dc
Merge branch 'main' into deepspeed
haqishen e16cab8
Pipfile.lock
haqishen 65a1b2d
Merge branch 'main' into deepspeed
haqishen 32b16a5
Update requirements.txt
haqishen eb4c990
Merge branch 'main' into deepspeed
haqishen e36fada
make black
haqishen bc4c239
Merge branch 'deepspeed' of github.com:h2oai/h2o-llmstudio into deeps…
haqishen b5e59e9
add default
haqishen 24eeb16
minor fix
haqishen b9e5934
minor fix
haqishen a296cca
minor fix
haqishen 11a4b8d
fix val loader
haqishen 3efa2c9
potential val loader fix
psinger 14bc17e
update
psinger 0f40322
merge
psinger bd1e134
lock
psinger 6f81182
Update requirements.txt
psinger 62fc9c5
improve model saving for deepspeed
haqishen dbbbcdf
solved INFLIGHT problem
haqishen c023d19
update doc
haqishen 2785f9f
deepspeed default push to hub by cpu
haqishen aa17c0b
Revert "improve model saving for deepspeed"
haqishen 4491c16
remove unuse code
haqishen fa031f2
Merge branch 'main' into deepspeed
haqishen 9337741
Update requirements.txt
haqishen 263f48a
deepspeed==0.11.1
haqishen 83429b6
Merge branch 'main' into deepspeed
haqishen 882631a
Update requirements.txt
haqishen 368f0af
temp fix for deepspeed slow gen
haqishen 011e269
Merge branch 'deepspeed' of github.com:h2oai/h2o-llmstudio into deeps…
haqishen d5dbbfb
style
haqishen 5b8499c
style
haqishen 07bb4b2
fix
psinger 91562e9
Merge branch 'main' into deepspeed
haqishen
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-offload-optimizer.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Whether to offload the optimizer to CPU to save more GPU RAM during training. Note that turning on offload_optimizer further slows down training. | ||
psinger marked this conversation as resolved.
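For context, here is a minimal sketch of how this option might map onto a DeepSpeed configuration. It assumes ZeRO stage 3 and the `offload_optimizer` section of DeepSpeed's `zero_optimization` config. The helper name `zero_config` and the `pin_memory` setting are illustrative assumptions, not code from this PR.

```python
def zero_config(offload_optimizer: bool) -> dict:
    """Return a ZeRO stage 3 config fragment (hypothetical helper, not from this PR)."""
    zero = {"stage": 3}
    if offload_optimizer:
        # Keep optimizer state in host memory: frees GPU RAM, costs training speed.
        # pin_memory is an assumed choice for illustration.
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    return {"zero_optimization": zero}


print(zero_config(offload_optimizer=True))
```

Leaving the section out entirely when the option is off keeps the optimizer state on the GPU.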
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-reduce-bucket-size.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Number of elements reduced/allreduced at a time. Limits the memory required for the allgather for large model sizes. Smaller values use less memory but slow down training. | ||
psinger marked this conversation as resolved.
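To make the memory trade-off concrete, the following rough sketch converts a bucket size in elements into bytes. It assumes 2 bytes per element (fp16 or bf16), which may not match the dtype used in a given run. The function name `bucket_bytes` and the sample sizes are illustrative only.

```python
def bucket_bytes(num_elements: int, bytes_per_element: int = 2) -> int:
    """Approximate size of one bucket; 2 bytes per element is an assumption."""
    return num_elements * bytes_per_element


# Illustrative bucket sizes, not values taken from this PR.
for n in (int(1e6), int(1e8), int(5e8)):
    print(f"{n:>13,d} elements -> {bucket_bytes(n) / 2**20:,.1f} MiB")
```

A smaller bucket caps the size of each transient buffer, but the same amount of data then needs more collective calls.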
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-stage3-max-live-parameters.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| The maximum number of parameters resident per GPU before releasing. Smaller values use less memory but slow down training. | ||
psinger marked this conversation as resolved.
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-stage3-max-reuse-distance.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Do not release a parameter if it will be reused within this threshold of parameters. Smaller values use less memory but slow down training. | ||
psinger marked this conversation as resolved.
1 change: 1 addition & 0 deletions
...ion/docs/tooltips/experiments/_deepspeed-stage3-param-persistence-threshold.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Do not partition parameters smaller than this threshold. Smaller values use less memory, but can greatly increase communication (especially latency-bound messages) and slow down training. | ||
psinger marked this conversation as resolved.
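As a rough illustration of the rule in this tooltip, the sketch below decides from a tensor's shape whether it would stay unpartitioned. The strict "smaller than" comparison follows the tooltip wording and may differ from DeepSpeed's internal check. The function name, shapes, and threshold value are hypothetical.

```python
import math


def stays_unpartitioned(shape: tuple, threshold: int) -> bool:
    """True if a tensor with this shape has fewer elements than the threshold."""
    return math.prod(shape) < threshold


THRESHOLD = 100_000  # illustrative value, not the one used in this PR

# A small normalization weight versus a large projection matrix (assumed shapes).
for name, shape in {"norm.weight": (4096,), "proj.weight": (4096, 4096)}.items():
    print(name, "kept whole" if stays_unpartitioned(shape, THRESHOLD) else "partitioned")
```

Small tensors such as normalization weights are cheap to keep on every GPU, whereas gathering them on demand would add many tiny, latency-bound messages.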
1 change: 1 addition & 0 deletions
documentation/docs/tooltips/experiments/_deepspeed-stage3-prefetch-bucket-size.mdx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Maximum number of parameter elements to fetch ahead of use. Smaller values use less memory but slow down training. | ||
psinger marked this conversation as resolved.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Whether to use DeepSpeed to save GPU RAM during training. Note that turning on DeepSpeed slows down training. | ||
psinger marked this conversation as resolved.
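The tunables described in these tooltips correspond to keys in the `zero_optimization` section of a DeepSpeed config. The sketch below shows one way they could be gathered behind a single on/off switch. The helper `build_ds_config` and all numeric values are assumptions for illustration, not the defaults or code introduced by this PR.

```python
from typing import Optional

# Illustrative values only; not the defaults chosen in this PR.
EXAMPLE_KNOBS = {
    "reduce_bucket_size": int(1e6),
    "stage3_max_live_parameters": int(1e9),
    "stage3_max_reuse_distance": int(1e9),
    "stage3_param_persistence_threshold": int(1e6),
    "stage3_prefetch_bucket_size": int(1e6),
}


def build_ds_config(use_deepspeed: bool, knobs: dict) -> Optional[dict]:
    """Hypothetical helper: return a ZeRO stage 3 config, or None when disabled."""
    if not use_deepspeed:
        return None
    return {"zero_optimization": {"stage": 3, **knobs}}


print(build_ds_config(True, EXAMPLE_KNOBS))
```

Returning `None` when the switch is off lets the caller fall back to its regular distributed wrapper without touching DeepSpeed at all.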