You can pass a `run_name` parameter to `finetune` and `load_gpt2` if you want to store/load multiple models in a `checkpoint` folder.
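As a rough sketch of that workflow (the dataset file and run name below are placeholders):

```python
import gpt_2_simple as gpt2

# Finetune, storing checkpoints under checkpoint/run_shakespeare
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, 'shakespeare.txt', steps=1000, run_name='run_shakespeare')

# Later, in a fresh Python session, load that specific model back and generate from it
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run_shakespeare')
gpt2.generate(sess, run_name='run_shakespeare')
```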
There is also a command-line interface for both finetuning and generation, with strong defaults for just running on a Cloud VM w/ GPU. For finetuning (which will also download the model if not present):
```shell
gpt_2_simple finetune shakespeare.txt
```
And for generation, which generates texts to files in a `gen` folder:
```shell
gpt_2_simple generate
```
Most of the same parameters available in the functions are available as CLI arguments.
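For example, a generation call might look like the following (an illustrative sketch; each flag mirrors the function parameter of the same name, and the values are placeholders):

```shell
gpt_2_simple generate --temperature 0.8 --nsamples 20 --batch_size 20 --length 100 --prefix "<|startoftext|>" --truncate "<|endoftext|>"
```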
See below for what some of the CLI arguments do.
NB: *Restart the Python session first* if you want to finetune on another dataset or load another model.
## Differences Between gpt-2-simple And Other Text Generation Utilities
The method GPT-2 uses to generate text is slightly different than that of other text-generation utilities. As a result:
* GPT-2 can only generate a maximum of 1024 tokens per request (about 3-4 paragraphs of English text).
* GPT-2 cannot stop early upon reaching a specific end token. (workaround: pass the `truncate` parameter to a `generate` function to only collect text until a specified end token. You may want to reduce `length` appropriately.)
* Higher temperatures work better (e.g. 0.7 - 1.0) to generate more interesting text, while other frameworks work better between 0.2 - 0.5.
* When finetuning GPT-2, it has no sense of the beginning or end of a document within a larger text. You'll need to use a bespoke character sequence to indicate the beginning and end of a document. Then while generating, you can specify a `prefix` targeting the beginning token sequences, and a `truncate` targeting the end token sequence. You can also set `include_prefix=False` to discard the prefix token while generating (e.g. if it's something unwanted like `<|startoftext|>`); see the sketch after this list.
* GPT-2 allows you to generate texts in parallel by setting a `batch_size` that is divisible into `nsamples`, resulting in much faster generation. Works very well with a GPU (can set `batch_size` up to 20 on Colaboratory's K80)!
* Due to GPT-2's architecture, it scales up nicely with more powerful GPUs. If you want to train for longer periods of time, GCP's P100 GPU is about 3x faster than a K80 for only 3x the price, making it comparable in price-per-performance (the V100 is about 1.5x faster than the P100 but about 2x the price). The P100 is fully utilized (100%) even with `batch_size=1`, while the V100 is only about 88% utilized.
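A minimal sketch tying several of these points together, assuming a model has already been finetuned under the default run name and that `<|startoftext|>` / `<|endoftext|>` were used as the document delimiters:

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)

# Generate 20 samples, 5 at a time in parallel on the GPU.
# Keep only the text up to the end-of-document token, and drop the
# starting token itself from the output.
gpt2.generate(sess,
              length=500,
              temperature=0.8,
              prefix="<|startoftext|>",
              truncate="<|endoftext|>",
              include_prefix=False,
              nsamples=20,
              batch_size=5)
```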
## Planned Work
Note: this project is intended to have a very tight scope unless demand dictates otherwise.
## Examples Using gpt-2-simple
* [ResetEra](https://www.resetera.com/threads/i-trained-an-ai-on-thousands-of-resetera-thread-conversations-and-it-created-hot-gaming-shitposts.112167/) — Generated video game forum discussions ([GitHub w/ dumps](https://github.com/minimaxir/resetera-gpt-2))
* [/r/legaladvice](https://www.reddit.com/r/legaladviceofftopic/comments/bfqf22/i_trained_a_moreadvanced_ai_on_rlegaladvice/) — Title generation ([GitHub w/ dumps](https://github.com/minimaxir/legaladvice-gpt2))
## Maintainer/Creator
Max Woolf ([@minimaxir](http://minimaxir.com))
*Max's open-source projects are supported by his [Patreon](https://www.patreon.com/minimaxir). If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.*