
Commit c781b7c

Addressing Mustafa's comments on the readme, adjusting a typecheck from coderabbit, some other documentation cleaning
1 parent 7cd94a9 commit c781b7c

3 files changed: +32 -3 lines changed


examples/README.md

Lines changed: 23 additions & 0 deletions
@@ -78,6 +78,29 @@ result = osft(
 )
 ```
 
+### Memory Estimation (Experimental / In-Development)
+
+training_hub includes a library for estimating the expected amount of GPU memory that will be allocated when fine-tuning a given model with SFT or OSFT. The calculations are built on the formulas presented in the blog post [How To Calculate GPU VRAM Requirements for an Large-Language Model](https://apxml.com/posts/how-to-calculate-vram-requirements-for-an-llm).
+NOTE: This feature is still a work in progress. In particular, the estimates for OSFT may vary from your actual results; they mainly serve to give theoretical bounds.
+The estimates for SFT should be reasonably close to actual results when using training_hub, but your actual results may still vary.
+
+**Tutorials:**
+- [Memory Estimation Example](notebooks/memory_estimator_example.ipynb) - Interactive notebook showcasing how to use the memory estimator methods.
+
+**Quick Example:**
+```python
+from training_hub import estimate
+
+estimate(training_method='osft',
+         num_gpus=2,
+         model_path="/path/to/model",
+         max_tokens_per_gpu=8192,
+         use_liger=True,
+         verbose=2,
+         unfreeze_rank_ratio=0.25
+)
+```
+
 ## Getting Started
 
 1. **For detailed parameter documentation**: Check the relevant guide in `docs/`
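
The quick example added above covers the OSFT path; below is a minimal sketch of the equivalent SFT call. It reuses only the keyword arguments that appear in the README's quick example and assumes that `training_method='sft'` simply omits the OSFT-specific `unfreeze_rank_ratio`:

```python
# Hedged sketch: keyword arguments are the ones shown in the README's quick example;
# 'sft' as a training_method value follows from the README's statement that both SFT
# and OSFT are supported, and is otherwise an assumption.
from training_hub import estimate

estimate(training_method='sft',
         num_gpus=2,
         model_path="/path/to/model",   # placeholder path from the README example
         max_tokens_per_gpu=8192,
         use_liger=True,
         verbose=2)
```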

examples/notebooks/memory_estimator_example.ipynb

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@
 "# Memory Estimator \n",
 "\n",
 "This notebook will provide some examples on how to use the memory_estimator API\n",
-"to estimate the amount of GPU memory consumed when fine-tuning an LLM model in Training Hub.\n",
+"to estimate the amount of GPU memory consumed when fine-tuning in Training Hub.\n",
 "This notebook will cover:\n",
 "1. How the package's primary class is implemented, \n",
 "2. How it can be subclassed for further extensions,\n",
@@ -32,7 +32,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from training_hub.profiling.memory_estimator import BasicEstimator, OSFTEstimator, OSFTEstimatorExperimental, estimate"
+"from training_hub import BasicEstimator, OSFTEstimator, OSFTEstimatorExperimental, estimate"
 ]
 },
 {
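
The notebook's import cell now pulls the estimator classes from the package top level, and the notebook says it covers subclassing the primary class for further extensions. A minimal sketch of that pattern, under stated assumptions, could look like this:

```python
# Hedged sketch of the "subclass for further extensions" idea the notebook mentions.
# Assumptions: BasicEstimator (or a parent in its hierarchy) defines
# _apply_overhead(self, subtotal) -- the method name appears later in this commit's
# diff -- and the subclass name and flat 10% margin are purely illustrative.
from training_hub import BasicEstimator


class PaddedEstimator(BasicEstimator):
    def _apply_overhead(self, subtotal):
        # Add a 10% safety margin on top of the parent estimator's overhead term.
        return super()._apply_overhead(subtotal) * 1.10
```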

src/training_hub/profiling/memory_estimator.py

Lines changed: 7 additions & 1 deletion
@@ -122,9 +122,11 @@ def _calc_intermediate_activations(self):
 
     def _calc_outputs(self):
         """
-        Calculate the VRAM for storing the model's activated outputs
+        Calculate the VRAM for storing the model's activated outputs.
+        Note that this value is 0 if Liger Kernels are used.
         """
         if not self.use_liger:
+            # This nested try/except attempts to find the model's vocabulary size
             try:
                 vocab_size = self.model.embed_tokens.num_embeddings
             except AttributeError:
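
The docstring change above points at why `use_liger` zeroes this term: Liger's fused linear-plus-cross-entropy kernel avoids materializing the full logits tensor. As a rough, hedged illustration of how large that tensor can be (the estimator's exact constants are not visible in this diff):

```python
# Back-of-the-envelope size of the output logits tensor that _calc_outputs accounts
# for. The concrete numbers below (batch size, sequence length, vocabulary size,
# bf16 storage) are illustrative assumptions, not values taken from training_hub.
batch_size = 8
seq_len = 8192
vocab_size = 128_256   # e.g. a Llama-3-style vocabulary (assumption)
bytes_per_value = 2    # bf16

logits_bytes = batch_size * seq_len * vocab_size * bytes_per_value
print(f"{logits_bytes / 1024**3:.1f} GiB")  # ~15.7 GiB with these example numbers
```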
@@ -316,6 +318,8 @@ def __init__(
                          use_liger, verbose, trust_remote_code)
         self.output_constant = 7/3
         self.unfreeze_rank_ratio = unfreeze_rank_ratio
+        if not (0.0 <= self.unfreeze_rank_ratio <= 1.0):
+            raise ValueError("Ratio must be in the range [0, 1]")
 
         # Check to see which terms need to be included in the search for valid layers
         self.target_terms = MODEL_CONFIGS['default']['patterns']
@@ -417,6 +421,8 @@ def __init__(
                          effective_batch_size, max_seq_len, max_tokens_per_gpu,
                          use_liger, verbose, trust_remote_code)
         self.unfreeze_rank_ratio = unfreeze_rank_ratio
+        if not (0.0 <= self.unfreeze_rank_ratio <= 1.0):
+            raise ValueError("Ratio must be in the range [0, 1]")
 
     @override
     def _apply_overhead(self, subtotal):
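
Both estimator constructors now reject out-of-range ratios. A hedged sketch of how that would surface through the `estimate()` entry point, assuming `estimate()` builds one of these estimators internally and that a real, loadable model path is supplied in practice:

```python
# Sketch only: keyword arguments are the ones shown in the README's quick example,
# the model path is a placeholder, and the assumption is that estimate() constructs
# an OSFT estimator whose __init__ performs the new range check.
from training_hub import estimate

try:
    estimate(training_method='osft',
             num_gpus=2,
             model_path="/path/to/model",   # placeholder; use a real checkpoint
             max_tokens_per_gpu=8192,
             unfreeze_rank_ratio=1.5)       # outside [0, 1], so the check fails
except ValueError as err:
    print(err)  # expected: "Ratio must be in the range [0, 1]"
```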

0 commit comments
