23 changes: 23 additions & 0 deletions examples/README.md
@@ -78,6 +78,29 @@ result = osft(
)
```

### Memory Estimation (Experimental / In-Development)

training_hub includes a library for estimating the expected amount of GPU memory that will be allocated while fine-tuning a given model with SFT or OSFT. The calculations are based on formulas presented in the blog post [How To Calculate GPU VRAM Requirements for a Large Language Model](https://apxml.com/posts/how-to-calculate-vram-requirements-for-an-llm).

NOTE: This feature is still a work in progress. In particular, the estimates for OSFT may vary from your actual results; they mainly serve to give theoretical bounds.
The estimates for SFT should be reasonably close to actual results when using training_hub, but keep in mind that your actual results may still vary.

**Tutorials:**
- [Memory Estimation Example](notebooks/memory_estimator_example.ipynb) - Interactive notebook showcasing how to utilize the memory estimator methods.

**Quick Example:**
```python
from training_hub import estimate

estimate(
    training_method='osft',
    num_gpus=2,
    model_path="/path/to/model",
    max_tokens_per_gpu=8192,
    use_liger=True,
    verbose=2,
    unfreeze_rank_ratio=0.25,
)
```
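
The same function can also estimate memory for SFT: pass `training_method='sft'` and drop the OSFT-specific `unfreeze_rank_ratio`. This variant mirrors the SFT call shown in the example notebook:

```python
from training_hub import estimate

estimate(
    training_method='sft',
    num_gpus=2,
    model_path="/path/to/model",
    max_tokens_per_gpu=8192,
    verbose=2,
)
```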

## Getting Started

1. **For detailed parameter documentation**: Check the relevant guide in `docs/`
357 changes: 357 additions & 0 deletions examples/notebooks/memory_estimator_example.ipynb
@@ -0,0 +1,357 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "187e6115",
"metadata": {},
"source": [
"# Memory Estimator \n",
"\n",
"This notebook will provide some examples on how to use the memory_estimator API\n",
"to estimate the amount of GPU memory consumed when fine-tuning in Training Hub.\n",
"This notebook will cover:\n",
"1. How the package's primary class implemented, \n",
"2. How it can be subclassed for further extensions,\n",
"3. How it can be used via both class instantiation and via convenience function,\n",
"\n",
"Tips on how LLM memory usage is calculated and how the memory can be reduced will also be mentioned as needed."
]
},
{
"cell_type": "markdown",
"id": "a08d4d7c",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8274236",
"metadata": {},
"outputs": [],
"source": [
"from training_hub import BasicEstimator, OSFTEstimator, OSFTEstimatorExperimental, estimate"
]
},
{
"cell_type": "markdown",
"id": "c61f401a",
"metadata": {},
"source": [
"The estimation depends on several key factors that should be user inputted. These are:"
]
},
{
"cell_type": "markdown",
"id": "3e3515f5",
"metadata": {},
"source": [
"#### The Pre-Trained Model to be Fine-Tuned"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66208c2a",
"metadata": {},
"outputs": [],
"source": [
"model_path = \"ibm-granite/granite-3.3-2b-instruct\" "
]
},
{
"cell_type": "markdown",
"id": "b98e920e",
"metadata": {},
"source": [
"#### The Number and Size of Your GPUs\n",
"\n",
"The given default values will assume you are training on 2x L40s, each containing 48 GB of memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70462895",
"metadata": {},
"outputs": [],
"source": [
"num_gpus = 2\n",
"gpu_memory = 48 * (2**30) # 48 GB in bytes"
]
},
{
"cell_type": "markdown",
"id": "5cf719d0",
"metadata": {},
"source": [
"#### The Maximum Number of Tokens You'll Place Onto a GPU\n",
"\n",
"Note that in training hub, minibatches will be operated in such a way that\n",
"the number of tokens on the GPU never exceeds this value"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a735ecbc",
"metadata": {},
"outputs": [],
"source": [
"max_tokens_per_gpu = 8192"
]
},
{
"cell_type": "markdown",
"id": "5b643b37",
"metadata": {},
"source": [
"#### The Unfreeze Rank Ratio\n",
"\n",
"This is the OSFT parameter that determines what proportion of the parameters can be updated\n",
"during the OSFT fine-tuning step. Setting this to 0.33 should give you an estimation similar to SFT,\n",
"and setting this to 1 should you give you an estimation about twice as large as SFT's"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "117917b3",
"metadata": {},
"outputs": [],
"source": [
"unfreeze_rank_ratio = 0.25"
]
},
{
"cell_type": "markdown",
"id": "62eba14c",
"metadata": {},
"source": [
"## Profiler Overview\n",
"\n",
"At a lower level, the profiling module provides a class `BasicEstimator` that implements the memory estimation for training an LLM normally (via SFT).\n",
"\n",
"The estimator computes this values in the `estimate` function through the following procedure:\n",
"\n",
"1. Calculate the memory needed to store the model parameters (`_calc_model_params`)\n",
"\n",
"2. Calculate the memory needed to store the model's gradients (`_calc_gradients`)\n",
"\n",
"3. Calculate the memory needed to store the model's optimizer states (`_calc_optimizer`)\n",
" - The values of Steps 1-3 is proportional to the number of parameters within the the model.\n",
" - This estimator assumes the AdamW optimizer, which stores 2 optimizer parameters per model parameter\n",
" - Some non-Adam optimizers use only 1 optimizer parameter, although training hub uses AdamW by default\n",
"\n",
"4. Calculate the memory needed to store the intermediate activations within the model (`_calc_intermediate_activations`)\n",
" - This value is the product of the number of tokens being passed onto a GPU, the number of layers in the model, and the model's hidden dimensionality\n",
"\n",
"5. Calculate the memory needed to store the activated output the model (`_calc_outputs`)\n",
" - This value is the product of the number of tokens being passed onto a GPU and the vocabulary size of the model.\n",
"\n",
"6. Calculate any additional memory the model might use (this value is 0 for SFT) (`_calc_additional`)\n",
"\n",
"7. Sum up the memory calculated in Steps 1-6\n",
"\n",
"8. Apply multiplers representing possible overhead to get the low bound (1x), expected (1.1x), and upper bound (1.3x) for the memory usage of this model (`_apply_overhead`)\n",
"\n",
"Note that training hub assumes that all of the above values are stored in Float32 (4 bytes per tensor entry)\n"
]
},
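{
"cell_type": "markdown",
"id": "4b7d0e2a",
"metadata": {},
"source": [
"The next cell is a back-of-envelope sketch of the eight-step procedure above, not the estimator's actual implementation. All of the model dimensions in it are hypothetical placeholders; the real estimator reads the true values from the model's configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c1f5a9d",
"metadata": {},
"outputs": [],
"source": [
"# Back-of-envelope sketch of Steps 1-8 above (illustrative only).\n",
"# The model dimensions below are hypothetical placeholders.\n",
"BYTES_PER_ENTRY = 4                # Float32 (4 bytes per tensor entry)\n",
"num_params = 2.5e9                 # total model parameters\n",
"num_layers = 40                    # transformer layers\n",
"hidden_dim = 2048                  # hidden dimensionality\n",
"vocab_size = 49152                 # vocabulary size\n",
"tokens = max_tokens_per_gpu        # tokens placed onto a GPU\n",
"\n",
"model_mem = num_params * BYTES_PER_ENTRY                        # Step 1: parameters\n",
"grad_mem = num_params * BYTES_PER_ENTRY                         # Step 2: gradients\n",
"optim_mem = 2 * num_params * BYTES_PER_ENTRY                    # Step 3: AdamW states\n",
"activ_mem = tokens * num_layers * hidden_dim * BYTES_PER_ENTRY  # Step 4: activations\n",
"output_mem = tokens * vocab_size * BYTES_PER_ENTRY              # Step 5: outputs\n",
"additional_mem = 0                                              # Step 6: 0 for SFT\n",
"\n",
"total = model_mem + grad_mem + optim_mem + activ_mem + output_mem + additional_mem  # Step 7\n",
"for label, mult in [(\"lower bound\", 1.0), (\"expected\", 1.1), (\"upper bound\", 1.3)]:  # Step 8\n",
"    print(f\"{label}: {total * mult / 2**30:.1f} GiB\")"
]
},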
{
"cell_type": "markdown",
"id": "7e2277c9",
"metadata": {},
"source": [
"## Basic SFT Estimation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0486d2a",
"metadata": {},
"outputs": [],
"source": [
"my_sft_estimator = BasicEstimator(num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2\n",
" )\n",
"\n",
"sft_lower_bound, sft_expected, sft_upper_bound = my_sft_estimator.estimate()"
]
},
{
"cell_type": "markdown",
"id": "4396155f",
"metadata": {},
"source": [
"## OSFT Estimation and Subclassing\n",
"Training Hub plans to implement a wide variety of different methods for training LLMs,\n",
"with OSFT having been recently implemented.\n",
"\n",
"Because the estimator is implemented as a class, the individual components for\n",
"calculating the memory are their own functions, and LLM methods tend to have similarities\n",
"in how they consume memories, we can create new estimators by simply subclassing `BasicEstimator`\n",
"and overriding any of the respective methods for the individual pieces of memory computation\n",
"with formulas that are more accurate for that training method.\n",
"\n",
"For example, the estimator for OSFT is implemented as the subclass `OSFTEstimator`.\n",
"On top of some under-the-hood changes, its main adjustment is overriding `_calc_model_params`\n",
"to use the U, Sigma, and V matrices obtained through SVD calculation instead of the typical\n",
"model weight matrix."
]
},
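{
"cell_type": "markdown",
"id": "2d5c8f1b",
"metadata": {},
"source": [
"As a minimal illustration of the pattern, the hypothetical subclass below overrides a single\n",
"memory-computation hook to reserve a fixed extra buffer. The hook signature used here\n",
"(no arguments, returning a byte count) is an assumption made for the sake of the sketch."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a3e9c4d",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch of the subclassing pattern described above.\n",
"# Assumption: _calc_additional takes no arguments and returns a byte count.\n",
"class PaddedEstimator(BasicEstimator):\n",
"    \"\"\"Illustrative estimator that reserves a flat 2 GiB of extra headroom.\"\"\"\n",
"\n",
"    def _calc_additional(self):\n",
"        # Add a fixed 2 GiB buffer on top of whatever the base class reports.\n",
"        return super()._calc_additional() + 2 * (2**30)\n",
"\n",
"# PaddedEstimator(num_gpus=num_gpus, gpu_memory=gpu_memory, model_path=model_path,\n",
"#                 max_tokens_per_gpu=max_tokens_per_gpu).estimate() would then fold\n",
"# the extra buffer into Steps 6-8 of the procedure above."
]
},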
{
"cell_type": "code",
"execution_count": null,
"id": "93b6afc4",
"metadata": {},
"outputs": [],
"source": [
"my_osft_estimator = OSFTEstimator(num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2,\n",
" unfreeze_rank_ratio=unfreeze_rank_ratio\n",
" )\n",
"\n",
"osft_lower_bound, osft_expected, osft_upper_bound = my_osft_estimator.estimate()"
]
},
{
"cell_type": "markdown",
"id": "eaefc58e",
"metadata": {},
"source": [
"## OSFT Estimation with Liger Kernels\n",
"`BasicEstimator` includes support for Liger Kernels. Liger Kernels aim to drastically\n",
"speed up the time needed to fine-tune LLM models as well as reduce the memory footprint\n",
"of the fine-tuning process.\n",
"\n",
"Empirically, the main memory optimization of Liger Kernels is to recalculate the activated outputs\n",
"of the model rather than directly storing them on the GPU for future use. This can drastically\n",
"improve the memory footprint when training use very large batch sizes. \n",
"\n",
"For the purposes of this estimator, enabling Liger Kernels will force `_calc_outputs` to always be 0.\n",
"\n",
"In Training Hub, OSFT uses Liger Kernels by default."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12a4e81a",
"metadata": {},
"outputs": [],
"source": [
"my_liger_estimator = OSFTEstimator(num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2,\n",
" use_liger=True,\n",
" unfreeze_rank_ratio=unfreeze_rank_ratio\n",
" )\n",
"\n",
"liger_lower_bound, liger_expected, liger_upper_bound = my_liger_estimator.estimate()"
]
},
{
"cell_type": "markdown",
"id": "48c82224",
"metadata": {},
"source": [
"## Perform Estimation with the convenience function\n",
"\n",
"For higher level usage, rather than needing to directly instantiate an estimator object,\n",
"we have provided a simple convenience function named `estimate`, in which you can\n",
"provide the standard initialization arguments for your estimator as well as the\n",
"type of training method you want to estimate for, and you can immediately obtain the estimation bounds.\n",
"\n",
"To specify the estimation type, you can pass in `\"sft\"` to the `training_method` argument to\n",
"estimate for SFT, or `\"osft\"` to estimate for OSFT."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8ede8ca",
"metadata": {},
"outputs": [],
"source": [
"conv_sft_lower_bound, conv_sft_expected, conv_sft_upper_bound = estimate(\n",
" training_method=\"sft\",\n",
" num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2e3f526",
"metadata": {},
"outputs": [],
"source": [
"conv_osft_lower_bound, conv_osft_expected, conv_osft_upper_bound = estimate(\n",
" training_method=\"osft\",\n",
" num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2,\n",
" unfreeze_rank_ratio=unfreeze_rank_ratio\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab863891",
"metadata": {},
"outputs": [],
"source": [
"conv_liger_lower_bound, conv_liger_expected, conv_liger_upper_bound = estimate(\n",
" training_method=\"osft\",\n",
" num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2,\n",
" use_liger=True,\n",
" unfreeze_rank_ratio=unfreeze_rank_ratio\n",
" )"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "th_dev",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}