23 changes: 23 additions & 0 deletions examples/README.md
@@ -78,6 +78,29 @@ result = osft(
)
```

### Memory Estimation (Experimental / In-Development)

training_hub includes a library for estimating the expected amount of GPU memory that will be allocated while fine-tuning a given model with SFT or OSFT. The calculations are based on formulas presented in the blog post [How To Calculate GPU VRAM Requirements for a Large Language Model](https://apxml.com/posts/how-to-calculate-vram-requirements-for-an-llm).

NOTE: This feature is still a work in progress. In particular, the estimates for OSFT may vary from your actual results; they mainly serve to give theoretical bounds.
The estimates for SFT should be reasonably close to actual results when using training_hub, but keep in mind that your actual results may still vary.

**Tutorials:**
- [Memory Estimation Example](notebooks/memory_estimator_example.ipynb) - Interactive notebook showcasing how to utilize the memory estimator methods.

**Quick Example:**
```python
from training_hub import estimate

estimate(
    training_method='osft',
    num_gpus=2,
    model_path="/path/to/model",
    max_tokens_per_gpu=8192,
    use_liger=True,
    verbose=2,
    unfreeze_rank_ratio=0.25,
)
```
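
The same function can also estimate memory for SFT: pass `training_method='sft'` and drop the OSFT-specific `unfreeze_rank_ratio`. This variant mirrors the SFT call shown in the example notebook:

```python
from training_hub import estimate

estimate(
    training_method='sft',
    num_gpus=2,
    model_path="/path/to/model",
    max_tokens_per_gpu=8192,
    verbose=2,
)
```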

## Getting Started

1. **For detailed parameter documentation**: Check the relevant guide in `docs/`
357 changes: 357 additions & 0 deletions examples/notebooks/memory_estimator_example.ipynb
@@ -0,0 +1,357 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "187e6115",
"metadata": {},
"source": [
"# Memory Estimator \n",
"\n",
"This notebook will provide some examples on how to use the memory_estimator API\n",
"to estimate the amount of GPU memory consumed when fine-tuning in Training Hub.\n",
"This notebook will cover:\n",
"1. How the package's primary class implemented, \n",
"2. How it can be subclassed for further extensions,\n",
"3. How it can be used via both class instantiation and via convenience function,\n",
"\n",
"Tips on how LLM memory usage is calculated and how the memory can be reduced will also be mentioned as needed."
]
},
{
"cell_type": "markdown",
"id": "a08d4d7c",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8274236",
"metadata": {},
"outputs": [],
"source": [
"from training_hub import BasicEstimator, OSFTEstimator, OSFTEstimatorExperimental, estimate"
]
},
{
"cell_type": "markdown",
"id": "c61f401a",
"metadata": {},
"source": [
"The estimation depends on several key factors that should be user inputted. These are:"
]
},
{
"cell_type": "markdown",
"id": "3e3515f5",
"metadata": {},
"source": [
"#### The Pre-Trained Model to be Fine-Tuned"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "66208c2a",
"metadata": {},
"outputs": [],
"source": [
"model_path = \"ibm-granite/granite-3.3-2b-instruct\" "
]
},
{
"cell_type": "markdown",
"id": "b98e920e",
"metadata": {},
"source": [
"#### The Number and Size of Your GPUs\n",
"\n",
"The given default values will assume you are training on 2x L40s, each containing 48 GB of memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70462895",
"metadata": {},
"outputs": [],
"source": [
"num_gpus = 2\n",
"gpu_memory = 48 * (2**30) # 48 GB in bytes"
]
},
{
"cell_type": "markdown",
"id": "5cf719d0",
"metadata": {},
"source": [
"#### The Maximum Number of Tokens You'll Place Onto a GPU\n",
"\n",
"Note that in training hub, minibatches will be operated in such a way that\n",
"the number of tokens on the GPU never exceeds this value"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a735ecbc",
"metadata": {},
"outputs": [],
"source": [
"max_tokens_per_gpu = 8192"
]
},
{
"cell_type": "markdown",
"id": "5b643b37",
"metadata": {},
"source": [
"#### The Unfreeze Rank Ratio\n",
"\n",
"This is the OSFT parameter that determines what proportion of the parameters can be updated\n",
"during the OSFT fine-tuning step. Setting this to 0.33 should give you an estimation similar to SFT,\n",
"and setting this to 1 should you give you an estimation about twice as large as SFT's"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "117917b3",
"metadata": {},
"outputs": [],
"source": [
"unfreeze_rank_ratio = 0.25"
]
},
{
"cell_type": "markdown",
"id": "62eba14c",
"metadata": {},
"source": [
"## Profiler Overview\n",
"\n",
"At a lower level, the profiling module provides a class `BasicEstimator` that implements the memory estimation for training an LLM normally (via SFT).\n",
"\n",
"The estimator computes this values in the `estimate` function through the following procedure:\n",
"\n",
"1. Calculate the memory needed to store the model parameters (`_calc_model_params`)\n",
"\n",
"2. Calculate the memory needed to store the model's gradients (`_calc_gradients`)\n",
"\n",
"3. Calculate the memory needed to store the model's optimizer states (`_calc_optimizer`)\n",
" - The values of Steps 1-3 is proportional to the number of parameters within the the model.\n",
" - This estimator assumes the AdamW optimizer, which stores 2 optimizer parameters per model parameter\n",
" - Some non-Adam optimizers use only 1 optimizer parameter, although training hub uses AdamW by default\n",
"\n",
"4. Calculate the memory needed to store the intermediate activations within the model (`_calc_intermediate_activations`)\n",
" - This value is the product of the number of tokens being passed onto a GPU, the number of layers in the model, and the model's hidden dimensionality\n",
"\n",
"5. Calculate the memory needed to store the activated output the model (`_calc_outputs`)\n",
" - This value is the product of the number of tokens being passed onto a GPU and the vocabulary size of the model.\n",
"\n",
"6. Calculate any additional memory the model might use (this value is 0 for SFT) (`_calc_additional`)\n",
"\n",
"7. Sum up the memory calculated in Steps 1-6\n",
"\n",
"8. Apply multiplers representing possible overhead to get the low bound (1x), expected (1.1x), and upper bound (1.3x) for the memory usage of this model (`_apply_overhead`)\n",
"\n",
"Note that training hub assumes that all of the above values are stored in Float32 (4 bytes per tensor entry)\n"
]
},
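{
"cell_type": "markdown",
"id": "4b7d0e2a",
"metadata": {},
"source": [
"The next cell is a back-of-envelope sketch of the eight-step procedure above, not the estimator's actual implementation. All of the model dimensions in it are hypothetical placeholders; the real estimator reads the true values from the model's configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c1f5a9d",
"metadata": {},
"outputs": [],
"source": [
"# Back-of-envelope sketch of Steps 1-8 above (illustrative only).\n",
"# The model dimensions below are hypothetical placeholders.\n",
"BYTES_PER_ENTRY = 4                # Float32 (4 bytes per tensor entry)\n",
"num_params = 2.5e9                 # total model parameters\n",
"num_layers = 40                    # transformer layers\n",
"hidden_dim = 2048                  # hidden dimensionality\n",
"vocab_size = 49152                 # vocabulary size\n",
"tokens = max_tokens_per_gpu        # tokens placed onto a GPU\n",
"\n",
"model_mem = num_params * BYTES_PER_ENTRY                        # Step 1: parameters\n",
"grad_mem = num_params * BYTES_PER_ENTRY                         # Step 2: gradients\n",
"optim_mem = 2 * num_params * BYTES_PER_ENTRY                    # Step 3: AdamW states\n",
"activ_mem = tokens * num_layers * hidden_dim * BYTES_PER_ENTRY  # Step 4: activations\n",
"output_mem = tokens * vocab_size * BYTES_PER_ENTRY              # Step 5: outputs\n",
"additional_mem = 0                                              # Step 6: 0 for SFT\n",
"\n",
"total = model_mem + grad_mem + optim_mem + activ_mem + output_mem + additional_mem  # Step 7\n",
"for label, mult in [(\"lower bound\", 1.0), (\"expected\", 1.1), (\"upper bound\", 1.3)]:  # Step 8\n",
"    print(f\"{label}: {total * mult / 2**30:.1f} GiB\")"
]
},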
{
"cell_type": "markdown",
"id": "7e2277c9",
"metadata": {},
"source": [
"## Basic SFT Estimation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0486d2a",
"metadata": {},
"outputs": [],
"source": [
"my_sft_estimator = BasicEstimator(num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2\n",
" )\n",
"\n",
"sft_lower_bound, sft_expected, sft_upper_bound = my_sft_estimator.estimate()"
]
},
{
"cell_type": "markdown",
"id": "4396155f",
"metadata": {},
"source": [
"## OSFT Estimation and Subclassing\n",
"Training Hub plans to implement a wide variety of different methods for training LLMs,\n",
"with OSFT having been recently implemented.\n",
"\n",
"Because the estimator is implemented as a class, the individual components for\n",
"calculating the memory are their own functions, and LLM methods tend to have similarities\n",
"in how they consume memories, we can create new estimators by simply subclassing `BasicEstimator`\n",
"and overriding any of the respective methods for the individual pieces of memory computation\n",
"with formulas that are more accurate for that training method.\n",
"\n",
"For example, the estimator for OSFT is implemented as the subclass `OSFTEstimator`.\n",
"On top of some under-the-hood changes, its main adjustment is overriding `_calc_model_params`\n",
"to use the U, Sigma, and V matrices obtained through SVD calculation instead of the typical\n",
"model weight matrix."
]
},
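{
"cell_type": "markdown",
"id": "2d5c8f1b",
"metadata": {},
"source": [
"As a minimal illustration of the pattern, the hypothetical subclass below overrides a single\n",
"memory-computation hook to reserve a fixed extra buffer. The hook signature used here\n",
"(no arguments, returning a byte count) is an assumption made for the sake of the sketch."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a3e9c4d",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch of the subclassing pattern described above.\n",
"# Assumption: _calc_additional takes no arguments and returns a byte count.\n",
"class PaddedEstimator(BasicEstimator):\n",
"    \"\"\"Illustrative estimator that reserves a flat 2 GiB of extra headroom.\"\"\"\n",
"\n",
"    def _calc_additional(self):\n",
"        # Add a fixed 2 GiB buffer on top of whatever the base class reports.\n",
"        return super()._calc_additional() + 2 * (2**30)\n",
"\n",
"# PaddedEstimator(num_gpus=num_gpus, gpu_memory=gpu_memory, model_path=model_path,\n",
"#                 max_tokens_per_gpu=max_tokens_per_gpu).estimate() would then fold\n",
"# the extra buffer into Steps 6-8 of the procedure above."
]
},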
{
"cell_type": "code",
"execution_count": null,
"id": "93b6afc4",
"metadata": {},
"outputs": [],
"source": [
"my_osft_estimator = OSFTEstimator(num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2,\n",
" unfreeze_rank_ratio=unfreeze_rank_ratio\n",
" )\n",
"\n",
"osft_lower_bound, osft_expected, osft_upper_bound = my_osft_estimator.estimate()"
]
},
{
"cell_type": "markdown",
"id": "eaefc58e",
"metadata": {},
"source": [
"## OSFT Estimation with Liger Kernels\n",
"`BasicEstimator` includes support for Liger Kernels. Liger Kernels aim to drastically\n",
"speed up the time needed to fine-tune LLM models as well as reduce the memory footprint\n",
"of the fine-tuning process.\n",
"\n",
"Empirically, the main memory optimization of Liger Kernels is to recalculate the activated outputs\n",
"of the model rather than directly storing them on the GPU for future use. This can drastically\n",
"improve the memory footprint when training use very large batch sizes. \n",
"\n",
"For the purposes of this estimator, enabling Liger Kernels will force `_calc_outputs` to always be 0.\n",
"\n",
"In Training Hub, OSFT uses Liger Kernels by default."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12a4e81a",
"metadata": {},
"outputs": [],
"source": [
"my_liger_estimator = OSFTEstimator(num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2,\n",
" use_liger=True,\n",
" unfreeze_rank_ratio=unfreeze_rank_ratio\n",
" )\n",
"\n",
"liger_lower_bound, liger_expected, liger_upper_bound = my_liger_estimator.estimate()"
]
},
{
"cell_type": "markdown",
"id": "48c82224",
"metadata": {},
"source": [
"## Perform Estimation with the convenience function\n",
"\n",
"For higher level usage, rather than needing to directly instantiate an estimator object,\n",
"we have provided a simple convenience function named `estimate`, in which you can\n",
"provide the standard initialization arguments for your estimator as well as the\n",
"type of training method you want to estimate for, and you can immediately obtain the estimation bounds.\n",
"\n",
"To specify the estimation type, you can pass in `\"sft\"` to the `training_method` argument to\n",
"estimate for SFT, or `\"osft\"` to estimate for OSFT."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8ede8ca",
"metadata": {},
"outputs": [],
"source": [
"conv_sft_lower_bound, conv_sft_expected, conv_sft_upper_bound = estimate(\n",
" training_method=\"sft\",\n",
" num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2e3f526",
"metadata": {},
"outputs": [],
"source": [
"conv_osft_lower_bound, conv_osft_expected, conv_osft_upper_bound = estimate(\n",
" training_method=\"osft\",\n",
" num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2,\n",
" unfreeze_rank_ratio=unfreeze_rank_ratio\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab863891",
"metadata": {},
"outputs": [],
"source": [
"conv_liger_lower_bound, conv_liger_expected, conv_liger_upper_bound = estimate(\n",
" training_method=\"osft\",\n",
" num_gpus=num_gpus,\n",
" gpu_memory=gpu_memory,\n",
" model_path=model_path,\n",
" max_tokens_per_gpu=max_tokens_per_gpu,\n",
" verbose=2,\n",
" use_liger=True,\n",
" unfreeze_rank_ratio=unfreeze_rank_ratio\n",
" )"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "th_dev",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}