Is it possible to run the system on Google Colab ? #42


Open
Levian0313 opened this issue Mar 15, 2023 · 7 comments

@Levian0313

Is it possible to reduce the amount of resources needed to run the system on Google Colab? Not everyone has the means to experiment with an A100 80GB.

@lachmed

lachmed commented Mar 22, 2023

+1

@orangetin
Member

Google Colab Pro offers an A100 40GB with 40 GB of RAM when using the high-RAM runtime, IIRC.

You'd need to try it on Google Colab Pro+; it may or may not have enough resources. While I have not tested it on a Google Colab Pro+ account, I can confirm that it does NOT run on Google Colab Pro due to insufficient resources.

Google Colab Pro Specs used:

  • NVIDIA A100-SXM4-40GB
  • System RAM = 83.5 GB (I believe this is RAM + GPU)

Here's the code for the Colab ipynb if you want to test it out yourself, or if anyone else has access to Pro+.

A free Google Colab will definitely not be sufficient.
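If anyone wants to check what their runtime actually provides before trying, here is a minimal sketch (just standard torch/psutil calls, not taken from the OpenChatKit notebook):

```python
# Minimal sketch: report the GPU and system RAM a Colab runtime provides.
# Uses only standard torch / psutil calls; not part of the OpenChatKit notebook.
import torch
import psutil

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, vRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA device available")

print(f"System RAM: {psutil.virtual_memory().total / 1024**3:.1f} GB")
```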

@husnoo

husnoo commented Apr 12, 2023

I tried with Colab free:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.75 GiB total capacity; 13.79 GiB already allocated; 2.81 MiB free; 13.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
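(As a side note, the allocator option that the error message suggests can be set before the first CUDA allocation, roughly as below; it only mitigates fragmentation and will not make a model fit that is larger than the GPU's ~15 GB.)

```python
# Set the allocator option suggested by the OOM message. This must happen
# before the first CUDA allocation; it reduces fragmentation but cannot
# shrink a model that simply exceeds the available vRAM.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import / first CUDA use only after setting the variable
```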

@orangetin
Member

@husnoo yeah, the models are too big to be loaded on a free account without 8-bit. Can you try this instead?: https://colab.research.google.com/github/orangetin/OpenChatKit/blob/colab-example/inference/example/example.ipynb
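For reference, the 8-bit path that notebook relies on looks roughly like this; a sketch assuming the usual transformers + bitsandbytes route, so the exact arguments in the linked notebook may differ:

```python
# Sketch of 8-bit loading with transformers + bitsandbytes
# (requires `pip install transformers accelerate bitsandbytes`).
# Arguments are the standard transformers ones; the linked notebook may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/Pythia-Chat-Base-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place layers on the GPU
    load_in_8bit=True,   # quantize weights to 8-bit at load time
)

# Prompt format assumed from the OpenChatKit model card.
inputs = tokenizer("<human>: Hello!\n<bot>:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```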

@leclem
Contributor

leclem commented Apr 25, 2023

@orangetin I can confirm that inference with togethercomputer/Pythia-Chat-Base-7B works on Google Colab Pro+, but not with togethercomputer/GPT-NeoXT-Chat-Base-20B (that model loads, but it consumes 39.4 GB of vRAM and then crashes when an inference is run).

[screenshot of GPU memory usage omitted]

and it seems to consume only 14.8 GB of vRAM even with the full (non-8-bit) version

[screenshot of GPU memory usage omitted]

What I am now trying to solve, if anyone has hints about it, is whether it is possible to fine-tune the model directly in Colab with a single 40 GB GPU. The training code looks like it was designed specifically for multi-GPU setups, but maybe it can be tweaked.
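(If anyone wants to reproduce those memory numbers, peak usage can be read from PyTorch's built-in counters; a quick sketch, assuming `model` and `inputs` are already set up on the GPU:)

```python
# Measure peak GPU memory around an inference call using PyTorch's
# built-in counters. Assumes `model` and `inputs` already live on the GPU.
import torch

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    _ = model.generate(**inputs, max_new_tokens=64)
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak vRAM during generation: {peak_gb:.1f} GB")
```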

@thomasjv799

@leclem Is Colab Pro+ worth it compared to Colab Pro for fine-tuning and running inference on LLM models?

@leclem
Contributor

leclem commented Feb 19, 2024

@thomasjv799

No difference for the moment, unfortunately.

A major hardware limitation for playing with LLMs is vRAM, the GPU's memory, onto which the model needs to be loaded for the GPU to perform operations. Because LLMs are big, they need GPUs with a lot of vRAM.

For the moment, the best GPU you can get on Google Colab Pro or Pro+ is an NVIDIA A100 with 40 GB of vRAM, which is limiting for fine-tuning LLMs. For inference, you can manage if you quantize to 4 bits (or 8 bits for the smaller models). Colab Pro+ gives you more credits than Colab Pro, so more time on the A100 and priority access to it, but it will not give you more vRAM.
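(For reference, 4-bit loading in that setup looks roughly like this; a sketch assuming the transformers + bitsandbytes route, not something Colab-specific:)

```python
# Sketch of 4-bit loading via transformers + bitsandbytes, which is what makes
# 7B-class models comfortable on a 40 GB (or even 16 GB) GPU for inference.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16, store weights in 4-bit
)
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/Pythia-Chat-Base-7B",
    device_map="auto",
    quantization_config=quant_config,
)
```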

Nevertheless, it looks like supporting 80 GB vRAM GPUs on Colab Pro+ is a feature request on their roadmap:
googlecolab/colabtools#3784
So the answer for the moment is no, but once they add those GPUs it will be yes.

For the moment, I have been using https://www.runpod.io/, which provides an interface similar to Colab and supports the A100 80GB (a version of the A100 with 80 GB of vRAM) as well as the NVIDIA H100, which also has 80 GB of vRAM, and it works well.
Also, it supports attaching multiple GPUs to a machine, which is a prerequisite for the kind of fine-tuning done by OpenChatKit (fine-tuning the whole model, not just part of it), which requires more than 80 GB of vRAM and is therefore distributed over multiple GPUs. https://www.runpod.io/gpu-instance/pricing
