Is it possible to run the system on Google Colab? #42
+1
Google Colab Pro offers an A100 with 40 GB of vRAM and 40 GB of system RAM when using the high-RAM runtime, IIRC. You'd need to try it on Google Colab Pro+; it may or may not have enough resources. While I have not tested it on a Google Colab Pro+ account, I can confirm that it does NOT run on Google Colab Pro due to insufficient resources. Google Colab Pro specs used: (screenshot omitted)

Here's the code for the Colab ipynb if you want to test it out for yourself, or if anyone else has access to Pro+. A free Google Colab will definitely not be sufficient.
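(The notebook itself isn't reproduced in this thread. As a stand-in, a minimal Colab cell like the one below prints the specs a session was actually given; it assumes a standard Colab runtime where torch and psutil are preinstalled.)

```python
# Minimal Colab cell to record what specs a session was given.
# Assumes a standard Colab runtime with torch and psutil preinstalled.
import psutil
import torch

print("System RAM (GB):", round(psutil.virtual_memory().total / 1e9, 1))
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("vRAM (GB):", round(props.total_memory / 1e9, 1))
else:
    print("No GPU assigned to this runtime.")
```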
I tried with Colab free: (output omitted)
@husnoo Yeah, the models are too big to be loaded on a free account without 8-bit quantization. Can you try this instead? https://colab.research.google.com/github/orangetin/OpenChatKit/blob/colab-example/inference/example/example.ipynb
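(For reference, 8-bit loading with the transformers + bitsandbytes stack looks roughly like the sketch below. This is a hedged sketch, not a copy of the linked notebook; the prompt follows OpenChatKit's `<human>`/`<bot>` convention.)

```python
# Rough sketch of 8-bit loading with transformers + bitsandbytes;
# not a copy of the linked notebook.
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/Pythia-Chat-Base-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place the weights on the GPU
    load_in_8bit=True,   # 8-bit quantization: roughly half the fp16 vRAM footprint
)

prompt = "<human>: Hello, how are you?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```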
@orangetin I can confirm that inference with togethercomputer/Pythia-Chat-Base-7B works on Google Colab Pro+, and it seems to consume only 14.8 GB of vRAM even with the full (non-8-bit) version. togethercomputer/GPT-NeoXT-Chat-Base-20B does not work: the model can be loaded, but it consumes 39.4 GB of vRAM and crashes as soon as an inference is made.

What I'm now trying to solve, if anyone has hints, is whether it is possible to fine-tune the model directly in Colab on a single 40 GB GPU. The code looks like it was designed specifically for multi-GPU setups, but maybe it can be tweaked.
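(If anyone wants to reproduce the vRAM numbers above, one simple approach, assuming a torch-based load on a CUDA GPU, is to check peak allocated memory after loading and again after a generate() call:)

```python
# Sketch for reproducing the vRAM numbers above (assumes torch and a CUDA GPU).
import torch

def peak_vram_gb() -> float:
    """Peak GPU memory allocated by tensors so far, in GB."""
    return torch.cuda.max_memory_allocated() / 1e9

torch.cuda.reset_peak_memory_stats()
# ... load the model here ...
print(f"peak vRAM after load: {peak_vram_gb():.1f} GB")
# ... run model.generate(...) here ...
print(f"peak vRAM after inference: {peak_vram_gb():.1f} GB")
```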
@leclem Is Colab Pro+ worth it compared to Colab Pro for fine-tuning and running inference on LLM models?
No difference for the moment, unfortunately. A major hardware limitation for playing with LLMs is vRAM, the memory of the GPU, onto which the model must be loaded for the GPU to perform operations. Because LLMs are big, they need GPUs with a lot of vRAM. For the moment, the best GPU you can get on Google Colab Pro or Pro+ is an NVIDIA A100 with 40 GB of vRAM, which is limiting for fine-tuning LLMs. For inference, you can manage if you quantize to 4 bits (or 8 bits for the smaller models); see the sketch after this comment. Colab Pro+ gives you more credits than Colab Pro, so more time on the A100 and priority access to it, but it does not give you more vRAM. Nevertheless, supporting 80 GB vRAM GPUs on Colab Pro+ appears to be a feature request on their roadmap.

For the moment, I have been using https://www.runpod.io/, which provides an interface similar to Colab and supports the A100 80GB (a version of the A100 with 80 GB of vRAM) as well as the NVIDIA H100, which also has 80 GB of vRAM, and it works well.
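(For the 4-bit inference route mentioned above, a hedged sketch using transformers' BitsAndBytesConfig, with the 20B checkpoint from this thread, would look something like the following; it assumes a recent transformers + bitsandbytes install.)

```python
# Hedged sketch of 4-bit loading via bitsandbytes (requires recent
# transformers + bitsandbytes; the checkpoint name comes from this thread).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # 4-bit weights: ~1/4 the fp16 vRAM footprint
    bnb_4bit_compute_dtype=torch.float16, # do matmuls in fp16 for speed
)
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/GPT-NeoXT-Chat-Base-20B",
    device_map="auto",
    quantization_config=quant_config,
)
# A 20B model in 4-bit is roughly 10-11 GB of weights, so it should fit
# well within a 40 GB A100, with headroom for activations.
```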
Is it possible to reduce the amount of resources needed, so the system can run on Google Colab? Not everyone has the means to experiment with an A100 80GB.