<li><a href="#obtaining-the-facebook-llama-original-model-and-stanford-alpaca-model-data">Obtaining the Facebook LLaMA original model and Stanford Alpaca model data</a></li>
<li><a href="#verifying-the-model-files">Verifying the model files</a></li>

Here are the end-to-end binary build and model conversion steps for most supported models.

### Get the Code
**Without docker**:

First, make sure you have the [Vulkan SDK](https://vulkan.lunarg.com/doc/view/latest/linux/getting_started_ubuntu.html) installed.

For example, on Ubuntu 22.04 (jammy), use the command below:

```bash
wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add -
wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list
apt update -y
apt-get install -y vulkan-sdk
# To verify the installation, use the command below:
vulkaninfo
```

Alternatively, your package manager might be able to provide the appropriate libraries. For example, on Ubuntu 22.04 you can install `libvulkan-dev` instead.
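
On Ubuntu that amounts to something like the following (a sketch, assuming `apt`):

```bash
sudo apt update
sudo apt install libvulkan-dev
```
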
Then, build llama.cpp using the cmake command below:

```bash
mkdir -p build
cd build
cmake .. -DLLAMA_VULKAN=1
cmake --build . --config Release
```

To obtain the official LLaMA 2 weights, please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.

```bash
# obtain the official LLaMA model weights and place them in ./models
```

When running the larger models, make sure you have enough disk space to store all the intermediate files.
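
The remaining conversion and quantization steps look roughly like this (a sketch: `models/7B` is an illustrative path; `convert.py` and the `quantize` tool ship with this repo):

```bash
# install Python dependencies
python3 -m pip install -r requirements.txt

# convert the model to ggml FP16 format
python3 convert.py models/7B/

# quantize the model to 4 bits (using the Q4_K_M method)
./quantize ./models/7B/ggml-model-f16.gguf ./models/7B/ggml-model-Q4_K_M.gguf Q4_K_M
```
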
As the models are currently fully loaded into memory, you will need adequate disk space to save them and sufficient RAM to load them. At the moment, memory and disk requirements are the same.

| Model | Original size | Quantized size (4-bit) |
|------:|--------------:|-----------------------:|
|    7B |         13 GB |                  3.9 GB |
|   13B |         24 GB |                  7.8 GB |
|   30B |         60 GB |                 19.5 GB |
|   65B |        120 GB |                 38.5 GB |

For authoring more complex JSON grammars, you can also check out https://grammar.intrinsiclabs.ai/, a browser app that lets you write TypeScript interfaces which it compiles to GBNF grammars that you can save for local use. Note that the app is built and maintained by members of the community; please file any issues or FRs on [its repo](http://github.com/intrinsiclabsai/gbnfgen) and not this one.
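
As a quick illustration of putting a grammar to use (the model path is illustrative; `grammars/json.gbnf` is one of the bundled samples):

```bash
# constrain the output to valid JSON using a sample grammar
./main -m ./models/13B/ggml-model-q4_0.gguf -n 256 --grammar-file grammars/json.gbnf -p 'Request: schedule a call at 8pm; Command:'
```
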
### Instruct mode
1. First, download and place the `ggml` model into the `./models` folder (a typical run is sketched below)
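
A minimal sketch of an instruct-mode invocation, assuming this era's `main` binary and an illustrative model filename:

```bash
# run in instruct mode (-ins / --instruct)
./main -m ./models/ggml-model-q4_0.gguf -ins
```
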
### Using [OpenLLaMA](https://github.com/openlm-research/open_llama)
OpenLLaMA is an openly licensed reproduction of Meta's original LLaMA model. It uses the same architecture and is a drop-in replacement for the original LLaMA weights.

- Download the [3B](https://huggingface.co/openlm-research/open_llama_3b), [7B](https://huggingface.co/openlm-research/open_llama_7b), or [13B](https://huggingface.co/openlm-research/open_llama_13b) model from Hugging Face.
- Convert the model to ggml FP16 format using `python convert.py <path to OpenLLaMA directory>` (see the sketch below)

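A concrete form of that conversion, with a hypothetical local directory name for the downloaded checkpoint:

```bash
# convert the downloaded OpenLLaMA checkpoint to ggml FP16 format
python convert.py models/open_llama_7b/
```
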
### Using [GPT4All](https://github.com/nomic-ai/gpt4all)
*Note: these instructions are likely obsolete after the GGUF update*

- Obtain the `tokenizer.model` file from the LLaMA model and put it into `models`
- Obtain the `added_tokens.json` file from the Alpaca model and put it into `models`
- Obtain the `gpt4all-lora-quantized.bin` file from the GPT4All model and put it into `models/gpt4all-7B`
- It is distributed in the old `ggml` format, which is now obsolete
- You have to convert it to the new format using `convert.py`, for example:
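
```bash
# the filename matches the file placed in models/gpt4all-7B above
python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin
```
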
- You can now use the newly generated `models/gpt4all-7B/ggml-model-q4_0.bin` model in exactly the same way as all other models
- The newer GPT4All-J model is not yet supported!
### Using Pygmalion 7B & Metharme 7B
- Obtain the [LLaMA weights](#obtaining-the-facebook-llama-original-model-and-stanford-alpaca-model-data)
- Obtain the [Pygmalion 7B](https://huggingface.co/PygmalionAI/pygmalion-7b/) or [Metharme 7B](https://huggingface.co/PygmalionAI/metharme-7b) XOR encoded weights
- Convert the LLaMA model with [the latest HF convert script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/convert_llama_weights_to_hf.py)
- Merge the XOR files with the converted LLaMA weights by running the [xor_codec](https://huggingface.co/PygmalionAI/pygmalion-7b/blob/main/xor_codec.py) script
- Convert to `ggml` format using the `convert.py` script in this repo:

```bash
python3 convert.py pygmalion-7b/ --outtype q4_1
```

> The Pygmalion 7B & Metharme 7B weights are saved in [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) precision. If you wish to convert to `ggml` without quantizing, please specify `--outtype` as `f32` instead of `f16`.

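If you do want the unquantized output, the invocation differs only in the `--outtype` flag (same illustrative directory as above):

```bash
# full-precision conversion, no quantization
python3 convert.py pygmalion-7b/ --outtype f32
```
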
### Obtaining the Facebook LLaMA original model and Stanford Alpaca model data

**Under no circumstances should IPFS, magnet links, or any other links to model downloads be shared anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.**

The LLaMA models are officially distributed by Facebook and will **never** be provided through this repository.

Refer to [Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to request access to the model data.

### Obtaining and using the Facebook LLaMA 2 model
- Refer to [Facebook's LLaMA download page](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) if you want to access the model data.

### Verifying the model files

Please verify the [sha256 checksums](SHA256SUMS) of all downloaded model files to confirm that you have the correct model data files before creating an issue relating to your model files.

The following python script will verify that you have all the latest model files in your self-installed `./models` subdirectory:

```bash
# run the verification script
./scripts/verify-checksum-models.py
```

On Linux or macOS, it is also possible to run the following commands to verify that you have all the latest model files in your self-installed `./models` subdirectory:

- on Linux: `sha256sum --ignore-missing -c SHA256SUMS`
- on macOS: `shasum -a 256 --ignore-missing -c SHA256SUMS`

### Seminal papers and background on the models
If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT: