Monday #1335

Status: Open. Wants to merge 132 commits into base `system`.
Commits (132)
c7c0a30
Remove unused Any import
kit1980 Jul 18, 2023
89092cd
added type hint in example code
tomorrmato Jul 19, 2023
3805003
Update download.sh to not use hardcoded bash path
vinnymeller Jul 20, 2023
7c0a08e
Update LICENSE
sagindykovsl Jul 21, 2023
d3b26d0
Update MODEL_CARD.md
sagindykovsl Jul 21, 2023
f5af1e5
Update README.md
sagindykovsl Jul 21, 2023
3cd7ef6
Remove linkshim workaround from README
Daniel15 Jul 21, 2023
99e19d4
Update README.md
eltociear Jul 22, 2023
a511b0d
Merge pull request #626 from facebookresearch/system
ruanslv Aug 4, 2023
82ce861
updates
ruanslv Aug 7, 2023
14dcd8e
compute token logprobs after completed token is sampled
huy-ha Aug 9, 2023
9eb31e5
fix line separators in download.sh for wsl2
MarcoSteinke Aug 9, 2023
14441f1
still return log probs when no completion required
huy-ha Aug 9, 2023
008385a
Update UPDATES.md
jspisak Aug 11, 2023
ea9f33d
Merge pull request #664 from jspisak/main-2
ruanslv Aug 11, 2023
c25b02d
fix max_batch_size for chat example
yanxiyue Aug 22, 2023
a668741
Merge pull request #703 from yanxiyue/main
jspisak Aug 26, 2023
e36eaa3
Merge pull request #512 from SulimanSagindykov/patch-1
jspisak Aug 26, 2023
1a24068
Merge pull request #511 from SulimanSagindykov/patch-2
jspisak Aug 26, 2023
cb8f042
add docstrings
rajveer43 Aug 28, 2023
7bcee80
update comments in model.py
rajveer43 Aug 28, 2023
8cd608c
added remanjg docs
rajveer43 Aug 28, 2023
a102a59
Update download.sh to resume download of partially downloaded files
samuelselvan Aug 29, 2023
ce27a98
Update README.md
NinoRisteski Aug 31, 2023
6a9bec0
Merge pull request #742 from NinoRisteski/patch-1
vubui Aug 31, 2023
49f0749
Merge pull request #510 from SulimanSagindykov/patch-3
luccabb Aug 31, 2023
1df1f88
Merge pull request #738 from facebookresearch/download-sh-continue
vubui Aug 31, 2023
b5ecd3a
Merge pull request #358 from kit1980/patch-1
vubui Aug 31, 2023
0137f15
Merge pull request #729 from rajveer43/rajveer-patch-1
paksha Aug 31, 2023
ec6a3fd
Merge pull request #650 from MarcoSteinke/main
paksha Aug 31, 2023
4c370db
Merge pull request #492 from eltociear/patch-1
paksha Aug 31, 2023
8992dea
Merge pull request #390 from tomorrmato/add_example_type_hint
paksha Aug 31, 2023
4649acd
use 'md5' instead of 'md5sum' if Applie Silicon
godpeny Aug 28, 2023
6aba873
Merge pull request #490 from Daniel15/patch-1
leonwan23 Sep 1, 2023
a255a05
Merge pull request #727 from godpeny/feat/dl_script
ghk Sep 1, 2023
7565eb6
make download.sh executable (#695)
dangbert Sep 1, 2023
1446089
making a small change to avoid a confusion
Sep 2, 2023
346627f
Merge branch 'facebookresearch:main' into fix
Sep 3, 2023
eb07062
Fix download.sh shebang for NixOS
jheidbrink Sep 3, 2023
8432e48
Update model.py
Sep 3, 2023
8580eb9
Update model.py
Sep 3, 2023
4e24858
Merge pull request #754 from JaredLevi18/fix
jspisak Sep 3, 2023
7706271
Merge pull request #451 from vinnymeller/download-sh-more-flexible
lmarcon Sep 5, 2023
dd6dbbf
Create FAQ.md
jspisak Sep 7, 2023
5827703
Update README.md
jspisak Sep 7, 2023
c769dfd
Merge pull request #647 from huy-ha/main
bashnick Sep 7, 2023
bfbbf1d
Update FAQ.md
sekyondaMeta Sep 8, 2023
646e6d6
Update FAQ.md
sekyondaMeta Sep 8, 2023
797f929
Update FAQ.md
sekyondaMeta Sep 8, 2023
fb624f4
Update FAQ.md
sekyondaMeta Sep 8, 2023
bb2f693
Update FAQ.md
jspisak Sep 8, 2023
7350119
Update FAQ.md
jspisak Sep 9, 2023
2db73a5
Merge pull request #769 from sekyondaMeta/FAQ-updates
jspisak Sep 9, 2023
6c2f236
Update README.md
sekyondaMeta Sep 9, 2023
d06e1e1
Update README.md
sekyondaMeta Sep 9, 2023
001b672
Update README.md
sekyondaMeta Sep 9, 2023
f2e6eac
Update README.md
sekyondaMeta Sep 9, 2023
ac19393
Update README.md
sekyondaMeta Sep 9, 2023
1bc5221
Merge pull request #755 from jheidbrink/main
jspisak Sep 10, 2023
c9c493f
add seed
Sep 11, 2023
46646b8
Merge pull request #775 from sekyondaMeta/readmeUpdate
sekyondaMeta Sep 11, 2023
d7e2e37
Update FAQ.md
jspisak Sep 14, 2023
7173899
Merge pull request #779 from javier-m/add-seed
jspisak Sep 15, 2023
4869110
Update FAQ.md
jspisak Sep 16, 2023
d58f9ae
Update FAQ.md
jspisak Sep 16, 2023
9f0e393
Update README.md
jspisak Sep 17, 2023
a5e37ce
Update MODEL_CARD.md
jspisak Sep 20, 2023
5c10818
Update FAQ.md
jspisak Sep 20, 2023
843e41f
Merge pull request #814 from facebookresearch/jspisak-patch-2
jspisak Sep 21, 2023
b00a461
Merge pull request #813 from facebookresearch/jspisak-patch-1
jspisak Sep 21, 2023
4660bd3
Add "--continue" flag to wget for model binary in order to resume dow…
kierenAW Sep 23, 2023
f29c9a8
Update README.md
jspisak Sep 26, 2023
5e13e29
Merge pull request #829 from facebookresearch/jspisak-patch-3
jspisak Sep 26, 2023
7e1b864
Merge pull request #822 from kierenAW/main
samuelselvan Sep 29, 2023
98851c3
Update FAQ.md
sekyondaMeta Oct 11, 2023
5d9bb58
Update FAQ.md
sekyondaMeta Oct 11, 2023
0da077c
Update FAQ.md
jspisak Oct 11, 2023
556949f
Merge pull request #851 from sekyondaMeta/FAQ-updates
jspisak Oct 11, 2023
f9ddb1d
change "Content Length" to "Context Length MODEL_CARD.md
yonashub Oct 15, 2023
6b8cff0
Merge pull request #859 from yonashub/patch-1
jspisak Oct 15, 2023
0cc2987
Update issue templates
subramen Oct 16, 2023
1c95a19
Merge pull request #860 from facebookresearch/add-issue-template
jspisak Oct 16, 2023
06faf3a
Add FAQs
subramen Oct 18, 2023
786af96
Update README.md
jspisak Nov 2, 2023
3f750f4
Merge pull request #890 from facebookresearch/jspisak-patch-4
jspisak Nov 2, 2023
664ddc8
Delete FAQ.md
jspisak Nov 2, 2023
b5cd38a
Merge pull request #891 from facebookresearch/jspisak-patch-5
jspisak Nov 2, 2023
7909dee
Correct "bug," typo to "bug", in README.md
JacobHelwig Nov 2, 2023
54d4463
Merge pull request #897 from JacobHelwig/main
jspisak Nov 2, 2023
e9077bd
Fix key-value caching for seqlen != 1
flu0r1ne Nov 3, 2023
9cd8d50
Update issue templates
subramen Nov 8, 2023
dccf644
fix faq link
subramen Nov 8, 2023
94b055f
Update README.md
jspisak Nov 10, 2023
4835a30
Merge pull request #916 from facebookresearch/jspisak-patch-6
jspisak Nov 10, 2023
6b3154b
Update transformer mask comment
flu0r1ne Nov 13, 2023
cd0719d
Correct KV comment seqlen -> seqlen + cache_len
flu0r1ne Nov 13, 2023
ef351e9
Merge pull request #900 from flu0r1ne/main
ruanslv Nov 14, 2023
53b227b
Update README.md
ryanhankins Feb 21, 2024
3f61918
Merge pull request #1033 from ryanhankins/patch-1
jspisak Feb 23, 2024
c28bdb5
Updating contributor guide
fbnav Feb 28, 2024
6796a91
Merge pull request #1046 from facebookresearch/update-contributing_guide
jspisak Feb 28, 2024
acdb925
Update README.md
ShorthillsAI Mar 1, 2024
a0a4da8
Merge pull request #1053 from shorthills-ai/main
jspisak Mar 1, 2024
11ebe80
Update README.md
ShorthillsAI Mar 6, 2024
9a001c7
Merge pull request #1058 from shorthills-ai/main
jspisak Mar 6, 2024
0b46616
change LLaMA to Llama in README
jeffxtang Mar 13, 2024
2f58b8d
Merge pull request #1063 from jeffxtang/LLaMA_lowercase
jspisak Mar 13, 2024
826ad11
Update README.md
jspisak Mar 20, 2024
52afd48
Merge pull request #1076 from meta-llama/jspisak-patch-7
jspisak Mar 20, 2024
1e83758
update the code to use the module's __call__
mst272 Mar 21, 2024
54c22c0
Merge pull request #1077 from mst272/main
subramen Mar 21, 2024
1f9a8d7
Update MODEL_CARD.md
MattGurney Mar 23, 2024
fd73089
Update README.md
osanseviero Apr 8, 2024
04b200c
Merge pull request #1091 from osanseviero/patch-1
samuelselvan Apr 9, 2024
b8348da
Merge pull request #1079 from MattGurney/fix-model-card
samuelselvan Apr 9, 2024
893ff97
README: LLama 2 is no longer the latest version
dandv May 14, 2024
be327c4
Merge pull request #1124 from dandv/patch-1
jspisak May 14, 2024
c0098be
Update download.sh
hyungupark May 15, 2024
12b676b
Update download.sh
samuelselvan Jul 23, 2024
66bc730
Update download.sh
samuelselvan Jul 23, 2024
227d378
Merge pull request #1125 from hyungupark/patch-1
samuelselvan Jul 23, 2024
8fac8be
Update README.md
jspisak Jul 23, 2024
ff2e4fd
Update bug_report.md
giandalia1 Jan 24, 2025
689c7f2
Update README.md
amitsangani Jan 26, 2025
bf159f3
Create django.yml
giandalia1 Jan 29, 2025
da130a0
Update example_text_completion.py
giandalia1 Mar 28, 2025
43e90e8
Merge branch 'meta-llama:main' into main
giandalia1 Apr 1, 2025
77e6153
Update README.md
giandalia1 Apr 6, 2025
e0e4816
Merge pull request #1 from giandalia1/giandalia1-patch-1
giandalia1 Apr 6, 2025
f2306eb
Update README.md
giandalia1 Apr 28, 2025
a689377
Update tokenizer.py
giandalia1 Aug 2, 2025
6448dd5
Update generation.py
giandalia1 Aug 2, 2025
Files changed
38 changes: 38 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,38 @@
---
name: Bug report
about: Create a report to help us reproduce and fix the issue
title: ''
labels: ''
assignees: ''

---

**Before submitting a bug, please make sure the issue hasn't been already addressed by searching through the [FAQs](https://ai.meta.com/llama/faq/) and [existing/past issues](https://github.com/facebookresearch/llama/issues)**

## Describe the bug
<Please provide a clear and concise description of what the bug is. If relevant, please include a _minimal_ (least lines of code necessary) _reproducible_ (running this will give us the same result as you get) code snippet. Make sure to include the relevant imports.>

### Minimal reproducible example
<Remember to wrap the code in ```` ```triple-quotes blocks``` ````>

```python
# sample code to repro the bug
```

### Output
<Remember to wrap the output in ```` ```triple-quotes blocks``` ````>

```
<paste stacktrace and other outputs here>
```

## Runtime Environment
- Model: [eg: `llama-2-7b-chat`]
- Using via huggingface?: [yes/no]
- OS: [eg. Linux/Ubuntu, Windows]
- GPU VRAM:
- Number of GPUs:
- GPU Make: [eg: Nvidia, AMD, Intel]

**Additional context**
Add any other context about the problem or environment here.
30 changes: 30 additions & 0 deletions .github/workflows/django.yml
@@ -0,0 +1,30 @@
name: Django CI

on:
push:
branches: [ "main" ]
pull_request:
branches: [ "main" ]

jobs:
build:

runs-on: ubuntu-latest
strategy:
max-parallel: 4
matrix:
python-version: [3.7, 3.8, 3.9]

steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v3
with:
python-version: ${{ matrix.python-version }}
- name: Install Dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Run Tests
run: |
python manage.py test
8 changes: 7 additions & 1 deletion CONTRIBUTING.md
@@ -3,7 +3,9 @@ We want to make contributing to this project as easy and transparent as
possible.

## Pull Requests
We actively welcome your pull requests.
We welcome your pull requests.

### For requests regarding bug-fixes or improvements to the core model:

1. Fork the repo and create your branch from `main`.
2. If you've added code that should be tested, add tests.
@@ -12,6 +14,10 @@ We actively welcome your pull requests.
5. Make sure your code lints.
6. If you haven't already, complete the Contributor License Agreement ("CLA").

### For requests regarding new feature support, adding additional platform support and model use cases, please contribute to the [llama-recipes repo](https://github.com/facebookresearch/llama-recipes).
<br><br>


## Contributor License Agreement ("CLA")
In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Meta's open source projects.
2 changes: 1 addition & 1 deletion LICENSE
@@ -104,7 +104,7 @@ owner of such derivative works and modifications.
c. If you institute litigation or other proceedings against Meta or any entity
(including a cross-claim or counterclaim in a lawsuit) alleging that the Llama
Materials or Llama 2 outputs or results, or any portion of any of the foregoing,
constitutes infringement of intellectual property or other rights owned or licensable
constitutes an infringement of intellectual property or other rights owned or licensable
by you, then any licenses granted to you under this Agreement shall terminate as of
the date such litigation or claim is filed or instituted. You will indemnify and hold
harmless Meta from and against any claim by any third party arising out of or related
10 changes: 6 additions & 4 deletions MODEL_CARD.md
@@ -10,9 +10,9 @@ Meta developed and released the Llama 2 family of large language models (LLMs),

**Output** Models generate text only.

**Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety.
**Model Architecture** Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

||Training Data|Params|Content Length|GQA|Tokens|LR|
||Training Data|Params|Context Length|GQA|Tokens|LR|
|---|---|---|---|---|---|---|
Llama 2|*A new mix of publicly available online data*|7B|4k|&#10007;|2.0T|3.0 x 10<sup>-4</sup>
Llama 2|*A new mix of publicly available online data*|13B|4k|&#10007;|2.0T|3.0 x 10<sup>-4</sup>
@@ -33,7 +33,9 @@ Llama 2|*A new mix of publicly available online data*|70B|4k|&#10004;|2.0T|1.5 x
# **Intended Use**
**Intended Use Cases** Llama 2 is intended for commercial and research use in English. Tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.

**Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English. Use in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for Llama 2.
**Out-of-scope Uses** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 2 Community License. Use in languages other than English**.

**Note: Developers may fine-tune Llama 2 models for languages beyond English provided they comply with the Llama 2 Community License and the Acceptable Use Policy.

# **Hardware and Software**
**Training Factors** We used custom training libraries, Meta's Research Super Cluster, and production clusters for pretraining. Fine-tuning, annotation, and evaluation were also performed on third-party cloud compute.
@@ -69,7 +71,7 @@ For all the evaluations, we use our internal evaluations library.
|Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1|
|Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**|

**Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *MATH:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at top 1.
**Overall performance on grouped academic benchmarks.** *Code:* We report the average pass@1 scores of our models on HumanEval and MBPP. *Commonsense Reasoning:* We report the average of PIQA, SIQA, HellaSwag, WinoGrande, ARC easy and challenge, OpenBookQA, and CommonsenseQA. We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks. *World Knowledge:* We evaluate the 5-shot performance on NaturalQuestions and TriviaQA and report the average. *Reading Comprehension:* For reading comprehension, we report the 0-shot average on SQuAD, QuAC, and BoolQ. *MATH:* We report the average of the GSM8K (8 shot) and MATH (4 shot) benchmarks at the top 1.

|||TruthfulQA|Toxigen|
|---|---|---|---|
82 changes: 57 additions & 25 deletions README.md
@@ -1,44 +1,74 @@
# Llama 2
## **Note of deprecation**

We are unlocking the power of large language models. Our latest version of Llama is now accessible to individuals, creators, researchers and businesses of all sizes so that they can experiment, innovate and scale their ideas responsibly.
Thank you for developing with Llama models. As part of the Llama 3.1 release, we’ve consolidated GitHub repos and added some additional repos as we’ve expanded Llama’s functionality into being an e2e Llama Stack. Please use the following repos going forward:
- [llama-models](https://github.com/meta-llama/llama-models) - Central repo for the foundation models including basic utilities, model cards, license and use policies
- [PurpleLlama](https://github.com/meta-llama/PurpleLlama) - Key component of Llama Stack focusing on safety risks and inference time mitigations
- [llama-toolchain](https://github.com/meta-llama/llama-toolchain) - Model development (inference/fine-tuning/safety shields/synthetic data generation) interfaces and canonical implementations
- [llama-agentic-system](https://github.com/meta-llama/llama-agentic-system) - E2E standalone Llama Stack system, along with opinionated underlying interface, that enables creation of agentic applications
- [llama-cookbook](https://github.com/meta-llama/llama-recipes) - Community driven scripts and integrations

This release includes model weights and starting code for pretrained and fine-tuned Llama language models — ranging from 7B to 70B parameters.
If you have any questions, please feel free to file an issue on any of the above repos and we will do our best to respond in a timely manner.

This repository is intended as a minimal example to load [Llama 2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) models and run inference. For more detailed examples leveraging HuggingFace, see [llama-recipes](https://github.com/facebookresearch/llama-recipes/).
Thank you!

## System Prompt Update

### Observed Issue
We received feedback from the community on our prompt template and we are providing an update to reduce the false refusal rates seen. False refusals occur when the model incorrectly refuses to answer a question that it should, for example due to overly broad instructions to be cautious in how it provides responses.
# (Deprecated) Llama 2

### Updated approach
Based on evaluation and analysis, we recommend the removal of the system prompt as the default setting. Pull request [#626](https://github.com/facebookresearch/llama/pull/626) removes the system prompt as the default option, but still provides an example to help enable experimentation for those using it.
We are unlocking the power of large language models. Llama 2 is now accessible to individuals, creators, researchers, and businesses of all sizes so that they can experiment, innovate, and scale their ideas responsibly.

## Download
This release includes model weights and starting code for pre-trained and fine-tuned Llama language models — ranging from 7B to 70B parameters.

This repository is intended as a minimal example to load [Llama 2](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) models and run inference. For more detailed examples leveraging Hugging Face, see [llama-cookbook](https://github.com/facebookresearch/llama-recipes/).

⚠️ **7/18: We're aware of people encountering a number of download issues today. Anyone still encountering issues should remove all local files, re-clone the repository, and [request a new download link](https://ai.meta.com/resources/models-and-libraries/llama-downloads/). It's critical to do all of these in case you have local corrupt files. When you receive the email, copy *only* the link text - it should begin with https://download.llamameta.net and not with https://l.facebook.com, which will give errors.**
## Updates post-launch

See [UPDATES.md](UPDATES.md). Also for a running list of frequently asked questions, see [here](https://ai.meta.com/llama/faq/).

## Download

In order to download the model weights and tokenizer, please visit the [Meta AI website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and accept our License.
In order to download the model weights and tokenizer, please visit the [Meta website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and accept our License.

Once your request is approved, you will receive a signed URL over email. Then run the download.sh script, passing the URL provided when prompted to start the download. Make sure that you copy the URL text itself, **do not use the 'Copy link address' option** when you right click the URL. If the copied URL text starts with: https://download.llamameta.net, you copied it correctly. If the copied URL text starts with: https://l.facebook.com, you copied it the wrong way.
Once your request is approved, you will receive a signed URL over email. Then run the download.sh script, passing the URL provided when prompted to start the download.

Pre-requisites: make sure you have `wget` and `md5sum` installed. Then to run the script: `./download.sh`.
Pre-requisites: Make sure you have `wget` and `md5sum` installed. Then run the script: `./download.sh`.

Keep in mind that the links expire after 24 hours and a certain amount of downloads. If you start seeing errors such as `403: Forbidden`, you can always re-request a link.

### Access on Hugging Face
### Access to Hugging Face

We are also providing downloads on [Hugging Face](https://huggingface.co/meta-llama). You must first request a download from the Meta AI website using the same email address as your Hugging Face account. After doing so, you can request access to any of the models on Hugging Face and within 1-2 days your account will be granted access to all versions.
We are also providing downloads on [Hugging Face](https://huggingface.co/meta-llama). You can request access to the models by acknowledging the license and filling in the form in the model card of a repo. After doing so, you should get access to all the Llama models of a version (Code Llama, Llama 2, or Llama Guard) within 1 hour.

## Setup
## Quick Start

In a conda env with PyTorch / CUDA available, clone the repo and run in the top-level directory:
You can follow the steps below to quickly get up and running with Llama 2 models. These steps will let you run quick inference locally. For more examples, see the [Llama 2 cookbook repository](https://github.com/facebookresearch/llama-recipes).

1. In a conda env with PyTorch / CUDA available clone and download this repository.

2. In the top-level directory run:
```bash
pip install -e .
```
3. Visit the [Meta website](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) and register to download the model/s.

4. Once registered, you will get an email with a URL to download the models. You will need this URL when you run the download.sh script.

5. Once you get the email, navigate to your downloaded llama repository and run the download.sh script.
- Make sure to grant execution permissions to the download.sh script
- During this process, you will be prompted to enter the URL from the email.
- Do not use the “Copy Link” option but rather make sure to manually copy the link from the email.

6. Once the model/s you want have been downloaded, you can run the model locally using the command below:
```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir llama-2-7b-chat/ \
--tokenizer_path tokenizer.model \
--max_seq_len 512 --max_batch_size 6
```
**Note**
- Replace `llama-2-7b-chat/` with the path to your checkpoint directory and `tokenizer.model` with the path to your tokenizer model.
- The `--nproc_per_node` should be set to the [MP](#inference) value for the model you are using.
- Adjust the `max_seq_len` and `max_batch_size` parameters as needed.
- This example runs the [example_chat_completion.py](example_chat_completion.py) found in this repository but you can change that to a different .py file.

## Inference

@@ -56,7 +86,7 @@ All models support sequence length up to 4096 tokens, but we pre-allocate the ca

These models are not finetuned for chat or Q&A. They should be prompted so that the expected answer is the natural continuation of the prompt.

See `example_text_completion.py` for some examples. To illustrate, see command below to run it with the llama-2-7b model (`nproc_per_node` needs to be set to the `MP` value):
See `example_text_completion.py` for some examples. To illustrate, see the command below to run it with the llama-2-7b model (`nproc_per_node` needs to be set to the `MP` value):

```
torchrun --nproc_per_node 1 example_text_completion.py \
```
@@ -70,23 +100,23 @@
The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, a specific formatting defined in [`chat_completion`](https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L212)
needs to be followed, including the `INST` and `<<SYS>>` tags, `BOS` and `EOS` tokens, and the whitespaces and breaklines in between (we recommend calling `strip()` on inputs to avoid double-spaces).
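The single-turn case of this template can be sketched as follows. This is an illustrative helper, not the repository's actual `chat_completion` function: the `BOS`/`EOS` special tokens (normally added by the tokenizer) and multi-turn handling are omitted, and the name `format_turn` is our own.

```python
from typing import Optional

# Tag constants as described above (assumed from the template description).
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def format_turn(user_msg: str, system_msg: Optional[str] = None) -> str:
    """Wrap a single user turn in the Llama 2 chat template (sketch only)."""
    content = user_msg.strip()  # strip() avoids double-spaces around the tags
    if system_msg is not None:
        # The system prompt is folded into the first user turn.
        content = B_SYS + system_msg + E_SYS + content
    return f"{B_INST} {content} {E_INST}"

print(format_turn("What is the capital of France?", "Answer briefly."))
```

The whitespace inside the `[INST] … [/INST]` wrapper matters, which is why the text recommends `strip()` on inputs.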

You can also deploy additional classifiers for filtering out inputs and outputs that are deemed unsafe. See the llama-recipes repo for [an example](https://github.com/facebookresearch/llama-recipes/blob/main/inference/inference.py) of how to add a safety checker to the inputs and outputs of your inference code.
You can also deploy additional classifiers for filtering out inputs and outputs that are deemed unsafe. See the llama-cookbook repo for [an example](https://github.com/facebookresearch/llama-recipes/blob/main/examples/inference.py) of how to add a safety checker to the inputs and outputs of your inference code.

Examples using llama-2-7b-chat:

```
torchrun --nproc_per_node 1 example_chat_completion.py \
--ckpt_dir llama-2-7b-chat/ \
--tokenizer_path tokenizer.model \
--max_seq_len 512 --max_batch_size 4
--max_seq_len 512 --max_batch_size 6
```

Llama 2 is a new technology that carries potential risks with use. Testing conducted to date has not — and could not — cover all scenarios.
In order to help developers address these risks, we have created the [Responsible Use Guide](Responsible-Use-Guide.pdf). More details can be found in our research paper as well.

## Issues

Please report any software “bug,” or other problems with the models through one of the following means:
Please report any software “bug”, or other problems with the models through one of the following means:
- Reporting issues with the model: [github.com/facebookresearch/llama](http://github.com/facebookresearch/llama)
- Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback)
- Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info)
@@ -106,5 +136,7 @@ See the [LICENSE](LICENSE) file, as well as our accompanying [Acceptable Use Pol
2. [Llama 2 technical overview](https://ai.meta.com/resources/models-and-libraries/llama)
3. [Open Innovation AI Research Community](https://ai.meta.com/llama/open-innovation-ai-research-community/)

## Original LLaMA
For common questions, the FAQ can be found [here](https://ai.meta.com/llama/faq/) which will be kept up to date over time as new questions arise.

## Original Llama
The repo for the original llama release is in the [`llama_v1`](https://github.com/facebookresearch/llama/tree/llama_v1) branch.