Merged
4 changes: 3 additions & 1 deletion README.md
@@ -55,6 +55,8 @@ You can contact us and communicate with us by adding our group:
<img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

## 🎉 News
- 2024.09.26: Support for training and deploying the llama3.2 series models. Experience it using `swift infer --model_type llama3_2-1b-instruct`.
- 2024.09.25: Support for training to deployment with got-ocr2. Best practices can be found [here](https://github.com/modelscope/ms-swift/issues/2122).
- 2024.09.24: Support for training and deploying llama3_1-8b-omni. Experience it using `swift infer --model_type llama3_1-8b-omni`.
- 2024.09.23: Support for training and deploying pixtral-12b. Experience it using `swift infer --model_type pixtral-12b --dtype fp16`.
- 🔥2024.09.19: Supports the qwen2.5, qwen2.5-math, and qwen2.5-coder series models. Supports the qwen2-vl-72b series models. Best practices can be found [here](https://github.com/modelscope/ms-swift/issues/2064).
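The news entries above boil down to a short CLI session. A minimal sketch, assuming ms-swift is already installed (`pip install 'ms-swift[llm]' -U`) and a suitable GPU is available; the `--model_type` values come from the entries themselves:

```shell
# Try one of the newly supported models from the command line.
CUDA_VISIBLE_DEVICES=0 swift infer --model_type llama3_2-1b-instruct

# pixtral-12b is listed above as requiring fp16:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type pixtral-12b --dtype fp16
```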
@@ -583,7 +585,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
| Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
| LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
| LLaMA3<br>LLaMA3.1 | [LLaMA3 series models](https://github.com/meta-llama/llama3) | English | 8B-70B<br>including quantized versions | base model<br>chat model |
| LLaMA3<br>LLaMA3.1<br>Llama3.2 | [LLaMA3 series models](https://github.com/meta-llama/llama3) | English | 1B-70B<br>including quantized versions | base model<br>chat model |
| Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B-22B | base model<br>instruct model<br>MoE model |
| Yi<br>Yi1.5<br>Yi-Coder | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 1.5B-34B<br>including quantized | base model<br>chat model<br>long text model |
| InternLM<br>InternLM2<br>InternLM2-Math<br>InternLM2.5 | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
4 changes: 3 additions & 1 deletion README_CN.md
@@ -56,6 +56,8 @@ SWIFT has rich and comprehensive documentation; please check our documentation website:


## 🎉 新闻
- 2024.09.26: Support for training and deploying the llama3.2 series models. Experience it using `swift infer --model_type llama3_2-1b-instruct`.
- 2024.09.25: Support for training to deployment with got-ocr2. Best practices can be found [here](https://github.com/modelscope/ms-swift/issues/2122).
- 2024.09.24: Support for training and deploying llama3_1-8b-omni. Experience it using `swift infer --model_type llama3_1-8b-omni`.
- 2024.09.23: Support for training and deploying pixtral-12b. Experience it using `swift infer --model_type pixtral-12b --dtype fp16`.
- 🔥2024.09.19: Supports the qwen2.5, qwen2.5-math, and qwen2.5-coder series models. Supports the qwen2-vl-72b series models. Best practices can be found [here](https://github.com/modelscope/ms-swift/issues/2064).
@@ -576,7 +578,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
| LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
| LLaMA3<br>LLaMA3.1 | [LLaMA3 series models](https://github.com/meta-llama/llama3) | English | 8B-70B<br>including quantized versions | base model<br>chat model |
| LLaMA3<br>LLaMA3.1<br>Llama3.2 | [LLaMA3 series models](https://github.com/meta-llama/llama3) | English | 1B-70B<br>including quantized versions | base model<br>chat model |
| Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B-8x22B | base model<br>instruct model<br>MoE model |
| Yi<br>Yi1.5<br>Yi-Coder | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 1.5B-34B<br>including quantized versions | base model<br>chat model<br>long text model |
| InternLM<br>InternLM2<br>InternLM2-Math<br>InternLM2.5 | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
4 changes: 4 additions & 0 deletions docs/source/Instruction/支持的模型和数据集.md
@@ -199,6 +199,10 @@
|llama3_1-405b-instruct-awq|[LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43, autoawq|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)|
|llama3_1-405b-instruct-gptq-int4|[LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.43, auto_gptq|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)|
|llama3_1-405b-instruct-bnb|[LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.43, bitsandbytes|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4)|
|llama3_2-1b|[LLM-Research/Llama-3.2-1B](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)|
|llama3_2-1b-instruct|[LLM-Research/Llama-3.2-1B-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B-Instruct/summary)|q_proj, k_proj, v_proj|llama3_2|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)|
|llama3_2-3b|[LLM-Research/Llama-3.2-3B](https://modelscope.cn/models/LLM-Research/Llama-3.2-3B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)|
|llama3_2-3b-instruct|[LLM-Research/Llama-3.2-3B-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-3B-Instruct/summary)|q_proj, k_proj, v_proj|llama3_2|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)|
|reflection-llama_3_1-70b|[LLM-Research/Reflection-Llama-3.1-70B](https://modelscope.cn/models/LLM-Research/Reflection-Llama-3.1-70B/summary)|q_proj, k_proj, v_proj|reflection|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.43|-|[mattshumer/Reflection-Llama-3.1-70B](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B)|
|longwriter-glm4-9b|[ZhipuAI/LongWriter-glm4-9b](https://modelscope.cn/models/ZhipuAI/LongWriter-glm4-9b/summary)|query_key_value|chatglm4|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.42|-|[THUDM/LongWriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b)|
|longwriter-llama3_1-8b|[ZhipuAI/LongWriter-llama3.1-8b](https://modelscope.cn/models/ZhipuAI/LongWriter-llama3.1-8b/summary)|q_proj, k_proj, v_proj|longwriter-llama3|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[THUDM/LongWriter-llama3.1-8b](https://huggingface.co/THUDM/LongWriter-llama3.1-8b)|
4 changes: 4 additions & 0 deletions docs/source_en/Instruction/Supported-models-datasets.md
@@ -199,6 +199,10 @@ The table below introduces all models supported by SWIFT:
|llama3_1-405b-instruct-awq|[LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43, autoawq|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4)|
|llama3_1-405b-instruct-gptq-int4|[LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.43, auto_gptq|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4)|
|llama3_1-405b-instruct-bnb|[LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://modelscope.cn/models/LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4/summary)|q_proj, k_proj, v_proj|llama3|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.43, bitsandbytes|-|[hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4)|
|llama3_2-1b|[LLM-Research/Llama-3.2-1B](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)|
|llama3_2-1b-instruct|[LLM-Research/Llama-3.2-1B-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-1B-Instruct/summary)|q_proj, k_proj, v_proj|llama3_2|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)|
|llama3_2-3b|[LLM-Research/Llama-3.2-3B](https://modelscope.cn/models/LLM-Research/Llama-3.2-3B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)|
|llama3_2-3b-instruct|[LLM-Research/Llama-3.2-3B-Instruct](https://modelscope.cn/models/LLM-Research/Llama-3.2-3B-Instruct/summary)|q_proj, k_proj, v_proj|llama3_2|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)|
|reflection-llama_3_1-70b|[LLM-Research/Reflection-Llama-3.1-70B](https://modelscope.cn/models/LLM-Research/Reflection-Llama-3.1-70B/summary)|q_proj, k_proj, v_proj|reflection|&#x2714;|&#x2714;|&#x2718;|&#x2718;|transformers>=4.43|-|[mattshumer/Reflection-Llama-3.1-70B](https://huggingface.co/mattshumer/Reflection-Llama-3.1-70B)|
|longwriter-glm4-9b|[ZhipuAI/LongWriter-glm4-9b](https://modelscope.cn/models/ZhipuAI/LongWriter-glm4-9b/summary)|query_key_value|chatglm4|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.42|-|[THUDM/LongWriter-glm4-9b](https://huggingface.co/THUDM/LongWriter-glm4-9b)|
|longwriter-llama3_1-8b|[ZhipuAI/LongWriter-llama3.1-8b](https://modelscope.cn/models/ZhipuAI/LongWriter-llama3.1-8b/summary)|q_proj, k_proj, v_proj|longwriter-llama3|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.43|-|[THUDM/LongWriter-llama3.1-8b](https://huggingface.co/THUDM/LongWriter-llama3.1-8b)|
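For scripting against this table, the four new llama3.2 rows can be mirrored as a small lookup — a sketch using hypothetical names (`LLAMA3_2_MODELS`, `template_for`), with values copied from the template and requirement columns above:

```python
# The four new llama3.2 rows from the table above, keyed by model_type.
# Values are copied verbatim from the template and requirements columns.
LLAMA3_2_MODELS = {
    "llama3_2-1b": {"template": "default-generation", "requires": ["transformers>=4.43"]},
    "llama3_2-1b-instruct": {"template": "llama3_2", "requires": ["transformers>=4.43"]},
    "llama3_2-3b": {"template": "default-generation", "requires": ["transformers>=4.43"]},
    "llama3_2-3b-instruct": {"template": "llama3_2", "requires": ["transformers>=4.43"]},
}

def template_for(model_type: str) -> str:
    """Return the chat template a given model_type maps to in the table."""
    return LLAMA3_2_MODELS[model_type]["template"]

print(template_for("llama3_2-1b-instruct"))  # llama3_2
```

Base models use the plain `default-generation` template, while the instruct variants need the `llama3_2` chat template; all four require `transformers>=4.43`.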