Commit c23a1c1

Add-helium (#35669)
* Add the helium model.
* Add a missing helium.
* And add another missing helium.
* Use float for the rmsnorm mul.
* Add the Helium tokenizer converter.
* Add the pad token as suggested by Arthur.
* Update the RMSNorm + some other tweaks.
* Fix more rebase issues.
* fix copies and style
* fixes and add helium.md
* add missing tests
* udpate the backlink
* oups
* style
* update init, and expected results
* small fixes
* match test outputs
* style fixup, fix doc builder
* add dummies and we should be good to go!z
* update sdpa and fa2 documentation

---------

Co-authored-by: laurent <[email protected]>
1 parent a3f8232 commit c23a1c1

17 files changed: +1826 -0 lines changed

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -452,6 +452,8 @@
       title: Granite
     - local: model_doc/granitemoe
       title: GraniteMoe
+    - local: model_doc/helium
+      title: Helium
     - local: model_doc/herbert
       title: HerBERT
     - local: model_doc/ibert

docs/source/en/index.md

Lines changed: 1 addition & 0 deletions
@@ -173,6 +173,7 @@ Flax), PyTorch, and/or TensorFlow.
 | [Graphormer](model_doc/graphormer) ||||
 | [Grounding DINO](model_doc/grounding-dino) ||||
 | [GroupViT](model_doc/groupvit) ||||
+| [Helium](model_doc/helium) ||||
 | [HerBERT](model_doc/herbert) ||||
 | [Hiera](model_doc/hiera) ||||
 | [Hubert](model_doc/hubert) ||||

docs/source/en/model_doc/helium.md

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
+<!--Copyright 2024 Kyutai and The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Helium
+
+
+## Overview
+
+Helium was proposed in [Announcing Helium-1 Preview](https://kyutai.org/2025/01/13/helium.html) by the Kyutai Team.
+
+
+Helium-1 preview is a lightweight language model with 2B parameters, targeting edge and mobile devices.
+It supports the following languages: English, French, German, Italian, Portuguese, Spanish.
+
+- **Developed by:** Kyutai
+- **Model type:** Large Language Model
+- **Language(s) (NLP):** English, French, German, Italian, Portuguese, Spanish
+- **License:** CC-BY 4.0
+
+
+
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+The model was evaluated on MMLU, TriviaQA, NaturalQuestions, ARC Easy & Challenge, Open Book QA, Common Sense QA,
+Physical Interaction QA, Social Interaction QA, HellaSwag, WinoGrande, Multilingual Knowledge QA, FLORES 200.
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+We report accuracy on MMLU, ARC, OBQA, CSQA, PIQA, SIQA, HellaSwag, WinoGrande.
+We report exact match on TriviaQA, NQ and MKQA.
+We report BLEU on FLORES.
+
+### English Results
+
+| Benchmark | Helium-1 Preview | HF SmolLM2 (1.7B) | Gemma-2 (2.6B) | Llama-3.2 (3B) | Qwen2.5 (1.5B) |
+|--------------|--------|--------|--------|--------|--------|
+| | | | | | |
+| MMLU | 51.2 | 50.4 | 53.1 | 56.6 | 61.0 |
+| NQ | 17.3 | 15.1 | 17.7 | 22.0 | 13.1 |
+| TQA | 47.9 | 45.4 | 49.9 | 53.6 | 35.9 |
+| ARC E | 80.9 | 81.8 | 81.1 | 84.6 | 89.7 |
+| ARC C | 62.7 | 64.7 | 66.0 | 69.0 | 77.2 |
+| OBQA | 63.8 | 61.4 | 64.6 | 68.4 | 73.8 |
+| CSQA | 65.6 | 59.0 | 64.4 | 65.4 | 72.4 |
+| PIQA | 77.4 | 77.7 | 79.8 | 78.9 | 76.0 |
+| SIQA | 64.4 | 57.5 | 61.9 | 63.8 | 68.7 |
+| HS | 69.7 | 73.2 | 74.7 | 76.9 | 67.5 |
+| WG | 66.5 | 65.6 | 71.2 | 72.0 | 64.8 |
+| | | | | | |
+| Average | 60.7 | 59.3 | 62.2 | 64.7 | 63.6 |
+
+#### Multilingual Results
+
+| Language | Benchmark | Helium-1 Preview | HF SmolLM2 (1.7B) | Gemma-2 (2.6B) | Llama-3.2 (3B) | Qwen2.5 (1.5B) |
+|-----|--------------|--------|--------|--------|--------|--------|
+| | | | | | | |
+|German| MMLU | 45.6 | 35.3 | 45.0 | 47.5 | 49.5 |
+|| ARC C | 56.7 | 38.4 | 54.7 | 58.3 | 60.2 |
+|| HS | 53.5 | 33.9 | 53.4 | 53.7 | 42.8 |
+|| MKQA | 16.1 | 7.1 | 18.9 | 20.2 | 10.4 |
+| | | | | | | |
+|Spanish| MMLU | 46.5 | 38.9 | 46.2 | 49.6 | 52.8 |
+|| ARC C | 58.3 | 43.2 | 58.8 | 60.0 | 68.1 |
+|| HS | 58.6 | 40.8 | 60.5 | 61.1 | 51.4 |
+|| MKQA | 16.0 | 7.9 | 18.5 | 20.6 | 10.6 |
+
+
+## Technical Specifications
+
+### Model Architecture and Objective
+
+| Hyperparameter | Value |
+|--------------|--------|
+| Layers | 24 |
+| Heads | 20 |
+| Model dimension | 2560 |
+| MLP dimension | 7040 |
+| Context size | 4096 |
+| Theta RoPE | 100,000 |
+
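As a quick sanity check on the hyperparameter table above, the sketch below derives the per-head dimension and the RoPE inverse frequencies from the listed values. It assumes that the head dimension equals the model dimension divided by the number of heads, and that the frequencies follow the standard RoPE formula `theta ** (-2i / head_dim)`. Neither assumption is stated in the diff itself, and the variable names are illustrative.

```python
import math

# Values copied from the "Model Architecture and Objective" table.
num_layers = 24
num_heads = 20
model_dim = 2560
mlp_dim = 7040
context_size = 4096
rope_theta = 100_000.0

# Assumption: heads split the model dimension evenly.
head_dim = model_dim // num_heads  # 128

# Assumption: standard RoPE, one inverse frequency per pair of channels.
inv_freq = [rope_theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Wavelength (in token positions) of the slowest-rotating channel pair.
longest_wavelength = 2 * math.pi / inv_freq[-1]

print(f"head_dim={head_dim}, rope pairs={len(inv_freq)}")
print(f"slowest wavelength exceeds context: {longest_wavelength > context_size}")
```

Under these assumptions the slowest rotary channel completes less than one full turn over the 4096-token context, which is the usual reason for choosing a larger theta.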
+Tips:
+
+- This model was contributed by [Laurent Mazare](https://huggingface.co/lmz)
+
+
+## Usage tips
+
+`Helium` can be found on the [Hugging Face Hub](https://huggingface.co/collections/kyutai/helium-1-preview).
+
+In the following, we demonstrate how to use `helium-1-preview` for inference.
+
+```python
+>>> from transformers import AutoModelForCausalLM, AutoTokenizer
+>>> device = "cuda" # the device to load the model onto
+
+>>> model = AutoModelForCausalLM.from_pretrained("helium-1-preview", device_map="auto")
+>>> tokenizer = AutoTokenizer.from_pretrained("helium-1-preview")
+
+>>> prompt = "Give me a short introduction to large language model."
+
+>>> messages = [{"role": "user", "content": prompt}]
+
+>>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+>>> model_inputs = tokenizer([text], return_tensors="pt").to(device)
+
+>>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)
+
+>>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
+
+>>> response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+```
+
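The list comprehension in the usage snippet above trims the prompt tokens from each generated sequence, because `generate` returns the prompt followed by the continuation for decoder-only models. Here is a minimal sketch of that slicing step with plain Python lists. The token ids are made up, and `strip_prompt` is a hypothetical helper, not part of the library.

```python
def strip_prompt(input_ids, output_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Made-up ids: each generated row starts with its own prompt.
prompts = [[1, 15, 27], [1, 42]]
generated = [[1, 15, 27, 301, 302], [1, 42, 77]]

completions = strip_prompt(prompts, generated)
print(completions)
```

With real tensors the rows of a batch share one padded length, so the same slice removes the same number of positions from every row.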
+## HeliumConfig
+
+[[autodoc]] HeliumConfig
+
+## HeliumModel
+
+[[autodoc]] HeliumModel
+    - forward
+
+## HeliumForCausalLM
+
+[[autodoc]] HeliumForCausalLM
+    - forward
+
+## HeliumForSequenceClassification
+
+[[autodoc]] HeliumForSequenceClassification
+    - forward
+
+## HeliumForTokenClassification
+
+[[autodoc]] HeliumForTokenClassification
+    - forward

docs/source/en/perf_infer_gpu_one.md

Lines changed: 2 additions & 0 deletions
@@ -109,6 +109,7 @@ FlashAttention-2 is currently supported for the following architectures:
 * [SigLIP](https://huggingface.co/docs/transformers/model_doc/siglip)
 * [UniSpeech](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/unispeech#transformers.UniSpeechModel)
 * [unispeech_sat](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/unispeech-sat#transformers.UniSpeechSatModel)
+* [helium](https://huggingface.co/docs/transformers/main/en/model_doc/helium#transformers.HeliumModel)

 You can request to add FlashAttention-2 support for another model by opening a GitHub Issue or Pull Request.

@@ -324,6 +325,7 @@ For now, Transformers supports SDPA inference and training for the following arc
 * [XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaModel)
 * [XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl#transformers.XLMRobertaXLModel)
 * [YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos#transformers.YolosModel)
+* [helium](https://huggingface.co/docs/transformers/main/en/model_doc/helium#transformers.HeliumModel)

 <Tip>
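Since these two hunks list Helium under both FlashAttention-2 and SDPA, a caller can choose the attention backend at load time. The sketch below shows one way to do that through the `attn_implementation` argument of `from_pretrained`. The helper name `load_helium` is hypothetical, the checkpoint identifier is left to the caller, and the choice of `bfloat16` is an assumption (FlashAttention-2 needs a half-precision dtype).

```python
ALLOWED_BACKENDS = {"eager", "sdpa", "flash_attention_2"}


def load_helium(checkpoint: str, attn_implementation: str = "sdpa"):
    """Load a causal LM with an explicit attention backend.

    `checkpoint` is a Hub id or local path supplied by the caller.
    """
    if attn_implementation not in ALLOWED_BACKENDS:
        raise ValueError(f"unknown attention backend: {attn_implementation!r}")

    # Imported lazily so this module can be loaded without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.bfloat16,  # assumption: half precision is acceptable
        attn_implementation=attn_implementation,
    )
```

Requesting `"flash_attention_2"` additionally requires the `flash-attn` package and a supported GPU; `"sdpa"` only needs a recent PyTorch.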

src/transformers/__init__.py

Lines changed: 18 additions & 0 deletions
@@ -498,6 +498,7 @@
         "GroupViTTextConfig",
         "GroupViTVisionConfig",
     ],
+    "models.helium": ["HeliumConfig"],
     "models.herbert": ["HerbertTokenizer"],
     "models.hiera": ["HieraConfig"],
     "models.hubert": ["HubertConfig"],
@@ -2506,6 +2507,15 @@
             "GroupViTVisionModel",
         ]
     )
+    _import_structure["models.helium"].extend(
+        [
+            "HeliumForCausalLM",
+            "HeliumForSequenceClassification",
+            "HeliumForTokenClassification",
+            "HeliumModel",
+            "HeliumPreTrainedModel",
+        ]
+    )
     _import_structure["models.hiera"].extend(
         [
             "HieraBackbone",
@@ -5529,6 +5539,7 @@
         GroupViTTextConfig,
         GroupViTVisionConfig,
     )
+    from .models.helium import HeliumConfig
     from .models.herbert import HerbertTokenizer
     from .models.hiera import HieraConfig
     from .models.hubert import HubertConfig
@@ -7371,6 +7382,13 @@
             GroupViTTextModel,
             GroupViTVisionModel,
         )
+        from .models.helium import (
+            HeliumForCausalLM,
+            HeliumForSequenceClassification,
+            HeliumForTokenClassification,
+            HeliumModel,
+            HeliumPreTrainedModel,
+        )
         from .models.hiera import (
             HieraBackbone,
             HieraForImageClassification,
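The `__init__.py` hunks above register Helium in two steps: the config name is always listed under `"models.helium"`, and the model classes are appended with `.extend(...)` afterwards. The sketch below mimics that pattern in isolation. It assumes the second step is gated on the modeling backend being installed, and the `torch_available` flag is a hypothetical stand-in for the library's own availability check.

```python
def build_import_structure(torch_available: bool) -> dict[str, list[str]]:
    """Mimic the two-step registration seen in the diff."""
    # Step 1: the config is registered unconditionally.
    import_structure = {"models.helium": ["HeliumConfig"]}

    # Step 2 (assumed to be backend-gated): modeling classes are appended.
    if torch_available:
        import_structure["models.helium"].extend(
            [
                "HeliumForCausalLM",
                "HeliumForSequenceClassification",
                "HeliumForTokenClassification",
                "HeliumModel",
                "HeliumPreTrainedModel",
            ]
        )
    return import_structure


print(build_import_structure(torch_available=False))
print(len(build_import_structure(torch_available=True)["models.helium"]))
```

Keeping the config separate from the model classes lets `HeliumConfig` be imported even when the modeling backend is missing.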

src/transformers/convert_slow_tokenizer.py

Lines changed: 89 additions & 0 deletions
@@ -1446,6 +1446,95 @@ def pre_tokenizer(self, replacement, add_prefix_space):
         return pre_tokenizers.Metaspace(replacement=replacement, prepend_scheme=prepend_scheme, split=False)


+class HeliumConverter(SpmConverter):
+    handle_byte_fallback = True
+
+    def __init__(self, vocab_file=None, *args):
+        requires_backends(self, "protobuf")
+
+        Converter.__init__(self, vocab_file)
+
+        model_pb2 = import_protobuf()
+
+        m = model_pb2.ModelProto()
+        with open(vocab_file, "rb") as f:
+            m.ParseFromString(f.read())
+        self.proto = m
+
+    def tokenizer(self, proto):
+        vocab_scores = self.vocab(proto)
+        tokenizer = Tokenizer(
+            Unigram(
+                vocab_scores,
+                unk_id=self.unk_id(proto),
+                byte_fallback=self.handle_byte_fallback,
+            )
+        )
+        # control tokens are special
+        # user defined symbols are not
+        # both user and control tokens are AddedTokens
+        # Add user defined symbols (type == 4) from sentencepiece (https://github.com/google/sentencepiece/blob/6225e08edb2577757163b3f5dbba4c0b670ef445/src/sentencepiece_model.proto#L299C29-L299C33)
+        spm_added_tokens = [
+            (id, p.piece, p.type == 3 or p.piece in self.special_tokens)
+            for id, p in enumerate(proto.pieces)
+            if p.type in [3, 4]
+        ]
+        tokenizer.add_tokens(
+            [
+                AddedToken(token, normalized=False, special=special, single_word=True)
+                for id, token, special in sorted(spm_added_tokens, key=lambda x: x[0])
+            ]
+        )
+        tokenizer.add_tokens([AddedToken("\n", normalized=False, special=False)])
+        tokenizer.enable_padding(pad_token="<pad>", pad_id=3)
+        return tokenizer
+
+    def vocab(self, proto):
+        vocab = []
+        for piece in proto.pieces:
+            if piece.piece == "<0x0A>":
+                vocab += [("\n", piece.score)]
+            else:
+                vocab += [(piece.piece, piece.score)]
+        return vocab
+
+    def unk_id(self, proto):
+        unk_id = 0
+        return unk_id
+
+    def decoder(self, replacement, add_prefix_space):
+        sequence = [
+            decoders.Replace("▁", " "),
+            decoders.ByteFallback(),
+            decoders.Fuse(),
+        ]
+        sequence += [decoders.Strip(content=" ", left=1)]
+        return decoders.Sequence(sequence)
+
+    def normalizer(self, proto):
+        return normalizers.Sequence([normalizers.Prepend(" "), normalizers.Replace(r" ", "▁")])
+
+    def pre_tokenizer(self, replacement, add_prefix_space):
+        return pre_tokenizers.Sequence([pre_tokenizers.Split("\n", "contiguous")])
+
+    def post_processor(self):
+        return processors.TemplateProcessing(
+            single=[
+                "<s>",
+                "$A",
+            ],
+            pair=[
+                "<s>",
+                "$A",
+                "<s>",
+                "$B",
+            ],
+            special_tokens=[
+                ("<s>", 1),
+            ],
+        )
+
+
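To make the string-level steps of `HeliumConverter` easier to follow, here is a rough pure-Python mirror of three of them: the `<0x0A>` to newline remapping in `vocab`, the `Prepend` plus `Replace` normalizer, and the `Replace`, `Fuse`, `Strip` part of the decoder. It is only a sketch. It skips byte fallback and edge cases such as empty input, the function names are hypothetical, and the sample pieces and scores are made up.

```python
META = "\u2581"  # the "▁" marker used for spaces


def remap_vocab(pieces):
    """Mirror of `vocab`: expose the newline byte token as a literal newline."""
    return [("\n" if piece == "<0x0A>" else piece, score) for piece, score in pieces]


def normalize(text):
    """Mirror of `normalizer`: prepend a space, then mark every space."""
    return (" " + text).replace(" ", META)


def decode(tokens):
    """Mirror of `decoder` without byte fallback.

    Markers become spaces, tokens are fused, and one leading space is stripped.
    """
    fused = "".join(token.replace(META, " ") for token in tokens)
    return fused[1:] if fused.startswith(" ") else fused


print(remap_vocab([("<unk>", 0.0), ("<0x0A>", -1.5)]))
print(normalize("Hello world"))
print(decode([META + "Hello", META + "world"]))
```

The single leading space added by the normalizer is the one removed again by the final strip, so plain text round-trips unchanged in this simplified model.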
 # Copied from transformers.models.gpt2.tokenization_gpt2.bytes_to_unicode
 def bytes_to_unicode():
     """

src/transformers/models/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -117,6 +117,7 @@
     granitemoe,
     grounding_dino,
     groupvit,
+    helium,
     herbert,
     hiera,
     hubert,

src/transformers/models/auto/configuration_auto.py

Lines changed: 2 additions & 0 deletions
@@ -137,6 +137,7 @@
         ("graphormer", "GraphormerConfig"),
         ("grounding-dino", "GroundingDinoConfig"),
         ("groupvit", "GroupViTConfig"),
+        ("helium", "HeliumConfig"),
         ("hiera", "HieraConfig"),
         ("hubert", "HubertConfig"),
         ("ibert", "IBertConfig"),
@@ -458,6 +459,7 @@
         ("graphormer", "Graphormer"),
         ("grounding-dino", "Grounding DINO"),
         ("groupvit", "GroupViT"),
+        ("helium", "Helium"),
         ("herbert", "HerBERT"),
         ("hiera", "Hiera"),
         ("hubert", "Hubert"),

src/transformers/models/auto/modeling_auto.py

Lines changed: 4 additions & 0 deletions
@@ -132,6 +132,7 @@
         ("graphormer", "GraphormerModel"),
         ("grounding-dino", "GroundingDinoModel"),
         ("groupvit", "GroupViTModel"),
+        ("helium", "HeliumModel"),
         ("hiera", "HieraModel"),
         ("hubert", "HubertModel"),
         ("ibert", "IBertModel"),
@@ -517,6 +518,7 @@
         ("gptj", "GPTJForCausalLM"),
         ("granite", "GraniteForCausalLM"),
         ("granitemoe", "GraniteMoeForCausalLM"),
+        ("helium", "HeliumForCausalLM"),
         ("jamba", "JambaForCausalLM"),
         ("jetmoe", "JetMoeForCausalLM"),
         ("llama", "LlamaForCausalLM"),
@@ -989,6 +991,7 @@
         ("gpt_neo", "GPTNeoForSequenceClassification"),
         ("gpt_neox", "GPTNeoXForSequenceClassification"),
         ("gptj", "GPTJForSequenceClassification"),
+        ("helium", "HeliumForSequenceClassification"),
         ("ibert", "IBertForSequenceClassification"),
         ("jamba", "JambaForSequenceClassification"),
         ("jetmoe", "JetMoeForSequenceClassification"),
@@ -1182,6 +1185,7 @@
         ("gpt_bigcode", "GPTBigCodeForTokenClassification"),
         ("gpt_neo", "GPTNeoForTokenClassification"),
         ("gpt_neox", "GPTNeoXForTokenClassification"),
+        ("helium", "HeliumForTokenClassification"),
         ("ibert", "IBertForTokenClassification"),
         ("layoutlm", "LayoutLMForTokenClassification"),
         ("layoutlmv2", "LayoutLMv2ForTokenClassification"),

src/transformers/models/auto/tokenization_auto.py

Lines changed: 1 addition & 0 deletions
@@ -226,6 +226,7 @@
             ("gptsan-japanese", ("GPTSanJapaneseTokenizer", None)),
             ("grounding-dino", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
             ("groupvit", ("CLIPTokenizer", "CLIPTokenizerFast" if is_tokenizers_available() else None)),
+            ("helium", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
             ("herbert", ("HerbertTokenizer", "HerbertTokenizerFast" if is_tokenizers_available() else None)),
             ("hubert", ("Wav2Vec2CTCTokenizer", None)),
             ("ibert", ("RobertaTokenizer", "RobertaTokenizerFast" if is_tokenizers_available() else None)),
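Taken together, the auto-mapping hunks above make the `"helium"` model type resolvable to a config, several model heads, and a tokenizer pair. The sketch below reproduces only the Helium entries from the diff in plain dictionaries and shows the lookup. The dictionary and function names are illustrative rather than the library's internals, and the tokenizer entry assumes the `tokenizers` package is available.

```python
from collections import OrderedDict

# Helium entries copied from the diff; container names are illustrative.
CONFIG_NAMES = OrderedDict([("helium", "HeliumConfig")])
MODEL_NAMES = OrderedDict([("helium", "HeliumModel")])
CAUSAL_LM_NAMES = OrderedDict([("helium", "HeliumForCausalLM")])
SEQ_CLS_NAMES = OrderedDict([("helium", "HeliumForSequenceClassification")])
TOKEN_CLS_NAMES = OrderedDict([("helium", "HeliumForTokenClassification")])
# (slow tokenizer, fast tokenizer): Helium registers no slow tokenizer.
TOKENIZER_NAMES = OrderedDict([("helium", (None, "PreTrainedTokenizerFast"))])


def resolve(model_type: str) -> dict:
    """Return the class names registered for a model type."""
    slow, fast = TOKENIZER_NAMES[model_type]
    return {
        "config": CONFIG_NAMES[model_type],
        "base_model": MODEL_NAMES[model_type],
        "causal_lm": CAUSAL_LM_NAMES[model_type],
        "sequence_classification": SEQ_CLS_NAMES[model_type],
        "token_classification": TOKEN_CLS_NAMES[model_type],
        "slow_tokenizer": slow,
        "fast_tokenizer": fast,
    }


print(resolve("helium"))
```

The `None` in the slow-tokenizer slot is consistent with the commit adding a converter to the fast tokenizer format instead of a dedicated slow tokenizer class.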
