Commit 9daee2e
use
* use `TokenizersBackend`
* fixes
* prioritize mapping
* prioritize mapping
* only use mapping for some models
* fix fallback
* undo debug thing
* add case to tokenizersbackend init
* add default bos eos token to tok backend
* set bos eos
* fix more models
* mistral idefics
* fix stopping criteria test
* fix stopping criteria test
* try stopping criteria fix
* rebase
* update tokenizer model for stopping criteria test
* fix tuple mapping for ministral
* ignore `tokenizer_class` as it is always wrong
* up
* try to fix idefics
* fix unispeech and maybe others: fall back to the saved class if conversion was not possible
* nits
* fixup
* TIL that it was ALSO saved in config.json...
* arf
* fallback to tok config if no config json
* people who map to Llama probably don't even want llama either..
* processors to load tokbackend
* auto fix order
* try diff order
* mistral fix for weird chars
* reorder
* random fix attempt for failing tests that are failing locally so idk how to check these
* trying an older commit
* fix mistral
* map unispeech
* try something out
* update
* nits
* trying to be a little bit more restrictive
* token type ids for tokenizers should be explicit... let's see which tests fail this and we'll add them to the specific classes?
* Nit
* idefics 1-2 are actually the only ones that should map to llama force
* small fixes
* fix layout
* fixup
* fix some tests
* 1 nit
* aria fix
* style
* canine
* fixup
* very small test
* style
* update to tokenizersbackend
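
Several of the items above ("ignore `tokenizer_class`", "fallback to tok config if no config json", "idefics 1-2 are actually the only ones that should map to llama force") describe a resolution order. The sketch below is only one reading of those notes, not the code from this commit: the function names, the `FORCED_MAPPING` table, and the assumption that `tokenizer_config.json` carries a `model_type` key are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical table: per the notes, only idefics 1-2 are forced to Llama.
FORCED_MAPPING = {"idefics": "LlamaTokenizer", "idefics2": "LlamaTokenizer"}
DEFAULT_BACKEND = "TokenizersBackend"


def read_model_type(checkpoint_dir):
    """Read model_type from config.json, then from tokenizer_config.json.

    The second file is consulted when config.json is missing or has no
    model_type entry. Returns None if neither file provides one.
    """
    for name in ("config.json", "tokenizer_config.json"):
        path = Path(checkpoint_dir) / name
        if path.is_file():
            data = json.loads(path.read_text(encoding="utf-8"))
            if "model_type" in data:
                return data["model_type"]
    return None


def pick_tokenizer_class(checkpoint_dir):
    """Return a class name; any saved `tokenizer_class` is never consulted."""
    model_type = read_model_type(checkpoint_dir)
    return FORCED_MAPPING.get(model_type, DEFAULT_BACKEND)
```

Under these assumptions, a checkpoint whose files name an unmapped model type, or no model type at all, resolves to the generic backend rather than to whatever class name was saved alongside it.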
---------
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: itazap <[email protected]>
Co-authored-by: Ita Zaporozhets <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
TokenizersBackend (#42894)
1 parent 69ec61f commit 9daee2e
File tree
29 files changed (+249 -243 lines changed)
- docs/source/en/model_doc
- src/transformers
  - integrations
  - models
    - auto
    - blenderbot
    - canine
    - code_llama
    - layoutlmv2
    - nougat
    - parakeet
    - pixtral
  - utils
- tests
  - generation
  - models
    - aria
    - auto
    - chameleon
    - chinese_clip
    - deepseek_vl_hybrid
    - deepseek_vl
    - ernie4_5_vl_moe
    - granite_speech
    - parakeet
  - tokenization