Closed
Description
System Info
transformers version: 4.34.0
Platform: linux
Python version: 3.9.18
sudachitra version: 0.1.8
sudachipy version: 0.6.7
sudachi-core version: 20230927
An upstream change in transformers (PR huggingface/transformers#23909) causes an error when
running the example at https://huggingface.co/megagonlabs/transformers-ud-japanese-electra-base-discriminator
The same error occurs with other custom tokenizers as well: huggingface/transformers#26777
from sudachitra import ElectraSudachipyTokenizer
tokenizer = ElectraSudachipyTokenizer.from_pretrained("megagonlabs/transformers-ud-japanese-electra-base-discriminator")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lib64/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2045, in from_pretrained
    return cls._from_pretrained(
  File "/home/lib64/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/lib64/python3.9/site-packages/sudachitra/tokenization_bert_sudachipy.py", line 155, in __init__
    super().__init__(
  File "/home/lib64/python3.9/site-packages/transformers/tokenization_utils.py", line 366, in __init__
    self._add_tokens(self.all_special_tokens_extended, special_tokens=True)
  File "/home/lib64/python3.9/site-packages/transformers/tokenization_utils.py", line 462, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/home/lib64/python3.9/site-packages/sudachitra/tokenization_bert_sudachipy.py", line 218, in get_vocab
    return dict(self.vocab, **self.added_tokens_encoder)
AttributeError: 'ElectraSudachipyTokenizer' object has no attribute 'vocab'
If it is OK, I would like to contribute a PR to fix this issue.
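The traceback suggests an initialization-order problem: the base class `__init__` now calls `get_vocab()` while the subclass has not yet assigned `self.vocab`. Below is a minimal sketch of that pattern and one possible way around it. It does not use transformers or sudachitra; `BaseTokenizer`, `BrokenTokenizer`, `FixedTokenizer` and `try_build` are hypothetical stand-ins invented for illustration, and the assumption that the subclass assigns `vocab` only after `super().__init__()` is inferred from the traceback, not confirmed against the sudachitra source.

```python
class BaseTokenizer:
    """Hypothetical stand-in for the transformers base class (not the real API)."""

    def __init__(self, special_tokens=()):
        self.added_tokens_encoder = {}
        # Like the base __init__ in the traceback, this reads the vocabulary
        # during construction, so the subclass must have it ready by now.
        current_vocab = self.get_vocab().copy()
        for token in special_tokens:
            if token not in current_vocab:
                new_id = len(current_vocab) + len(self.added_tokens_encoder)
                self.added_tokens_encoder[token] = new_id

    def get_vocab(self):
        raise NotImplementedError


class BrokenTokenizer(BaseTokenizer):
    """Assumed failing order: vocab is assigned after super().__init__()."""

    def __init__(self, vocab, special_tokens=()):
        super().__init__(special_tokens)  # get_vocab() runs in here ...
        self.vocab = vocab                # ... before this attribute exists

    def get_vocab(self):
        return dict(self.vocab, **self.added_tokens_encoder)


class FixedTokenizer(BaseTokenizer):
    """Possible fix: assign vocab before calling super().__init__()."""

    def __init__(self, vocab, special_tokens=()):
        self.vocab = vocab
        super().__init__(special_tokens)

    def get_vocab(self):
        return dict(self.vocab, **self.added_tokens_encoder)


def try_build(cls):
    """Build a tokenizer, returning the AttributeError instead of raising it."""
    try:
        return cls({"a": 0, "b": 1}, special_tokens=("[PAD]",))
    except AttributeError as exc:
        return exc


broken_result = try_build(BrokenTokenizer)  # an AttributeError about 'vocab'
fixed_result = try_build(FixedTokenizer)    # a working tokenizer instance
```

If the real tokenizer follows the same pattern, moving the vocabulary loading ahead of the `super().__init__()` call would be one candidate fix. Whether that is sufficient for sudachitra would need to be checked against its actual `__init__`.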