Hello,
I understand how the T5 architecture works, and I have my own large corpus in which I mask spans of tokens and replace them with sentinel tokens.
I also understand the tokenizers in HuggingFace, especially the T5 tokenizer.
Can someone point me to documentation, or to the class I need to use, to pretrain a T5 model on my corpus with this masked language modeling approach?
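For concreteness, here is a minimal sketch of the masking described above, in plain Python with no library calls. The function name `span_corrupt`, the hand-picked spans, and the example sentence are illustrative assumptions; only the `<extra_id_N>` sentinel naming is taken from the HuggingFace T5 tokenizer. It works on whitespace-split words rather than real token ids.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel token.

    Returns (inputs, targets): inputs keep the unmasked tokens with one
    sentinel per masked span; targets list each sentinel followed by the
    tokens it replaced, ending with one final sentinel.
    Spans are half-open [start, end) and assumed not to overlap.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # copy text before the span
        inputs.append(sentinel)              # span collapses to one sentinel
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # masked tokens go to the target
        cursor = end
    inputs.extend(tokens[cursor:])           # copy any trailing text
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


tokens = "Thank you for inviting me to your party last week".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))
print(" ".join(targets))
```

In an actual pretraining setup the spans would be sampled randomly and the strings would be converted to ids by the tokenizer; this sketch only shows the input/target format.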
Thanks