How do I pre-train the T5 model in HuggingFace library using my own text corpus? #5079

@abhisheknovoic

Description

Hello,

I understand how the T5 architecture works, and I have my own large corpus on which I want to mask spans of tokens and replace each span with a sentinel token.

I also understand the tokenizers in HuggingFace, especially the T5 tokenizer.

Can someone point me to a document, or to the class I need to use, to pre-train a T5 model on my corpus with the masked language modeling (span corruption) objective?

Thanks
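Not an official answer, but here is a minimal sketch of the preprocessing step T5 pre-training uses (span corruption), to make the question concrete. It works on plain whitespace tokens for illustration; in practice you would operate on IDs from the T5 tokenizer, whose vocabulary includes the `<extra_id_0>`, `<extra_id_1>`, … sentinel tokens. The function name `span_corrupt` and the explicit `spans` argument are my own choices, not a transformers API.

```python
def span_corrupt(tokens, spans):
    """Build an (input, target) pair in T5's span-corruption format.

    tokens: list of token strings.
    spans:  sorted, non-overlapping (start, end) index pairs (end exclusive)
            marking the spans to mask.

    Each masked span is replaced in the input by a sentinel token
    <extra_id_i>; the target lists each sentinel followed by the tokens
    it replaced, ending with one final sentinel.
    """
    inp, tgt = [], []
    prev = 0
    for i, (s, e) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:s])   # keep unmasked tokens in the input
        inp.append(sentinel)         # replace the masked span with a sentinel
        tgt.append(sentinel)         # target: sentinel, then the masked tokens
        tgt.extend(tokens[s:e])
        prev = e
    inp.extend(tokens[prev:])
    tgt.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inp, tgt

# Example from the T5 paper: mask "for inviting" and "last".
inp, tgt = span_corrupt(
    "Thank you for inviting me to your party last week".split(),
    [(2, 4), (8, 9)],
)
print(" ".join(inp))  # Thank you <extra_id_0> me to your party <extra_id_1> week
print(" ".join(tgt))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The resulting input/target pairs can then be tokenized and fed to `T5ForConditionalGeneration` with the targets passed as `labels`, training it as an ordinary seq2seq model; how you sample span positions and lengths (e.g. ~15% noise density, mean span length 3, as in the T5 paper) is up to your data pipeline.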
