This repository is a collection of Myanmar language datasets, covering both speech and text resources. Because Myanmar language datasets are scarce and hard to find, our goal is to provide a centralized reference point for researchers, developers, and language enthusiasts, and we encourage contributions from the community.
If you know of or have access to additional Myanmar language datasets not listed here, please consider contributing by submitting a pull request or opening an issue. Let's collaborate to build a comprehensive inventory of Myanmar language resources.
- Myanmar Speech Dataset for ASR
- This is a collection of available Myanmar speech datasets for training ASR models.
- Datasets in this collection:
- OpenSLR (See No.2)
- Google Fleurs (See No.4)
- HuggingFace Dataset
- Crowdsourced high-quality Burmese speech dataset (SLR80)
- Download Page
- Download Link
- HuggingFace Original Dataset
- HuggingFace Myanmar Language Only Dataset
- Notebook (Train/test splitting and uploading to HuggingFace)
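
The linked notebook is not reproduced here. As a rough sketch of what a reproducible train/test split can look like, the snippet below shuffles a list of utterance IDs with a fixed seed and holds out a fraction for testing. The function name `split_train_test`, the 10% test fraction, the seed, and the example IDs are all assumptions made for illustration, not values taken from the notebook.

```python
import random


def split_train_test(items, test_fraction=0.1, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical utterance IDs; real IDs would come from the dataset itself.
utterance_ids = [f"utt_{i:04d}" for i in range(100)]
train_ids, test_ids = split_train_test(utterance_ids)
```

If the data is already loaded as a Hugging Face `datasets.Dataset`, its `train_test_split` method (with `test_size` and `seed`) provides a comparable seeded split.
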
- BloomSpeech
- HuggingFace Dataset
- Notebook (Loading Myanmar Language)
- Note: Although the dataset is labelled as Burmese, the data under `language='mya'` is actually the Palaung (De'ang / Ta'ang / Riang) language.
- Google Fleurs
- HuggingFace Original Dataset
- HuggingFace Myanmar Language Only Dataset
- Notebook (Loading Myanmar language and uploading to HuggingFace)
- Asian Language Treebank (ALT)
- Download Page
- HuggingFace Dataset
- It supports translation between the following language pairs:
- Myanmar (Burmese) To Bengali
- Myanmar (Burmese) To English
- Myanmar (Burmese) To Filipino
- Myanmar (Burmese) To Hindi
- Myanmar (Burmese) To Bahasa Indonesia
- Myanmar (Burmese) To Japanese
- Myanmar (Burmese) To Khmer
- Myanmar (Burmese) To Lao
- Myanmar (Burmese) To Malay
- Myanmar (Burmese) To Thai
- Myanmar (Burmese) To Vietnamese
- Myanmar (Burmese) To Chinese (Simplified Chinese)
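
To make the pairing concrete, here is a minimal sketch that pulls (Burmese, target) sentence pairs out of one parallel record. It assumes each record is a plain mapping from a language code to a sentence. The codes (`my`, `en`, `ja`, `th`), the helper name `burmese_pairs`, and the sample sentences are illustrative assumptions and may not match the field names used in the actual dataset.

```python
def burmese_pairs(record, source="my"):
    """Yield (target_code, source_text, target_text) for every other language."""
    source_text = record.get(source)
    if not source_text:
        return  # no Burmese sentence, so there is nothing to pair against
    for code, text in record.items():
        # Skip the source language itself and any empty translations.
        if code != source and text:
            yield code, source_text, text


# Hypothetical record; a real record would carry one entry per language.
record = {"my": "မင်္ဂလာပါ", "en": "Hello", "ja": "こんにちは", "th": ""}
pairs = list(burmese_pairs(record))
```

Each tuple in `pairs` can then be written out as one training example for the corresponding translation direction.
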
- A Corpus of Modern Burmese
- Download Page
- You can download it directly from the current repo
- Myanmar Spoken and Written Language Dataset
- Myanmar NRC Format Dataset
- Myanmar Wikipedia Dataset
- Official wikimedia/wikipedia Repo - HuggingFace Dataset (subset: 20231101.my)
- Alternative Repo with category paths
- HuggingFace Dataset
- Github Repo with web crawler scripts/notebooks
- Myanmar Book Corpus Dataset (MM-Lib)
- Myanmar C4 Dataset (Converted Zawgyi to Unicode)
- Official C4 Repo - HuggingFace Dataset
- Myanmar Unicode C4 Repo
- Myanmar CulturaX Dataset (Converted Zawgyi to Unicode)
- Official CulturaX Repo - HuggingFace Dataset
- Myanmar Unicode CulturaX Repo
- Myanmar CC100 Dataset (Converted Zawgyi to Unicode)
- Official CC100 Repo - HuggingFace Dataset
- Myanmar Unicode CC100 Repo
- ChannelMyanmar Movie Summary Dataset
- Myanmar Fineweb2 Dataset (Converted Zawgyi to Unicode)
- Official Fineweb2 Repo - HuggingFace Dataset
- Myanmar Unicode Fineweb2 Repo
- Myanmar Dhamma Article Dataset (Converted Zawgyi to Unicode)
- HuggingFace Dataset
- Notebook (Scraping notebook)
- Myanmar Dhamma Question and Answer Dataset
- HuggingFace Dataset
- Notebook (Generating Q&A with Gemma 3)
- Myanmar Aya Dataset
- Official Aya Repo - HuggingFace Dataset
- Myanmar Aya Repo
- Burmese Microbiology 1K
- Mpox Myanmar
- Myanmar Agriculture 1K
- Myanmar Instruction Tuning Dataset
- This is a collection of available Myanmar question-and-answer datasets for instruction fine-tuning LLMs.
- Datasets in this collection:
- Burmese Microbiology 1K (See No.15)
- Mpox Myanmar (See No.16)
- Myanmar Agriculture 1K (See No.17)
- Myanmar Aya Dataset (See No.14)
- Myanmar Dhamma Question and Answer Dataset (See No.13)
- Myanmar Football Dataset (See No.21)
- HuggingFace Dataset
- Dataset Generating Notebook
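
Question-and-answer datasets like these usually need to be flattened into a single text field before instruction fine-tuning. The sketch below shows one common way to do that. The `question`/`answer` field names, the prompt template, and the helper `to_instruction_text` are assumptions for illustration and are not taken from the datasets in this collection.

```python
PROMPT_TEMPLATE = "### Question:\n{question}\n\n### Answer:\n{answer}"


def to_instruction_text(example, question_key="question", answer_key="answer"):
    """Render one Q&A record as a single training string."""
    question = example[question_key].strip()
    answer = example[answer_key].strip()
    if not question or not answer:
        raise ValueError("both question and answer must be non-empty")
    return PROMPT_TEMPLATE.format(question=question, answer=answer)


# Hypothetical record with placeholder text.
sample = {"question": " What is Mpox? ", "answer": "A viral disease."}
text = to_instruction_text(sample)
```

Keeping the key names as parameters makes it easier to reuse the same helper across datasets whose columns are named differently.
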
- Myanmar Social Media Sentiment Analysis Dataset
- Original Social Media Sentiments Analysis Dataset - Kaggle Dataset
- Myanmar Translated Dataset
- myXNLI - Myanmar Natural Language Inference Corpus
- Myanmar Football Dataset
- Myanmar Facebook Flores Dataset
- Official Flores Repo - HuggingFace Dataset
- Myanmar Facebook Flores Repo