Add Darija (Moroccan Arabic) finetuning example #334
Mohcinimohamed wants to merge 1 commit into microsoft:main
Mohcinimohamed commented on Apr 11, 2026
- Add darija_toy_dataset/ with 2 Darija speech samples (.wav + .json)
- Update README.md with Darija training and inference commands
- Uses customized_context with Darija-specific hotwords
- Audio recorded by the contributor via a community data collection app
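Each sample in the toy dataset is described as a `.wav` file paired with a `.json` file. The following is a minimal sketch of a check that every clip has its companion file before training. It assumes the two files share the same stem (for example `sample_01.wav` and `sample_01.json`) and sit directly inside `darija_toy_dataset/`. That naming convention and the helper `find_unpaired_samples` are hypothetical, not part of VibeVoice. The JSON contents (including the `customized_context` hotwords) are not inspected, because their schema is not shown here.

```python
from pathlib import Path


def find_unpaired_samples(dataset_dir: str) -> tuple[list[str], list[str]]:
    """Return (audio stems missing a .json, .json stems missing audio).

    Hypothetical helper: assumes each sample is stored as <stem>.wav next to
    <stem>.json in a single flat directory. Only file names are compared.
    """
    root = Path(dataset_dir)
    wav_stems = {p.stem for p in root.glob("*.wav")}
    json_stems = {p.stem for p in root.glob("*.json")}
    missing_json = sorted(wav_stems - json_stems)
    missing_wav = sorted(json_stems - wav_stems)
    return missing_json, missing_wav


if __name__ == "__main__":
    # Assumed location of the dataset relative to the working directory.
    missing_json, missing_wav = find_unpaired_samples("darija_toy_dataset")
    print("wav without json:", missing_json)
    print("json without wav:", missing_wav)
```

Comparing file stems as sets keeps the check independent of how many samples exist. For the two-sample toy set, both lists should be empty.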
@Mohcinimohamed, please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Contribution License Agreement
This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
The branch was force-pushed from aa916ef to ea221c6.
Thanks so much for this contribution @Mohcinimohamed! It's really exciting to see community members working on low-resource languages like Darija; this is exactly the kind of work we'd love to highlight. Before we merge, would you be able to share:
This would really help us showcase this as a reference example for fine-tuning on new languages. Thanks again for the great work!
Hi @pengzhiliang! Thanks so much for the kind words. We're thrilled to contribute to VibeVoice and help expand its capabilities to low-resource languages like Moroccan Darija. Here are the details you requested:
1. LoRA Weights
2. Evaluation Results & Observations

Let me know if you need anything else to get this merged. Thanks again!
This is awesome, thanks so much for the detailed update! The qualitative improvements you described, especially around code-switching, sound really promising. No rush at all on the WER/CER numbers. Let's wait until your full training is done and the final metrics are ready before merging. Feel free to update this PR whenever you're ready!
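The merge is waiting on WER/CER numbers. As a reference for what those metrics measure, here is a minimal sketch of the standard definitions. WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. CER is the same computation over characters. The function names are hypothetical and not part of VibeVoice. Text normalization (punctuation, Arabic-script variants, Arabizi digits such as `3`) is deliberately left out and would need to be applied consistently to references and hypotheses before scoring.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens (can exceed 1.0)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces are counted as characters here."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("salam labas 3lik", "salam labas")` returns 1/3: one reference word is deleted out of three. Whether spaces count toward CER is a convention choice, so it should be stated alongside any reported numbers.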