random drop description and text prompt #185
Open
Add options to the data collator for dropping the description and the text prompt with a specified probability, controlled by the following arguments:
- p_drop_description: probability of dropping the description (an option that may give better disentanglement between speaker and description)
- range_cond_drop_description: ratio range of the index up to which the audio codes will not be trained (an option to keep the initial parts from being trained without a description)
- p_drop_prompt: probability of dropping the text prompt (to randomly learn purely unconditioned audio codes)

I'm not sure these will work well in all scenarios, but I've noticed some improvement in zero-shot capability with an empty description, so I wanted to open up the options and see how they work in more cases (e.g. applied during pretraining).
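For reviewers, here is a rough, standalone sketch of how the three arguments could interact for a single example. It is not the code in this PR: the field names `description_ids`, `prompt_ids`, and `labels`, the helper `random_drop`, the use of an empty list to represent a dropped input, and the `-100` ignore value are all assumptions made for illustration. It also assumes the label masking is applied only when the description is actually dropped, with the cutoff ratio sampled uniformly from the given range.

```python
import random

IGNORE_INDEX = -100  # assumed label value that the loss skips


def random_drop(example, p_drop_description=0.0,
                range_cond_drop_description=(0.0, 0.0),
                p_drop_prompt=0.0, rng=random):
    """Hypothetical per-example drop logic; returns a modified copy."""
    out = dict(example)
    out["labels"] = list(example["labels"])

    # Drop the description with probability p_drop_description.
    if rng.random() < p_drop_description:
        out["description_ids"] = []  # assumption: empty list means "no description"
        # Sample a ratio and exclude the leading part of the audio codes
        # from training, so the start is not learned without a description.
        low, high = range_cond_drop_description
        cutoff = int(rng.uniform(low, high) * len(out["labels"]))
        out["labels"][:cutoff] = [IGNORE_INDEX] * cutoff

    # Independently drop the text prompt with probability p_drop_prompt.
    if rng.random() < p_drop_prompt:
        out["prompt_ids"] = []  # assumption: empty list means "no prompt"

    return out


example = {"description_ids": [5, 6], "prompt_ids": [7, 8, 9], "labels": [1, 2, 3, 4]}
dropped = random_drop(example, p_drop_description=1.0,
                      range_cond_drop_description=(0.5, 0.5), p_drop_prompt=1.0)
kept = random_drop(example)  # all probabilities 0.0: nothing is dropped
```

With both probabilities at 1.0 and a fixed ratio of 0.5, the description and prompt are emptied and the first half of the labels is masked; with the defaults, the example passes through unchanged. A real collator would work on padded batches and tensors rather than Python lists.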
Thanks for the great work! Please let me know if anything is missing, if there's a better option, or if related work is already in progress.