@Balladie Balladie commented Jan 5, 2025

Add options to drop the description and the text prompt with a specified probability in the data collator, controlled by the following arguments:

  • p_drop_description: probability of dropping the description (which may help disentangle the speaker from the description)
  • range_cond_drop_description: range of ratios for the index up to which the audio codes are excluded from training (an option to avoid training the initial part of the audio without a description)
  • p_drop_prompt: probability of dropping the text prompt (to occasionally learn purely unconditioned audio codes)
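
To make the intended behavior of the three arguments concrete, here is a minimal sketch for a single example, not the actual collator code from this PR. The function name `apply_condition_dropout`, the flat-list representation of token ids and labels, the `-100` ignore value, the uniform sampling of the cutoff ratio, and applying the cutoff only when the description is dropped are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

IGNORE_INDEX = -100  # assumption: label value that the loss ignores


def apply_condition_dropout(
    description_ids: List[int],
    prompt_ids: List[int],
    labels: List[int],
    p_drop_description: float = 0.0,
    range_cond_drop_description: Optional[Tuple[float, float]] = None,
    p_drop_prompt: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[int]]:
    """Hypothetical helper: randomly drop conditioning for one example."""
    rng = rng or random.Random()
    labels = list(labels)  # copy so the caller's list is not modified

    # Drop the description with probability p_drop_description.
    if rng.random() < p_drop_description:
        description_ids = []  # assumption: "dropped" means empty
        if range_cond_drop_description is not None:
            low, high = range_cond_drop_description
            # Sample a ratio and mask labels up to that fraction of the
            # sequence, so the initial part is not trained without a
            # description.
            cutoff = int(rng.uniform(low, high) * len(labels))
            labels[:cutoff] = [IGNORE_INDEX] * cutoff

    # Independently drop the text prompt with probability p_drop_prompt.
    if rng.random() < p_drop_prompt:
        prompt_ids = []

    return {
        "description_ids": description_ids,
        "prompt_ids": prompt_ids,
        "labels": labels,
    }
```

With `p_drop_description=1.0` and `range_cond_drop_description=(0.5, 0.5)`, the description is always dropped and the first half of the labels is masked; with all probabilities left at `0.0`, the example passes through unchanged.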

I'm not sure these options work well in all scenarios, but I've noticed some improvement in zero-shot capability with an empty description, so I wanted to expose them and see how they behave in more cases (e.g. when applied during pretraining).

Thanks for the great work! Please let me know if anything is missing, if there is a better option, or if related work is already in progress.
