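In the WaveNet paper, the authors describe two kinds of conditioning. Speaker identity is a global conditioning input: a single vector that applies to the whole utterance. The TTS features are a local conditioning input: a lower-rate time series that is up-sampled to the audio rate with a transposed convolution (deconvolution). For the latter, they also mention that they tried simply repeating the values across time, but found it worked less well than the transposed convolutions.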
Is there any effort underway to implement either of these? Practically speaking, local conditioning is the piece that would let this implementation begin to speak recognizable words.