@saahiluppal Thank you! Yes, please help us review and discuss in the RFC.
The padding mask, causal mask, and segment mask can ultimately be composed into a single attention mask tensor of shape (batch size, ..., target sequence, source sequence). If we keep the mask composition logic outside the layer and just pass the attention mask in, it should be flexible enough to cover these use cases. Please correct me if I am wrong. (Yes, we can move the discussion to the RFC. Thanks!)
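A minimal sketch of what that composition could look like with `tf.keras.layers.MultiHeadAttention`; the shapes, the padding convention (token id 0 = padding), and the segment ids below are illustrative assumptions, not part of the proposal:

```python
import tensorflow as tf

# Illustrative shapes: B = batch, T = target length, S = source length.
B, T, S, D = 2, 6, 6, 16
query = tf.random.normal([B, T, D])
value = tf.random.normal([B, S, D])

# Assumed inputs: token id 0 marks padding; each key position has a segment id.
source_ids = tf.constant([[7, 3, 9, 5, 0, 0],
                          [4, 8, 0, 0, 0, 0]])
segment_q = tf.zeros([B, T], tf.int32)            # all queries in segment 0
segment_k = tf.constant([[0, 0, 0, 1, 1, 1],
                         [0, 0, 1, 1, 1, 1]])

# 1) Padding mask, (B, 1, S): padded key positions cannot be attended to.
padding_mask = tf.not_equal(source_ids, 0)[:, tf.newaxis, :]

# 2) Causal mask, (1, T, S): position t may only attend to positions <= t.
causal_mask = tf.sequence_mask(tf.range(1, T + 1), maxlen=S)[tf.newaxis, :, :]

# 3) Segment mask, (B, T, S): attend only within the same segment.
segment_mask = tf.equal(segment_q[:, :, tf.newaxis], segment_k[:, tf.newaxis, :])

# Compose outside the layer into one (B, T, S) boolean attention mask ...
attention_mask = tf.logical_and(tf.logical_and(padding_mask, causal_mask),
                                segment_mask)

# ... and pass it straight into the layer.
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=D)
output = mha(query, value, attention_mask=attention_mask)
print(output.shape)  # (2, 6, 16)
```

Any subset of the three masks can be combined the same way, which is why composing them outside the layer keeps the layer itself simple.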
MultiHeadAttention should really handle two types of attention masks, while the current implementation seems to have only one, i.e. the mask over the padding positions (the 1st option).