
Implementation of MultiHeadAttention is incomplete. #1940


Closed
saahiluppal opened this issue Jun 19, 2020 · 3 comments


saahiluppal commented Jun 19, 2020

MultiHeadAttention should support two kinds of masks:

  1. A mask that hides the padded positions (the padding mask).
  2. A mask that prevents certain positions from attending to others (the attention mask, e.g. a causal/look-ahead mask).

The current implementation seems to support only the first one, the padding mask.
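For illustration, here is a minimal sketch of the two mask types using plain TensorFlow ops, assuming a padding id of 0 (the token values and shapes are made up for this sketch and are not tied to this layer's API):

```python
import tensorflow as tf

# Toy batch with two padded positions (padding id 0 is an assumption for this sketch).
tokens = tf.constant([[5, 7, 2, 0, 0]])                    # shape (batch, source_seq)

# 1. Padding mask: 1 where there is a real token, 0 where there is padding.
padding_mask = tf.cast(tokens != 0, tf.float32)            # shape (batch, source_seq)

# 2. Attention (look-ahead / causal) mask: position i may only attend to positions <= i.
seq_len = tokens.shape[1]
causal_mask = tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)  # (target_seq, source_seq)

print(padding_mask.numpy())   # [[1. 1. 1. 0. 0.]]
print(causal_mask.numpy())    # lower-triangular matrix of ones
```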

bhack (Contributor) commented Jun 19, 2020

If you want more features, please help us review tensorflow/community#260.

saberkun (Member) commented

@saahiluppal Thank you! Yes, please help us review and discuss in the RFC.
The padding mask, causal mask, and segment mask can ultimately be composed into a single attention mask tensor of shape (batch size, ..., target sequence, source sequence). If we put the mask-composition logic outside the layer and pass the composed attention mask in, it should be flexible enough to cover these use cases. Please correct me if I am wrong. (Yes, we can move the discussion to the RFC. Thanks!)
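For concreteness, below is a minimal sketch of that composition. It uses tf.keras.layers.MultiHeadAttention rather than the layer discussed in this issue, and assumes a padding id of 0; the point is only that the composed mask has the (batch, target sequence, source sequence) shape described above and is built entirely outside the layer:

```python
import tensorflow as tf

# Hypothetical inputs: one sequence of length 5 with two padded positions (padding id 0).
tokens = tf.constant([[5, 7, 2, 0, 0]])                               # (batch, seq)
seq_len = tokens.shape[1]
embeddings = tf.random.normal((1, seq_len, 16))                       # (batch, seq, features)

# Compose the padding mask and causal mask outside the layer into one attention mask.
padding_mask = (tokens != 0)[:, tf.newaxis, :]                        # (batch, 1, source_seq)
causal_mask = tf.cast(
    tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0), tf.bool
)[tf.newaxis, :, :]                                                   # (1, target_seq, source_seq)
attention_mask = tf.logical_and(padding_mask, causal_mask)            # (batch, target_seq, source_seq)

# Pass the composed mask in; the layer itself needs no mask-building logic.
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
output = mha(query=embeddings, value=embeddings, attention_mask=attention_mask)
print(output.shape)  # (1, 5, 16)
```

A segment mask could be AND-ed into attention_mask the same way; the layer broadcasts the single (batch, target, source) tensor across attention heads.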

seanpmorgan (Member) commented

Closing as this discussion should continue in the new RFC. Thanks!
