
Multi Head Attention Layer #7803

Closed

Description

@grafael

I think it would be a good idea to start thinking about how to implement this sort of layer in Keras.
I know it is a very recent algorithm, but I believe it will be cutting-edge technology in deep learning for the next few years.

Paper: Attention Is All You Need (https://arxiv.org/abs/1706.03762)

Blog showing some results: Google Research Blog
Tensor2Tensor library: tensor2tensor
PyTorch implementation: pytorch-t2t
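
For discussion, here is a minimal sketch of what such a layer could look like as a custom Keras layer, implementing the scaled dot-product multi-head attention from the paper. The class name, argument names, and defaults are illustrative assumptions on my part, not a proposed official API, and it omits masking and dropout for brevity:

```python
# Sketch of multi-head scaled dot-product attention as a custom Keras layer.
# Names and hyperparameters are illustrative assumptions, not a final API.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

class MultiHeadAttention(layers.Layer):
    def __init__(self, num_heads=8, head_dim=64, **kwargs):
        super().__init__(**kwargs)
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.model_dim = num_heads * head_dim

    def build(self, input_shape):
        # Learned linear projections for queries, keys, values, and output.
        self.wq = layers.Dense(self.model_dim, use_bias=False)
        self.wk = layers.Dense(self.model_dim, use_bias=False)
        self.wv = layers.Dense(self.model_dim, use_bias=False)
        self.wo = layers.Dense(self.model_dim, use_bias=False)

    def _split_heads(self, x):
        # (batch, seq, model_dim) -> (batch, heads, seq, head_dim)
        batch = tf.shape(x)[0]
        x = tf.reshape(x, (batch, -1, self.num_heads, self.head_dim))
        return tf.transpose(x, [0, 2, 1, 3])

    def call(self, query, key, value):
        q = self._split_heads(self.wq(query))
        k = self._split_heads(self.wk(key))
        v = self._split_heads(self.wv(value))

        # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
        scores = tf.matmul(q, k, transpose_b=True)
        scores /= tf.sqrt(tf.cast(self.head_dim, scores.dtype))
        weights = tf.nn.softmax(scores, axis=-1)
        context = tf.matmul(weights, v)

        # (batch, heads, seq, head_dim) -> (batch, seq, model_dim)
        batch = tf.shape(context)[0]
        context = tf.transpose(context, [0, 2, 1, 3])
        context = tf.reshape(context, (batch, -1, self.model_dim))
        return self.wo(context)

# Self-attention over a toy batch: query = key = value.
x = np.random.randn(2, 10, 512).astype("float32")
layer = MultiHeadAttention(num_heads=8, head_dim=64)
print(layer(x, x, x).shape)  # (2, 10, 512)
```

A real implementation would also need padding/causal masks and attention dropout, but the core of the technique is just the per-head projection, the scaled dot product, and the final concatenation shown above.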
