Dear scholar,
I have a question about one piece of notation in your paper "bi-att-flow".
In the paper, you state that the similarity matrix S has dimensions T × J.
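However, in the Query-to-context (Q2C) attention at the top of page 4, b = softmax(max_col(S)). If max_col takes the maximum of each column of S, the result has J entries (one per column), not T, so b could not be a weight over the T context words. Should it perhaps be the maximum of each row of S instead?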
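To make my question concrete, here is a minimal sketch in plain Python of the two reductions I have in mind. The sizes T = 4 and J = 3, the values in S, and all variable names are my own assumptions for illustration and are not taken from the paper or its code.

```python
# Hypothetical sizes: T context words, J query words (chosen only for illustration).
T, J = 4, 3

# S[t][j] stands in for the similarity between context word t and query word j.
# The values are arbitrary placeholders, not real similarity scores.
S = [[float(t * J + j) for j in range(J)] for t in range(T)]

# Maximum within each row (over the J query words): one value per context word.
max_within_rows = [max(row) for row in S]

# Maximum within each column (over the T context words): one value per query word.
max_within_cols = [max(S[t][j] for t in range(T)) for j in range(J)]

print(len(max_within_rows))  # equals T
print(len(max_within_cols))  # equals J
```

As I read it, only the first reduction gives a vector of length T, which is why I am unsure what max_col is meant to denote.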
I am not sure whether my understanding is correct. If you have time to reply, I would be very grateful.