Currently, our benchmark configurations do not reflect real-world scenarios well:
1. The batch size and sequence length are not representative of real workloads.
2. The input data is uniformly distributed; we need power-law distributions for both sequence length and key range.
3. The embedding dimension is equal to the hidden dimension.
4. The attention hyperparameters are not set to realistic values.
5. The model weights are not large enough (because of items 3 and 4).
6. We only run single-node tests.
7. We do not benchmark tensor parallelism (TP).
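For item 2, here is a minimal sketch of how power-law (Zipf-like) inputs could be generated with only the Python standard library. It draws per-sample sequence lengths and key ids with probability proportional to `1 / (i + 1) ** alpha`, so short sequences and a few "hot" keys dominate while long sequences and cold keys form a long tail. The function names (`zipf_draw`, `power_law_batch`), the jagged lengths-plus-flat-values output layout, and the default exponents are all hypothetical placeholders, not values tuned to any real workload.

```python
import itertools
import random


def zipf_draw(rng, n, alpha, k):
    """Draw k integers in [0, n) with P(i) proportional to 1 / (i + 1) ** alpha."""
    # Cumulative weights let random.choices sample from the skewed distribution.
    cum = list(itertools.accumulate(1.0 / (i + 1) ** alpha for i in range(n)))
    return rng.choices(range(n), cum_weights=cum, k=k)


def power_law_batch(batch_size, max_seqlen, num_keys,
                    seqlen_alpha=1.2, key_alpha=1.05, seed=0):
    """Hypothetical benchmark input generator.

    Returns (lengths, values) in a jagged layout: lengths[i] is the sequence
    length of sample i, and values holds all keys of the batch back to back.
    The alpha exponents are placeholders, not measured from real traffic.
    """
    rng = random.Random(seed)  # seeded so benchmark runs are reproducible
    # Sequence lengths in [1, max_seqlen]: short sequences common, long ones rare.
    lengths = [x + 1 for x in zipf_draw(rng, max_seqlen, seqlen_alpha, batch_size)]
    # Keys in [0, num_keys): a few hot ids dominate, followed by a long tail.
    values = zipf_draw(rng, num_keys, key_alpha, sum(lengths))
    return lengths, values


if __name__ == "__main__":
    lengths, values = power_law_batch(batch_size=4, max_seqlen=16, num_keys=1000)
    print(lengths, values[:10])
```

Sampling the keys with a separate exponent from the sequence lengths keeps the two skews independently tunable, which matches the issue's point that both dimensions need a non-uniform distribution.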
Therefore, we need to improve our benchmark suites.
By submitting this issue, you agree to follow our code of conduct and our contributing guidelines.