[FEA] Refine training benchmark model configuration.

Currently, our benchmark configuration does not reflect real-world scenarios well:
1. The batch size and sequence length are not typical.
2. The input data is uniformly distributed; we need a power-law distribution (for both sequence length and key range).
3. The embedding dim is equal to the hidden dim.
4. The attention hyperparameters are not well set.
5. The model weights are not large enough (a consequence of 3 and 4).
6. We only have a single-node test.
7. We do not benchmark TP.

Therefore, we need to improve our benchmark suite.
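
To illustrate the power-law input requested in the list above, here is a minimal sketch of a synthetic batch generator that draws both sequence lengths and keys from a Zipf-like distribution. It uses only the Python standard library. The function names (`make_powerlaw_sampler`, `generate_batch`) and the default exponents are hypothetical and chosen for illustration; they are not part of the existing benchmark code.

```python
import bisect
import itertools
import random


def make_powerlaw_sampler(n, alpha, rng):
    """Return a sampler over [0, n) with P(k) proportional to (k + 1) ** -alpha."""
    weights = [(k + 1) ** -alpha for k in range(n)]
    cdf = list(itertools.accumulate(weights))
    total = cdf[-1]

    def sample():
        # Inverse-CDF sampling: locate the bucket that contains u.
        u = rng.random() * total
        # min() guards against a floating-point edge case where u == total.
        return min(bisect.bisect_right(cdf, u), n - 1)

    return sample


def generate_batch(batch_size, max_seqlen, num_keys,
                   alpha_len=1.0, alpha_key=1.2, seed=0):
    """Generate `batch_size` variable-length key sequences.

    Sequence lengths fall in [1, max_seqlen] and keys in [0, num_keys).
    Both are skewed towards small values; the exponents are assumed
    placeholders and should be tuned against real traffic.
    """
    rng = random.Random(seed)
    sample_len = make_powerlaw_sampler(max_seqlen, alpha_len, rng)
    sample_key = make_powerlaw_sampler(num_keys, alpha_key, rng)
    batch = []
    for _ in range(batch_size):
        seqlen = sample_len() + 1  # shift from [0, max_seqlen) to [1, max_seqlen]
        batch.append([sample_key() for _ in range(seqlen)])
    return batch
```

For example, `generate_batch(64, 128, 1000)` returns 64 sequences in which a small set of "hot" keys dominates, which should stress the embedding lookup path differently from uniform input. A production version would likely generate tensors directly on the device rather than Python lists.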