Bug Description
The outputs of TRT compilation do not match the PyTorch outputs for the Llama 2 model. The causes are the following:
- Running in FP16 precision (LayerNorm warns that FP16 precision is not sufficient), so we need to compile in FP32 precision.
- Rotation (the rotary position embedding block): https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L152-L156. This block leads to an output mismatch.
- Adding the attention mask: https://github.com/huggingface/transformers/blob/e65502951593a76844e872fee9c56b805598538a/src/transformers/models/llama/modeling_llama.py#L347-L349. These lines also cause an output mismatch.
- Compiling with dynamic shapes and FP32 also leads to high memory usage (see the input-spec sketch after this list).
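For reference, a minimal sketch of the dynamic-shape input spec assumed here (the min/opt/max sequence lengths are illustrative, not the exact values from the failing run):

```python
import torch
import torch_tensorrt

# Illustrative dynamic sequence-length range for the input_ids tensor.
dyn_input_ids = torch_tensorrt.Input(
    min_shape=(1, 1),
    opt_shape=(1, 128),
    max_shape=(1, 2048),
    dtype=torch.int64,
)
```

Passing a spec like this to torch_tensorrt.compile together with enabled_precisions={torch.float32} is the combination that shows the high memory usage.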
To Reproduce
Steps to reproduce the behavior:
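A minimal reproduction sketch, assuming the meta-llama/Llama-2-7b-hf checkpoint and the dynamo frontend; the prompt and the use of return_dict=False / use_cache=False are assumptions for illustration:

```python
import torch
import torch_tensorrt
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# return_dict=False / use_cache=False keep the traced graph simple (tuple outputs, no KV cache).
model = AutoModelForCausalLM.from_pretrained(
    model_id, return_dict=False, use_cache=False
).eval().cuda()

prompt = "Tell me about the planet Mars."  # illustrative prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

with torch.no_grad():
    pyt_logits = model(input_ids)[0]

# Compile in FP32: FP16 triggers the LayerNorm precision warning and a larger mismatch.
trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=[input_ids],
    enabled_precisions={torch.float32},
)

with torch.no_grad():
    trt_logits = trt_model(input_ids)[0]

print("max abs diff:", (pyt_logits - trt_logits).abs().max().item())
```

With FP16 the mismatch is larger; with FP32 and the dynamic-shape spec above, compilation memory usage becomes the issue.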
Expected behavior
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
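For example, a sketch of turning on debug messages via the torch_tensorrt.logging API (assuming a version that exposes set_reportable_log_level):

```python
import torch_tensorrt

# Print detailed build information during compilation.
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug)
```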
- Torch-TensorRT Version (e.g. 1.0.0):
- PyTorch Version (e.g. 1.0):
- CPU Architecture:
- OS (e.g., Linux):
- How you installed PyTorch (conda, pip, libtorch, source):
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version:
- CUDA version:
- GPU models and configuration:
- Any other relevant information: