Feature request
Hey all,
The TGI documentation states that PagedAttention and FlashAttention are used. Is there a way to choose which one is used? Different attention mechanisms have different trade-offs, and being able to choose between them would be useful in production.
Motivation
The most suitable attention mechanism depends on the type of documents being processed. In our case, attention mechanisms suited to long sequences would be useful.
Your contribution
Based on the above, I'm happy to contribute code.