Is Paged or Flash Attention a default? #753

@matthieu-perso

Feature request

Hey all,
The TGI documentation states that PagedAttention and FlashAttention are used. Is there a way to choose which one is used? Different attention mechanisms have different pros and cons, and being able to choose between them would be relevant in production.

Motivation

Being able to select the attention mechanism would be relevant for different types of documents. In our case, attention mechanisms suited to long sequences would be useful.

Your contribution

Based on the above, I am happy to contribute code.
