
GQA for smaller models #635

@Dampfinchen

Description

Hello,

Could we please have 13B and 7B models with the updated architecture that includes grouped-query attention (GQA)? Many people run these models on low-memory machines, and GQA would let them use a larger context. Right now, a 4096-token context simply needs too much memory to be feasible with good speed and quality on common hardware.

Thank you!
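For context on why GQA helps here: it shares each key/value head across a group of query heads, which shrinks the KV cache that grows with context length. A rough sketch of the memory math (the layer/head counts below are illustrative assumptions for a 7B-class model, not the actual released configs):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV cache size: keys + values (factor of 2),
    stored per layer, per KV head, per position, in fp16 (2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Assumed 7B-class config: 32 layers, 32 query heads, head_dim 128.
full_mha = kv_cache_bytes(32, n_kv_heads=32, head_dim=128, ctx_len=4096)  # MHA: every query head has its own KV head
gqa_8    = kv_cache_bytes(32, n_kv_heads=8,  head_dim=128, ctx_len=4096)  # GQA: 4 query heads share each KV head

print(f"MHA KV cache at 4096 ctx: {full_mha / 2**30:.2f} GiB")
print(f"GQA KV cache at 4096 ctx: {gqa_8 / 2**30:.2f} GiB")
```

Under these assumed numbers, going from 32 KV heads to 8 cuts the KV cache to a quarter of its size, which is exactly the kind of saving that makes 4096-token contexts practical on low-memory machines.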

Metadata

Labels: new-feature (New feature or request)