
GQA for smaller models #635

@Dampfinchen

Description

Hello,

Could we please have 13B and 7B models with the updated architecture that includes grouped-query attention (GQA)? Many people run these models on low-memory machines, and GQA would let them use a larger context. Right now, a 4096-token context simply needs too much memory to be feasible with good speed and quality on common hardware.

Thank you!
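For context on why GQA helps here: it shares each key/value head across a group of query heads, which shrinks the KV cache that grows with context length. A rough sketch of the memory math (the layer/head counts below are illustrative assumptions for a 7B-class model, not the actual released configs):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Approximate KV cache size: keys + values (factor of 2),
    stored per layer, per KV head, per position, in fp16 (2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Assumed 7B-class config: 32 layers, 32 query heads, head_dim 128.
full_mha = kv_cache_bytes(32, n_kv_heads=32, head_dim=128, ctx_len=4096)  # MHA: every query head has its own KV head
gqa_8    = kv_cache_bytes(32, n_kv_heads=8,  head_dim=128, ctx_len=4096)  # GQA: 4 query heads share each KV head

print(f"MHA KV cache at 4096 ctx: {full_mha / 2**30:.2f} GiB")
print(f"GQA KV cache at 4096 ctx: {gqa_8 / 2**30:.2f} GiB")
```

Under these assumed numbers, going from 32 KV heads to 8 cuts the KV cache to a quarter of its size, which is exactly the kind of saving that makes 4096-token contexts practical on low-memory machines.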

Metadata

Labels: new-feature (New feature or request)