Feature request
I saw a blog post in which together.ai advertises 3x faster inference via their API. I'm sure some of the optimization techniques they use could benefit this repo as well.
https://www.together.ai/blog/together-inference-engine-v1
Motivation
Faster inference!
Your contribution
Happy to help wherever this overlaps with my skill set.