[etLLM] Explore the way to load a new LLM to llama_transformer #8231
Labels
module: llm
Issues related to LLM examples and apps, and to the extensions/llm/ code
triaged
This issue has been looked at by a team member, and has been triaged and prioritized into an appropriate module
It's related to the first part (eager-mode definition) of this RFC.
There's llama_transformer, which used to be Llama-specific. A lot of infrastructure is built around this model: source transforms, quantization, export, lowering to each backend, etc.
In open source, many models share the same architecture. Can we reuse llama_transformer and all the existing infra to quickly enable a new model with optimized performance on multiple edge backends (CPU, NPUs, CoreML, etc.)?
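One concrete shape this reuse could take is a checkpoint key-remapping step: a new model with Llama-like architecture often differs only in parameter naming, so its state dict could be renamed to match llama_transformer's conventions before loading. The sketch below is a hypothetical illustration; the specific key patterns (Hugging Face-style on the left, llama_transformer-style on the right) are assumptions, not the actual mapping used in this repo.

```python
import re

# Hypothetical rename rules from a Hugging Face-style checkpoint layout to a
# llama_transformer-style layout. The exact names on both sides are assumptions.
KEY_RULES = [
    (r"^model\.embed_tokens\.", "tok_embeddings."),
    (r"^model\.layers\.(\d+)\.self_attn\.q_proj\.", r"layers.\1.attention.wq."),
    (r"^model\.layers\.(\d+)\.self_attn\.k_proj\.", r"layers.\1.attention.wk."),
    (r"^model\.layers\.(\d+)\.self_attn\.v_proj\.", r"layers.\1.attention.wv."),
    (r"^model\.layers\.(\d+)\.self_attn\.o_proj\.", r"layers.\1.attention.wo."),
    (r"^model\.layers\.(\d+)\.mlp\.gate_proj\.", r"layers.\1.feed_forward.w1."),
    (r"^model\.layers\.(\d+)\.mlp\.down_proj\.", r"layers.\1.feed_forward.w2."),
    (r"^model\.layers\.(\d+)\.mlp\.up_proj\.", r"layers.\1.feed_forward.w3."),
    (r"^model\.norm\.", "norm."),
    (r"^lm_head\.", "output."),
]

def remap_state_dict(src: dict) -> dict:
    """Rename checkpoint keys so a Llama-architecture module can load them."""
    out = {}
    for key, value in src.items():
        new_key = key
        for pattern, repl in KEY_RULES:
            new_key = re.sub(pattern, repl, new_key)
        out[new_key] = value
    return out
```

With a step like this, the model definition stays unchanged and the downstream infra (quantization, export, backend lowering) sees a familiar llama_transformer module; only the loading path is model-specific.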
Items to explore:
cc @mergennachin @cccclai @helunwencser @dvorjackz