[etLLM] Explore the way to load a new LLM to llama_transformer #8231

Open

iseeyuan opened this issue Feb 5, 2025 · 1 comment
Labels

module: llm — Issues related to LLM examples and apps, and to the extensions/llm/ code
triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


iseeyuan commented Feb 5, 2025

It's related to the first part (eager mode definition) of this RFC.

There's llama_transformer, which used to be Llama-specific. A lot of infrastructure is built around this model: source transforms, quantization, export, lowering to each backend, etc.

In open source, many models share the same architecture. Can we reuse llama_transformer and all the existing infra to quickly enable a new model with optimized performance on multiple edge backends (CPU, NPUs, CoreML, etc.)?
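
As a rough illustration, here is a minimal sketch of what reuse could look like, assuming the existing `ModelArgs` and `Transformer` classes in `examples/models/llama/llama_transformer.py`; the import path and field names below are illustrative and may not match the repo exactly:

```python
# A minimal sketch, assuming the ModelArgs/Transformer classes from
# examples/models/llama/llama_transformer.py; the import path and field
# names are illustrative and may differ in the current repo.
from executorch.examples.models.llama.llama_transformer import (
    ModelArgs,
    Transformer,
)

# Hypothetical config for a non-Llama checkpoint that shares the
# decoder-only architecture (RMSNorm + RoPE + GQA + SwiGLU).
args = ModelArgs(
    dim=2048,
    n_layers=24,
    n_heads=16,
    n_kv_heads=8,        # grouped-query attention
    vocab_size=151936,
    max_seq_len=4096,
)
model = Transformer(args)

# A checkpoint whose keys were converted to the llama_transformer naming
# convention (see the conversion sketch below) could then be loaded:
# model.load_state_dict(torch.load("converted_checkpoint.pt"))
```

If that holds, the downstream infra (quantization, export, delegation) would apply unchanged.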

Items to explore:

  • Utils to quickly convert different checkpoints to be Llama-compliant. An example is to leverage the torchtune utils to convert a Hugging Face checkpoint (see the conversion sketch after this list).
  • Build the model quickly with proper configs.
  • Are any interfaces missing to streamline this flow?
  • Hide the technical details (users may not understand operators or a delegate partitioner) and expose the necessary APIs for users to configure to meet their KPIs. For example, based on accuracy and memory, provide supported quantization bit widths for users to configure (a hypothetical config sketch also follows the list).
  • Quick turnaround on results (perplexity, performance numbers, etc.)
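
For the first bullet, a hedged sketch of the conversion path using torchtune's weight-conversion utilities; the function names come from `torchtune.models.convert_weights`, but signatures should be verified against the installed version, and the head counts below are placeholders:

```python
import torch
from torchtune.models import convert_weights

# Hugging Face checkpoint (key layout: model.layers.N.self_attn.q_proj...).
hf_state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# HF layout -> torchtune layout (handles the q/k/v permutation for RoPE).
# num_heads/num_kv_heads/dim must match the source model's config.
tune_sd = convert_weights.hf_to_tune(
    hf_state_dict, num_heads=16, num_kv_heads=8, dim=2048
)

# torchtune layout -> Meta layout (layers.N.attention.wq.weight, ...),
# which is the key convention llama_transformer expects.
meta_sd = convert_weights.tune_to_meta(tune_sd)
torch.save(meta_sd, "converted_checkpoint.pt")
```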
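For the fourth bullet, nothing like this exists today; it's a purely hypothetical sketch of what a KPI-oriented user config could look like, with partitioner/operator details hidden behind a couple of knobs:

```python
from dataclasses import dataclass

# Hypothetical: supported bit widths per backend would be derived from
# accuracy/memory data, not hardcoded like this.
SUPPORTED_BITS = {"xnnpack": (4, 8), "coreml": (4, 8), "qnn": (4, 8, 16)}

@dataclass
class ExportConfig:
    backend: str = "xnnpack"   # e.g. "xnnpack" | "coreml" | "qnn"
    quantize_bits: int = 4     # gated per backend below
    max_seq_len: int = 2048

def validate(cfg: ExportConfig) -> None:
    # Surface the accuracy/memory trade-off instead of partitioner internals.
    if cfg.quantize_bits not in SUPPORTED_BITS.get(cfg.backend, ()):
        raise ValueError(
            f"{cfg.backend} supports {SUPPORTED_BITS.get(cfg.backend)} bit "
            f"widths, got {cfg.quantize_bits}"
        )
```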

cc @mergennachin @cccclai @helunwencser @dvorjackz

@iseeyuan iseeyuan added the triaged and module: llm labels Feb 5, 2025

iseeyuan commented Feb 6, 2025

Chatted with @sxu offline. Some additional points to consider:

  • Scalar attributes: can they be included in the map?
  • No submodule class check: is assuming the subclass type safe, especially once the model definition is released and fixed? (A speculative sketch of both points follows.)
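
A speculative sketch of how both points could play out in a loader, assuming weights and scalar attributes travel in one plain mapping and submodules are resolved by attribute name (duck typing) rather than by class:

```python
import torch

def apply_checkpoint(model: torch.nn.Module, ckpt: dict) -> None:
    # Tensors go through the normal state-dict path; everything else is
    # treated as a scalar attribute included in the same map.
    tensors = {k: v for k, v in ckpt.items() if isinstance(v, torch.Tensor)}
    scalars = {k: v for k, v in ckpt.items() if not isinstance(v, torch.Tensor)}

    model.load_state_dict(tensors, strict=False)

    # Set scalar attributes (e.g. "layers.0.attention.rope_theta") by
    # walking dotted names; no isinstance check on submodules, so a fixed,
    # released model definition keeps working even if source classes differ.
    for dotted_name, value in scalars.items():
        *path, attr = dotted_name.split(".")
        mod = model
        for name in path:
            mod = getattr(mod, name)
        setattr(mod, attr, value)
```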

@iseeyuan iseeyuan moved this to In Progress in etLLM: LLMs via ExecuTorch Feb 10, 2025
@iseeyuan iseeyuan changed the title Explore the way to load a new LLM to llama_transformer [etLLM] Explore the way to load a new LLM to llama_transformer Feb 11, 2025