[etLLM] Explore the way to load a new LLM to llama_transformer #8231
Labels
module: llm
Issues related to LLM examples and apps, and to the extensions/llm/ code
triaged
This issue has been looked at by a team member, and has been triaged and prioritized into an appropriate module
It's related to the first part (eager-mode definition) of this RFC.
There's llama_transformer, which used to be Llama-specific. A lot of infrastructure is built around this model: source transforms, quantization, export, lowering to each backend, etc.
In open source, many models share the same architecture. Can we reuse llama_transformer and all the existing infra to quickly enable a new model with optimized performance on multiple edge backends (CPU, NPUs, CoreML, etc.)?
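One concrete shape this reuse could take is a checkpoint key-remapping step: a new model with Llama-like architecture often differs only in parameter naming, so its state dict could be renamed to match llama_transformer's conventions before loading. The sketch below is a hypothetical illustration; the specific key patterns (Hugging Face-style on the left, llama_transformer-style on the right) are assumptions, not the actual mapping used in this repo.

```python
import re

# Hypothetical rename rules from a Hugging Face-style checkpoint layout to a
# llama_transformer-style layout. The exact names on both sides are assumptions.
KEY_RULES = [
    (r"^model\.embed_tokens\.", "tok_embeddings."),
    (r"^model\.layers\.(\d+)\.self_attn\.q_proj\.", r"layers.\1.attention.wq."),
    (r"^model\.layers\.(\d+)\.self_attn\.k_proj\.", r"layers.\1.attention.wk."),
    (r"^model\.layers\.(\d+)\.self_attn\.v_proj\.", r"layers.\1.attention.wv."),
    (r"^model\.layers\.(\d+)\.self_attn\.o_proj\.", r"layers.\1.attention.wo."),
    (r"^model\.layers\.(\d+)\.mlp\.gate_proj\.", r"layers.\1.feed_forward.w1."),
    (r"^model\.layers\.(\d+)\.mlp\.down_proj\.", r"layers.\1.feed_forward.w2."),
    (r"^model\.layers\.(\d+)\.mlp\.up_proj\.", r"layers.\1.feed_forward.w3."),
    (r"^model\.norm\.", "norm."),
    (r"^lm_head\.", "output."),
]

def remap_state_dict(src: dict) -> dict:
    """Rename checkpoint keys so a Llama-architecture module can load them."""
    out = {}
    for key, value in src.items():
        new_key = key
        for pattern, repl in KEY_RULES:
            new_key = re.sub(pattern, repl, new_key)
        out[new_key] = value
    return out
```

With a step like this, the model definition stays unchanged and the downstream infra (quantization, export, backend lowering) sees a familiar llama_transformer module; only the loading path is model-specific.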
Items to explore:
cc @mergennachin @cccclai @helunwencser @dvorjackz