Add Phi-4-mini-instruct #8856

Merged: 8 commits into main on Mar 5, 2025

Conversation

@jackzhxng (Contributor) commented on Mar 1, 2025

Summary

Add Phi-4-mini (3.8B) with fractional rotary embeddings. It currently only works for short contexts; LongRoPE still needs to be implemented to support longer sequence lengths.
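For context, "fractional" (partial) rotary embeddings apply RoPE to only a fraction of each attention head's dimensions and pass the remaining channels through unrotated; Phi-4-mini expresses this via a partial rotary factor in its Hugging Face config. A minimal, generic sketch of the idea (not the ExecuTorch implementation; the tensor shapes and precomputed cos/sin tables below are assumptions for illustration):

import torch

def apply_partial_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor, rotary_dim: int) -> torch.Tensor:
    # x: (batch, heads, seq, head_dim); cos/sin: (seq, rotary_dim) precomputed rotary tables.
    # Only the first `rotary_dim` channels of each head are rotated; the rest pass through.
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    half = rotary_dim // 2
    x1, x2 = x_rot[..., :half], x_rot[..., half:]
    rotated = torch.cat((-x2, x1), dim=-1)  # standard "rotate half" RoPE formulation
    x_rot = x_rot * cos + rotated * sin
    return torch.cat((x_rot, x_pass), dim=-1)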

Sample prompt and response (xnnpack + 8da4w quant):

> A California roll is a type of sushi roll that is unique to the state of California. It is made with the same basic ingredients used for regular sushi rolls, but often includes some unique ingredients native to the state of California. Here is the basic ingredients used for a California roll:

> Prefill time: 0.547189474105835
> Token generation (tok/s): 6.3541260502510895
> Peak memory: 2.3 GB
> .pte size: 2.3 GB

Closes #8813

Test plan

Convert weights:

python examples/models/phi-4-mini/convert_weights.py ~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/879783f7b23e43c12d1c682e3458f115f3a7718d/ phi_4_mini.pth
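The actual conversion logic lives in examples/models/phi-4-mini/convert_weights.py. As a rough illustration only (the key renaming below is hypothetical and may not match the real script), such a converter loads the sharded Hugging Face safetensors checkpoint, remaps parameter names to the layout expected by export_llama, and saves a single .pth file:

import json
import os

import torch
from safetensors.torch import load_file

def load_hf_state_dict(snapshot_dir):
    # Load a (possibly sharded) safetensors checkpoint from a HF snapshot directory.
    index_path = os.path.join(snapshot_dir, "model.safetensors.index.json")
    if os.path.exists(index_path):
        with open(index_path) as f:
            shards = sorted(set(json.load(f)["weight_map"].values()))
        state_dict = {}
        for shard in shards:
            state_dict.update(load_file(os.path.join(snapshot_dir, shard)))
        return state_dict
    return load_file(os.path.join(snapshot_dir, "model.safetensors"))

def convert(snapshot_dir, output_path):
    hf_sd = load_hf_state_dict(snapshot_dir)
    # Hypothetical renaming: strip the HF "model." prefix; the real script maps
    # each weight to the llama-style transformer names explicitly.
    converted = {k.removeprefix("model."): v for k, v in hf_sd.items()}
    torch.save(converted, output_path)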

Export with the XNNPACK backend and quantization:

python -m examples.models.llama.export_llama --model phi-4-mini \
--params examples/models/phi-4-mini/config.json --checkpoint phi_4_mini.pth \
-kv --use_sdpa_with_kv_cache -X -d fp32 \
--metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}' \
--output_name phi4_mini_x_8da_4w.pte \
--verbose \
-qmode 8da4w --group_size 128 \
--embedding-quantize 4,32 \
--quantize_kv_cache
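For reference, -qmode 8da4w means 8-bit dynamically quantized activations with 4-bit grouped weights. A rough, standalone sketch of applying a comparable scheme directly with torchao (assuming a recent torchao release; this is not the export_llama code path, which wires quantization into the export flow):

import torch
from torchao.quantization import int8_dynamic_activation_int4_weight, quantize_

# Toy module standing in for the transformer's linear projections.
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.Linear(256, 256))

# 8-bit dynamic activations / 4-bit weights with per-group scales, roughly
# what -qmode 8da4w --group_size 128 requests during export.
quantize_(model, int8_dynamic_activation_int4_weight(group_size=128))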

Run via pybindings:

python -m examples.models.llama.runner.native --model phi-4-mini \
--pte phi4_mini_x_8da_4w.pte \
--tokenizer ~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/879783f7b23e43c12d1c682e3458f115f3a7718d/tokenizer.json \
--tokenizer_config ~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/879783f7b23e43c12d1c682e3458f115f3a7718d/tokenizer_config.json \
--prompt "What ingredients are in a California roll?" \
--params examples/models/phi-4-mini/config.json --max_len 64 \
--temperature 0 -kv
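The native runner above wraps the ExecuTorch pybindings together with the tokenizer and sampling. As a rough sketch of loading the exported program directly (the forward-input layout of token ids plus a KV-cache position is an assumption here, so treat the call as illustrative):

import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("phi4_mini_x_8da_4w.pte")

# Hypothetical single-token step: token ids plus the current cache position.
tokens = torch.tensor([[199999]], dtype=torch.long)  # e.g. the BOS id from --metadata
input_pos = torch.tensor([0], dtype=torch.long)
logits = module.forward([tokens, input_pos])[0]
next_token = int(torch.argmax(logits[:, -1, :]))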

@jackzhxng requested a review from lucylq as a code owner on March 1, 2025 00:07
pytorch-bot (bot) commented on Mar 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8856

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit a8231d8 with merge base 7aa6494:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Mar 1, 2025
@jackzhxng changed the title from "Add phi4 mini" to "Add Phi-4 mini instruct" on Mar 1, 2025
@jackzhxng added the release notes: examples label on Mar 1, 2025
@jackzhxng marked this pull request as draft on March 1, 2025 00:13
@jackzhxng marked this pull request as ready for review on March 1, 2025 01:08
@jackzhxng requested a review from iseeyuan on March 1, 2025 01:09
@jackzhxng (Contributor, Author) commented:

@guangy10 any way I can run some on demand benchmarks for this model?

@jackzhxng requested a review from mergennachin on March 1, 2025 01:16
@iseeyuan (Contributor) commented on Mar 1, 2025

This is awesome to enable a new model in a day!

@@ -90,7 +90,7 @@ def model_should_run_on_event(model: str, event: str) -> bool:
     We put higher priority and fast models to pull request and rest to push.
     """
     if event == "pull_request":
-        return model in ["mv3", "vit"]
+        return model in ["mv3", "vit", "phi4_mini"]  # TODO: remove
Contributor:

Any reason to remove it? Probably it's mostly covered by the llama tests?

Contributor Author:

Oh, it's just too large to run on every pull request; we only run the small models on pull requests.

@iseeyuan (Contributor) left a comment:

LGTM. Thanks

@iseeyuan (Contributor) commented on Mar 2, 2025

It's nice that the exported .pte file can be verified via the Python bindings. That made me wonder whether we could automate the process, similar to MLX: essentially, take a Hugging Face model card name as input and, given a prompt, get the output. I created #8872. Please let me know if it makes sense.

@jackzhxng changed the title from "Add Phi-4 mini instruct" to "Add Phi-4-mini-instruct" on Mar 3, 2025
@jackzhxng merged commit df17dca into main on Mar 5, 2025
87 of 89 checks passed
@jackzhxng deleted the jz/add-phi4 branch on March 5, 2025 03:47
zonglinpeng pushed a commit that referenced this pull request Mar 6, 2025
Labels
CLA Signed (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
release notes: examples (changes to any of our example LLM integrations, such as Llama3 and Llava)
Projects: none yet
Development: successfully merging this pull request may close the issue "Add Phi4 mini instruct"
3 participants