Add Phi-4-mini-instruct #8856

Merged: 8 commits into main on Mar 5, 2025

Conversation

@jackzhxng (Contributor) commented on Mar 1, 2025

Summary

Add Phi-4-mini (3.8B) with fractional rotary embeddings. It currently only works for short contexts; LongRoPE still needs to be implemented to support longer sequence lengths.
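For context, "fractional" (partial) rotary embeddings apply RoPE to only a fraction of each attention head's dimensions and pass the remaining channels through unrotated; Phi-4-mini expresses this via a partial rotary factor in its Hugging Face config. A minimal, generic sketch of the idea (not the ExecuTorch implementation; the tensor shapes and precomputed cos/sin tables below are assumptions for illustration):

import torch

def apply_partial_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor, rotary_dim: int) -> torch.Tensor:
    # x: (batch, heads, seq, head_dim); cos/sin: (seq, rotary_dim) precomputed rotary tables.
    # Only the first `rotary_dim` channels of each head are rotated; the rest pass through.
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    half = rotary_dim // 2
    x1, x2 = x_rot[..., :half], x_rot[..., half:]
    rotated = torch.cat((-x2, x1), dim=-1)  # standard "rotate half" RoPE formulation
    x_rot = x_rot * cos + rotated * sin
    return torch.cat((x_rot, x_pass), dim=-1)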

Sample prompt and response (xnnpack + 8da4w quant):

> A California roll is a type of sushi roll that is unique to the state of California. It is made with the same basic ingredients used for regular sushi rolls, but often includes some unique ingredients native to the state of California. Here is the basic ingredients used for a California roll:

> Prefill time: 0.547189474105835
> Token generation (tok/s): 6.3541260502510895
> Peak memory: 2.3 GB
> .pte size: 2.3 GB

Closes #8813

Test plan

Convert weights:

python examples/models/phi-4-mini/convert_weights.py ~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/879783f7b23e43c12d1c682e3458f115f3a7718d/ phi_4_mini.pth
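The actual conversion logic lives in examples/models/phi-4-mini/convert_weights.py. As a rough illustration only (the key renaming below is hypothetical and may not match the real script), such a converter loads the sharded Hugging Face safetensors checkpoint, remaps parameter names to the layout expected by export_llama, and saves a single .pth file:

import json
import os

import torch
from safetensors.torch import load_file

def load_hf_state_dict(snapshot_dir):
    # Load a (possibly sharded) safetensors checkpoint from a HF snapshot directory.
    index_path = os.path.join(snapshot_dir, "model.safetensors.index.json")
    if os.path.exists(index_path):
        with open(index_path) as f:
            shards = sorted(set(json.load(f)["weight_map"].values()))
        state_dict = {}
        for shard in shards:
            state_dict.update(load_file(os.path.join(snapshot_dir, shard)))
        return state_dict
    return load_file(os.path.join(snapshot_dir, "model.safetensors"))

def convert(snapshot_dir, output_path):
    hf_sd = load_hf_state_dict(snapshot_dir)
    # Hypothetical renaming: strip the HF "model." prefix; the real script maps
    # each weight to the llama-style transformer names explicitly.
    converted = {k.removeprefix("model."): v for k, v in hf_sd.items()}
    torch.save(converted, output_path)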

Export with the XNNPACK backend and quantization:

python -m examples.models.llama.export_llama --model phi-4-mini \
--params examples/models/phi-4-mini/config.json --checkpoint phi_4_mini.pth \
-kv --use_sdpa_with_kv_cache -X -d fp32 \
--metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}' \
--output_name phi4_mini_x_8da_4w.pte \
--verbose \
-qmode 8da4w --group_size 128 \
--embedding-quantize 4,32 \
--quantize_kv_cache
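For reference, -qmode 8da4w means 8-bit dynamically quantized activations with 4-bit grouped weights. A rough, standalone sketch of applying a comparable scheme directly with torchao (assuming a recent torchao release; this is not the export_llama code path, which wires quantization into the export flow):

import torch
from torchao.quantization import int8_dynamic_activation_int4_weight, quantize_

# Toy module standing in for the transformer's linear projections.
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.Linear(256, 256))

# 8-bit dynamic activations / 4-bit weights with per-group scales, roughly
# what -qmode 8da4w --group_size 128 requests during export.
quantize_(model, int8_dynamic_activation_int4_weight(group_size=128))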

Run via pybindings:

python -m examples.models.llama.runner.native --model phi-4-mini \
--pte phi4_mini_x_8da_4w.pte \
--tokenizer ~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/879783f7b23e43c12d1c682e3458f115f3a7718d/tokenizer.json \
--tokenizer_config ~/.cache/huggingface/hub/models--microsoft--Phi-4-multimodal-instruct/snapshots/879783f7b23e43c12d1c682e3458f115f3a7718d/tokenizer_config.json \
--prompt "What ingredients are in a California roll?" \
--params examples/models/phi-4-mini/config.json --max_len 64 \
--temperature 0 -kv
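The native runner above wraps the ExecuTorch pybindings together with the tokenizer and sampling. As a rough sketch of loading the exported program directly (the forward-input layout of token ids plus a KV-cache position is an assumption here, so treat the call as illustrative):

import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("phi4_mini_x_8da_4w.pte")

# Hypothetical single-token step: token ids plus the current cache position.
tokens = torch.tensor([[199999]], dtype=torch.long)  # e.g. the BOS id from --metadata
input_pos = torch.tensor([0], dtype=torch.long)
logits = module.forward([tokens, input_pos])[0]
next_token = int(torch.argmax(logits[:, -1, :]))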

@jackzhxng requested a review from lucylq as a code owner on March 1, 2025 00:07
pytorch-bot (bot) commented on Mar 1, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8856

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit a8231d8 with merge base 7aa6494:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Mar 1, 2025
@jackzhxng changed the title from "Add phi4 mini" to "Add Phi-4 mini instruct" on Mar 1, 2025
@jackzhxng added the release notes: examples label on Mar 1, 2025
@jackzhxng marked this pull request as draft on March 1, 2025 00:13
@jackzhxng marked this pull request as ready for review on March 1, 2025 01:08
@jackzhxng requested a review from iseeyuan on March 1, 2025 01:09
@jackzhxng (Contributor, Author) commented:

@guangy10 any way I can run some on demand benchmarks for this model?

@jackzhxng requested a review from mergennachin on March 1, 2025 01:16
@iseeyuan (Contributor) commented on Mar 1, 2025

This is awesome to enable a new model in a day!

@@ -90,7 +90,7 @@ def model_should_run_on_event(model: str, event: str) -> bool:
     We put higher priority and fast models to pull request and rest to push.
     """
     if event == "pull_request":
-        return model in ["mv3", "vit"]
+        return model in ["mv3", "vit", "phi4_mini"]  # TODO: remove
Contributor:

Any reason to remove it? Probably it's mostly covered by the llama tests?

Contributor Author:

Oh, it's just too large to run on every pull request; we only run the small models on pull requests.

@iseeyuan (Contributor) left a comment:

LGTM. Thanks

@iseeyuan (Contributor) commented on Mar 2, 2025

It's nice that the exported .pte file can be verified via the Python bindings. That made me wonder whether we could automate the process, similar to MLX: essentially, take a Hugging Face model card name as input and, given a prompt, get the output. I created #8872. Please let me know if it makes sense.

@jackzhxng changed the title from "Add Phi-4 mini instruct" to "Add Phi-4-mini-instruct" on Mar 3, 2025
@jackzhxng merged commit df17dca into main on Mar 5, 2025
87 of 89 checks passed
@jackzhxng deleted the jz/add-phi4 branch on March 5, 2025 03:47
zonglinpeng pushed a commit that referenced this pull request Mar 6, 2025
Labels
CLA Signed (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
release notes: examples (changes to any of our example LLM integrations, such as Llama3 and Llava)
Projects: none yet
Development: successfully merging this pull request may close the issue "Add Phi4 mini instruct"
3 participants