
Fix quantized embedding export logic #3095


Closed · wants to merge 4 commits

Conversation

larryliu0820
Contributor

@larryliu0820 larryliu0820 commented Apr 17, 2024

Add patches to make 4-bit quantized embedding work for export. Fixes:

  • Schema mismatch between the functional embedding_4bit and its out variant
  • packed=True not being set for 4-bit quantization
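To illustrate what `packed=True` implies for 4-bit weights, here is a minimal, hypothetical sketch of nibble packing: two unsigned 4-bit values are stored per byte, roughly halving the embedding table's footprint. This is illustrative only and not the actual ExecuTorch packing code.

```python
def pack_4bit(values):
    """Pack unsigned 4-bit values (0..15) two per byte, low nibble first.
    Hypothetical stand-in for what a packed 4-bit layout looks like."""
    assert len(values) % 2 == 0
    assert all(0 <= v <= 15 for v in values)
    return bytes((hi << 4) | lo for lo, hi in zip(values[::2], values[1::2]))

def unpack_4bit(packed):
    """Recover the original 4-bit values from the packed bytes."""
    out = []
    for b in packed:
        out.append(b & 0x0F)  # low nibble
        out.append(b >> 4)    # high nibble
    return out

row = [3, 12, 0, 15]
packed = pack_4bit(row)        # 2 bytes instead of 4
assert unpack_4bit(packed) == row
```

A kernel consuming this layout must agree with the exporter on the packing convention (here, low nibble first), which is exactly the kind of contract the schema fix below keeps consistent.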


pytorch-bot bot commented Apr 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3095

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a28e73b with merge base 06beace:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 17, 2024
@larryliu0820 larryliu0820 force-pushed the larryliu0820-patch branch 2 times, most recently from 7977cc2 to 1dc7a5c on April 18, 2024 16:39
kimishpatel and others added 3 commits April 18, 2024 20:42
Summary: This diff adds support for multi query attention for sdpa with kv cache

Reviewed By: iseeyuan

Differential Revision: D56212419
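The core idea behind multi-query attention (MQA), as referenced in this commit, is that all query heads attend over a single shared K/V head, shrinking the KV cache. A minimal stdlib-only sketch of that sharing (toy dimensions, hypothetical helper names, not the actual SDPA kernel):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def mqa(queries, keys, values):
    """Multi-query attention: every query head attends over the SAME
    shared K/V head. queries: [n_heads][seq][d]; keys/values: [seq][d].
    Returns [n_heads][seq][d]."""
    d = len(keys[0])
    out = []
    for q_head in queries:              # each head reuses the shared K/V
        head_out = []
        for q in q_head:
            scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                      for k in keys]
            w = softmax(scores)
            head_out.append([sum(wi * v[j] for wi, v in zip(w, values))
                             for j in range(d)])
        out.append(head_out)
    return out

# Two query heads share one K/V head with two cached positions, d=2.
q = [[[1.0, 0.0]], [[0.0, 1.0]]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = mqa(q, k, v)
```

With a KV cache, only one K/V head per layer needs to be stored regardless of the number of query heads, which is the memory win this commit targets.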
Summary:

4b embedding quantizer

Reviewed By: larryliu0820

Differential Revision: D56229021
@larryliu0820 larryliu0820 force-pushed the larryliu0820-patch branch 2 times, most recently from d504a61 to 4b7050d on April 19, 2024 05:25
@larryliu0820 larryliu0820 changed the title Larryliu0820 patch Fix quantized embedding export logic Apr 19, 2024
@facebook-github-bot
Contributor

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@mikekgfb mikekgfb left a comment


Thank you!

@facebook-github-bot
Contributor

@larryliu0820 merged this pull request in 2c467dd.

facebook-github-bot pushed a commit that referenced this pull request Apr 19, 2024
Summary: In #3095 there's an issue with the embedding_4bit schema which causes a mismatch between the functional and out variants. P1217884556

Differential Revision: D56357762
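The functional/out-variant convention being fixed here is that an out variant must take the functional op's arguments unchanged, plus a trailing mutable `out` tensor that it also returns. A small illustrative checker over simplified schema strings (the schemas below are hypothetical and not the actual ExecuTorch signatures):

```python
def parse_args(schema):
    """Extract the argument list from a simplified schema string,
    dropping the '*' keyword-only marker."""
    body = schema.split("(", 1)[1].rsplit(") -> ", 1)[0]
    return [a for a in body.split(", ") if a != "*"]

def out_variant_matches(functional, out):
    """The out variant should repeat the functional args verbatim,
    followed by a single mutable 'Tensor(a!) out' argument."""
    f_args, o_args = parse_args(functional), parse_args(out)
    return o_args[:-1] == f_args and o_args[-1] == "Tensor(a!) out"

# Hypothetical, simplified schemas for illustration only.
functional = "embedding_4bit(Tensor weight, Tensor scales, Tensor indices) -> Tensor"
good_out = ("embedding_4bit.out(Tensor weight, Tensor scales, "
            "Tensor indices, *, Tensor(a!) out) -> Tensor(a!)")
bad_out = ("embedding_4bit.out(Tensor weight, Tensor indices, "
           "*, Tensor(a!) out) -> Tensor(a!)")  # 'scales' missing

assert out_variant_matches(functional, good_out)
assert not out_variant_matches(functional, bad_out)
```

When the two schemas drift apart, as in `bad_out`, export succeeds for the functional op but the runtime cannot bind the out-variant kernel, which is the failure mode this follow-up fixes.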
larryliu0820 added a commit that referenced this pull request Apr 19, 2024
Summary:

In #3095 there's an issue with the embedding_4bit schema which causes a mismatch between the functional and out variants. P1217884556

Differential Revision: D56357762
facebook-github-bot pushed a commit that referenced this pull request Apr 19, 2024
Summary:
Pull Request resolved: #3151

In #3095 there's an issue with the embedding_4bit schema which causes a mismatch between the functional and out variants. P1217884556

Reviewed By: mergennachin, digantdesai

Differential Revision: D56357762

fbshipit-source-id: e8a1c249a02bfb4db295a1a933a8b3054e11099a
@mergennachin mergennachin mentioned this pull request Apr 26, 2024
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
4 participants