Adding int4 quantized tensor subclass #15

HDCharles · 2023-11-28T05:31:18Z

Stack from ghstack (oldest at bottom):

-> Adding int4 quantized tensor subclass #15

Summary: Adding int4 quantized tensor subclass support and refactoring the tensor
subclass code so it is easier to use with multiple subclasses. This
subclass uses the tinygemm int4 mixed-dtype GEMM that was added to
PyTorch as `_weight_int4pack_mm` and `_convert_weight_to_int4pack`. Also
added `.to` support for tensor subclasses so that saving/loading of
meta tensors works for int4.

Test Plan: python test/test.py

Reviewers:

Subscribers:

Tasks:

Tags:
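The summary relies on the tinygemm kernels without spelling out the numerics. As a hedged illustration, the pure-Python sketch below shows group-wise asymmetric int4 weight quantization, a naive two-values-per-byte packing, and the dequantize-then-matmul reference that a weight-only int4 linear approximates. Assumptions: the min/max scale and "midpoint" float zero-point convention is assumed to match gpt-fast-style int4 weight-only quantization; every helper name (`quantize_groupwise`, `pack_int4`, `linear_reference`, etc.) is hypothetical; and the packing does not reproduce the tiled layout that `_convert_weight_to_int4pack` produces.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_groupwise(w: Matrix, group_size: int, n_bit: int = 4) -> Tuple[List[List[int]], Matrix, Matrix]:
    """Asymmetric min/max quantization of each row of `w`, per group of `group_size` columns.

    Hypothetical helper. Returns (q, scales, zeros): q holds ints in
    [0, 2**n_bit - 1]; scales and zeros hold one float per (row, group).
    """
    max_int = 2**n_bit - 1
    mid = 2 ** (n_bit - 1)
    q, scales, zeros = [], [], []
    for row in w:
        if len(row) % group_size != 0:
            raise ValueError("row length must be a multiple of group_size")
        q_row, s_row, z_row = [], [], []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            lo, hi = min(group), max(group)
            scale = max(hi - lo, 1e-6) / max_int
            # Assumed convention: the zero point is stored as the float value
            # that the integer midpoint (8 for int4) dequantizes to.
            zero = lo + scale * mid
            q_row.extend(min(max(round((x - lo) / scale), 0), max_int) for x in group)
            s_row.append(scale)
            z_row.append(zero)
        q.append(q_row)
        scales.append(s_row)
        zeros.append(z_row)
    return q, scales, zeros


def dequantize_groupwise(q, scales, zeros, group_size: int, n_bit: int = 4) -> Matrix:
    """Inverse mapping: w ~= (q - midpoint) * scale + zero, per group."""
    mid = 2 ** (n_bit - 1)
    return [
        [(v - mid) * s_row[j // group_size] + z_row[j // group_size] for j, v in enumerate(q_row)]
        for q_row, s_row, z_row in zip(q, scales, zeros)
    ]


def pack_int4(values: List[int]) -> bytes:
    """Naive packing of two 4-bit values per byte, low nibble first.

    Illustrative only: this is NOT the tiled layout produced by
    _convert_weight_to_int4pack.
    """
    if len(values) % 2 != 0:
        raise ValueError("need an even number of int4 values")
    return bytes((values[i] & 0xF) | ((values[i + 1] & 0xF) << 4) for i in range(0, len(values), 2))


def unpack_int4(packed: bytes) -> List[int]:
    """Inverse of pack_int4."""
    out: List[int] = []
    for b in packed:
        out.extend((b & 0xF, b >> 4))
    return out


def linear_reference(x: Matrix, w: Matrix) -> Matrix:
    """y = x @ w.T with w of shape (out_features, in_features)."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


# Usage: one output feature, eight inputs, groups of four columns.
w = [[0.1, -0.4, 0.25, 0.9, -1.0, 0.3, 0.05, 0.7]]
q, scales, zeros = quantize_groupwise(w, group_size=4)
packed = pack_int4(q[0])  # 8 int4 values -> 4 bytes
w_dq = dequantize_groupwise(q, scales, zeros, group_size=4)
y_ref = linear_reference([[1.0] * 8], w)
y_dq = linear_reference([[1.0] * 8], w_dq)  # what the int4 path approximates
```

Because rounding is to the nearest level, each dequantized weight lands within half a group's `scale` of the original, so the quantization error in the output stays bounded by the per-group step sizes. Keeping the zero point as a float lets scale and zero travel together as one per-group pair; the real kernel takes them as a combined scales-and-zeros tensor, but its exact shape and dtype requirements should be checked against the PyTorch version in use.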



This was referenced Nov 28, 2023:

- Adding subclass and api for weight-only quant #11 (Merged)
- Adding tests for save/load support #12 (Merged)

facebook-github-bot added the CLA Signed label Nov 28, 2023

HDCharles mentioned this pull request Nov 28, 2023:

- Adding tests for save/load support #16 (Merged)

HDCharles changed the title ~~Adding int4 tensor subclass~~ Adding int4 quantized tensor subclass Nov 28, 2023

HDCharles merged commit 701b120 into gh/HDCharles/4/base Nov 28, 2023

HDCharles deleted the gh/HDCharles/4/head branch November 28, 2023 23:41
