Add support for saving a quantized checkpoint in the llama code
Summary:
The goal is to upload a torchao-quantized model to the Hugging Face Hub so that the model can then be run from Hugging Face.
Test Plan:
python generate.py -q int4wo-32 --save
Reviewers:
Subscribers:
Tasks:
Tags:
parser.add_argument('--compile', action='store_true', help='Whether to compile the model.')
+ parser.add_argument('--save', action='store_true', help='Whether to save the model.')
parser.add_argument('--batch_size', type=int, default=1, help='Batch size to use for evaluation; note that int8wo and int4wo work best with small batch sizes, while int8dq works better with large batch sizes')
parser.add_argument('--max_length', type=int, default=None, help='Length of text to process at one time')
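As a rough illustration of how the new `--save` flag could gate checkpoint export, here is a minimal, self-contained sketch. The flag names mirror the diff above, but `model_state`, the output path, and the use of `pickle` in place of `torch.save` are all hypothetical stand-ins, not the actual `generate.py` implementation.

```python
import argparse
import os
import pickle
import tempfile

parser = argparse.ArgumentParser()
# Flags mirroring the diff; '-q' is assumed from the Test Plan command.
parser.add_argument('-q', '--quantization', type=str, default=None,
                    help='Quantization scheme, e.g. int4wo-32')
parser.add_argument('--save', action='store_true', help='Whether to save the model.')
args = parser.parse_args(['-q', 'int4wo-32', '--save'])

# Placeholder for a torchao-quantized state_dict.
model_state = {'layers.0.weight': b'\x00' * 8}

if args.save:
    # In the real script this would be torch.save on the quantized model's
    # state_dict; pickle is used here only to keep the sketch dependency-free.
    path = os.path.join(tempfile.gettempdir(), f'model-{args.quantization}.pt')
    with open(path, 'wb') as f:
        pickle.dump(model_state, f)
    print(f'saved quantized checkpoint to {path}')
```

The saved file could then be uploaded to the Hugging Face Hub as a model artifact.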