testing autoquant #114

HDCharles · 2024-03-01T06:44:34Z

Stack from ghstack (oldest at bottom):

-> testing autoquant #114

Summary:

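Improves throughput from 19.70 img/sec (dynamic_quant) to 19.76 img/sec (auto_quant) on SAM vit_h at batch size 16; the unquantized bfloat16 run measured 18.86 img/sec.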

Test Plan: sh run.sh

Reviewers:

Subscribers:

Tasks:

Tags:
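To put the headline numbers in context, the three result rows printed by run.sh (quoted in the commit messages further down) can be compared directly. The following is a minimal sketch that uses only the `compress` and `img_s(avg)` values copied from that log; the dictionary and the helper name `relative_gain` are illustrative and not part of the benchmark script.

```python
# img_s(avg) values copied from the run.sh CSV rows (SAM vit_h, batch size 16).
# Keys are the values of the `compress` column in those rows.
throughput = {
    "None": 18.861125832244333,           # bfloat16, no quantization
    "dynamic_quant": 19.70834741975898,
    "auto_quant": 19.76190752324237,
}


def relative_gain(new: float, old: float) -> float:
    """Percentage change in throughput going from `old` to `new`."""
    return (new / old - 1.0) * 100.0


gain_vs_dynamic = relative_gain(throughput["auto_quant"], throughput["dynamic_quant"])
gain_vs_baseline = relative_gain(throughput["auto_quant"], throughput["None"])

print(f"auto_quant vs dynamic_quant:   {gain_vs_dynamic:+.2f}%")
print(f"auto_quant vs no quantization: {gain_vs_baseline:+.2f}%")
```

On these numbers auto_quant is roughly 0.3% faster than dynamic_quant and roughly 4.8% faster than the unquantized run. Each configuration was measured once, so the smaller of the two gaps may be within run-to-run noise.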

Summary: improves runtime by 19.70 -> 19.76 img/sec
❯ one sh run.sh
  0%|          | 0/64 [00:00<?, ?it/s]/home/cdhernandez/local/pytorch/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /home/cdhernandez/local/pytorch/aten/src/ATen/NestedTensorImpl.cpp:177.)
  return _nested.nested_tensor(
100%|██████████| 64/64 [06:32<00:00, 6.14s/it]
sam_model_type,batch_size,memory(MiB),memory(%),img_s(avg),batch_ms(avg)/batch_size,mIoU,use_compile,use_half,compress,epilogue_fusion_first,use_compile_decoder,use_nested_tensor,use_rel_pos,pad_input_image_batch,num_workers,num_batches,num_images,profile_path,memory_path
vit_h,16,14532,17,18.861125832244333,53.01910442113876,0.5865236891447146,max-autotune,torch.bfloat16,None,False,False,True,True,True,32,64,1024,None,None
  0%|          | 0/64 [00:00<?, ?it/s]/home/cdhernandez/local/pytorch/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /home/cdhernandez/local/pytorch/aten/src/ATen/NestedTensorImpl.cpp:177.)
  return _nested.nested_tensor(
100%|██████████| 64/64 [07:08<00:00, 6.70s/it]
vit_h,16,14395,17,19.70834741975898,50.73992145061493,0.5875230894143607,max-autotune,torch.bfloat16,dynamic_quant,False,False,True,True,True,32,64,1024,None,None
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 3.850527899339795
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.3931088875979185
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.3931088875979185 3.190660197287798
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 4.768232116475701
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 3.8598313461989164
shape=(torch.Size([78400, 1280]), torch.Size([3840, 1280]), torch.Size([3840])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 1.4865157660096884
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 1.8800818361341953
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 1.8800818361341953 1.179535873234272
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 1.7427184619009497
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 1.4965661568567157
shape=(torch.Size([78400, 1280]), torch.Size([1280, 1280]), torch.Size([1280])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQFloatLinearWeight'>
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 4.215262923389673
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.661373794078827
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.661373794078827 3.485689079388976
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 5.220260447822511
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 4.2220821138471365
shape=(torch.Size([65536, 1280]), torch.Size([5120, 1280]), torch.Size([5120])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 4.666170105338097
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.113288130611181
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.113288130611181 2.626298717223108
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 4.855024302378297
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 4.674202110618353
shape=(torch.Size([65536, 5120]), torch.Size([1280, 5120]), torch.Size([1280])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 3.2269158866256475
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 3.7462301552295685
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 3.7462301552295685 2.6572815608233213
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 3.9978391956537966
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 3.2370124012231827
shape=(torch.Size([65536, 1280]), torch.Size([3840, 1280]), torch.Size([3840])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 1.2530277017503977
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 1.5717314090579748
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 1.5717314090579748 0.9894231799989939
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 1.5166664496064186
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 1.2606457574293017
shape=(torch.Size([65536, 1280]), torch.Size([1280, 1280]), torch.Size([1280])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQFloatLinearWeight'>
  0%|          | 0/64 [00:00<?, ?it/s]/home/cdhernandez/local/pytorch/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /home/cdhernandez/local/pytorch/aten/src/ATen/NestedTensorImpl.cpp:177.)
  return _nested.nested_tensor(
100%|██████████| 64/64 [02:15<00:00, 2.12s/it]
vit_h,16,14463,17,19.76190752324237,50.602402567863464,0.5875653903095147,max-autotune,torch.bfloat16,auto_quant,False,False,True,True,True,32,64,1024,None,None
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
[ghstack-poisoned]
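The log above does not say how `best_cls` follows from the printed timings, and for some shapes the winner's first printed number is not the smallest one. The sketch below checks one hypothesis against the six logged decisions: that the int8 dynamic candidate is scored by an equal-weight average of the two numbers on its second line, while every other candidate is scored by its single number. The 0.5 weight, the short labels, and the function name `pick` are assumptions made for this check. It is an inference from six data points, not a description of torchao's implementation.

```python
# Timings copied from the autoquant log above, one row per logged shape.
# Columns: float, (int8 dynamic first number, second number), weight-only,
# weight-only "3" variant, and the best_cls the log reports (short label).
LOG = [
    (3.850527899339795, (4.3931088875979185, 3.190660197287798),
     4.768232116475701, 3.8598313461989164, "int8_dyn"),
    (1.4865157660096884, (1.8800818361341953, 1.179535873234272),
     1.7427184619009497, 1.4965661568567157, "float"),
    (4.215262923389673, (4.661373794078827, 3.485689079388976),
     5.220260447822511, 4.2220821138471365, "int8_dyn"),
    (4.666170105338097, (4.113288130611181, 2.626298717223108),
     4.855024302378297, 4.674202110618353, "int8_dyn"),
    (3.2269158866256475, (3.7462301552295685, 2.6572815608233213),
     3.9978391956537966, 3.2370124012231827, "int8_dyn"),
    (1.2530277017503977, (1.5717314090579748, 0.9894231799989939),
     1.5166664496064186, 1.2606457574293017, "float"),
]

ASSUMED_WEIGHT = 0.5  # assumption: equal weighting of the two int8 dynamic numbers


def pick(float_t, int8_pair, wo_t, wo3_t, weight=ASSUMED_WEIGHT):
    """Return the label with the lowest score under the assumed scoring rule."""
    first, second = int8_pair
    scores = {
        "float": float_t,
        "int8_dyn": weight * first + (1.0 - weight) * second,
        "weight_only": wo_t,
        "weight_only3": wo3_t,
    }
    return min(scores, key=scores.get)


predicted = [pick(*row[:4]) for row in LOG]
logged = [row[4] for row in LOG]
print("hypothesis matches log:", predicted == logged)
```

Under this assumption all six predictions agree with the logged `best_cls`. Scoring the int8 dynamic candidate by its first number alone (`weight=1.0`) would pick float for the first shape, and scoring it by its second number alone (`weight=0.0`) would pick int8 dynamic for the second shape, so neither number on its own explains the log. Other weights close to 0.5 fit these six rows as well.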

Summary: improves runtime by 19.70 -> 19.76 img/sec
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: ac0ddc1
Pull Request resolved: #114

Summary: improves runtime by 19.70 -> 19.76 img/sec
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: a9134d6
Pull Request resolved: #114

Summary: Adds autoquantization functionality. Using the do_quant API, we can test kernel speeds and pick the best quantization type (or no quantization) for each layer.
Test Plan: python test/test.py -k "autoquant"
also tested on SAM and SDXL (pytorch-labs/segment-anything-fast#114, huggingface/diffusion-fast@176e85f)
Reviewers:
Subscribers:
Tasks:
Tags:
[ghstack-poisoned]

Summary: Adds autoquantization functionality. Using the do_quant API, we can test kernel speeds and pick the best quantization type (or no quantization) for each layer.
Test Plan: python test/test.py -k "autoquant"
also tested on SAM and SDXL (pytorch-labs/segment-anything-fast#114, huggingface/diffusion-fast@176e85f)
Reviewers:
Subscribers:
Tasks:
Tags:
ghstack-source-id: 3986099
Pull Request resolved: #38

Summary: improves runtime by 19.70 -> 19.76 img/sec ❯ one sh run.sh 0%| | 0/64 [00:00<?, ?it/s]/home/cdhernandez/local/pytorch/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /home/cdhernandez/local/pytorch/aten/src/ATen/NestedTensorImpl.cpp:177.) return _nested.nested_tensor( 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [06:32<00:00, 6.14s/it] sam_model_type,batch_size,memory(MiB),memory(%),img_s(avg),batch_ms(avg)/batch_size,mIoU,use_compile,use_half,compress,epilogue_fusion_first,use_compile_decoder,use_nested_tensor,use_rel_pos,pad_input_image_batch,num_workers,num_batches,num_images,profile_path,memory_path vit_h,16,14532,17,18.861125832244333,53.01910442113876,0.5865236891447146,max-autotune,torch.bfloat16,None,False,False,True,True,True,32,64,1024,None,None 0%| | 0/64 [00:00<?, ?it/s]/home/cdhernandez/local/pytorch/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /home/cdhernandez/local/pytorch/aten/src/ATen/NestedTensorImpl.cpp:177.) 
return _nested.nested_tensor( 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [07:08<00:00, 6.70s/it] vit_h,16,14395,17,19.70834741975898,50.73992145061493,0.5875230894143607,max-autotune,torch.bfloat16,dynamic_quant,False,False,True,True,True,32,64,1024,None,None <class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 3.850527899339795 <class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.3931088875979185 <class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.3931088875979185 3.190660197287798 <class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 4.768232116475701 <class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 3.8598313461989164 shape=(torch.Size([78400, 1280]), torch.Size([3840, 1280]), torch.Size([3840])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> <class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 1.4865157660096884 <class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 1.8800818361341953 <class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 1.8800818361341953 1.179535873234272 <class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 1.7427184619009497 <class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 1.4965661568567157 shape=(torch.Size([78400, 1280]), torch.Size([1280, 1280]), torch.Size([1280])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> <class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 4.215262923389673 <class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.661373794078827 <class 
'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.661373794078827 3.485689079388976
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 5.220260447822511
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 4.2220821138471365
shape=(torch.Size([65536, 1280]), torch.Size([5120, 1280]), torch.Size([5120])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 4.666170105338097
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.113288130611181
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 4.113288130611181 2.626298717223108
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 4.855024302378297
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 4.674202110618353
shape=(torch.Size([65536, 5120]), torch.Size([1280, 5120]), torch.Size([1280])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 3.2269158866256475
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 3.7462301552295685
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 3.7462301552295685 2.6572815608233213
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 3.9978391956537966
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 3.2370124012231827
shape=(torch.Size([65536, 1280]), torch.Size([3840, 1280]), torch.Size([3840])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'>
<class 'torchao.quantization.autoquant.AQFloatLinearWeight'> 1.2530277017503977
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 1.5717314090579748
<class 'torchao.quantization.autoquant.AQInt8DynamicallyQuantizedLinearWeight'> 1.5717314090579748 0.9894231799989939
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight'> 1.5166664496064186
<class 'torchao.quantization.autoquant.AQWeightOnlyQuantizedLinearWeight3'> 1.2606457574293017
shape=(torch.Size([65536, 1280]), torch.Size([1280, 1280]), torch.Size([1280])), dtype=torch.bfloat16, best_cls=<class 'torchao.quantization.autoquant.AQFloatLinearWeight'>
0%| | 0/64 [00:00<?, ?it/s]/home/cdhernandez/local/pytorch/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /home/cdhernandez/local/pytorch/aten/src/ATen/NestedTensorImpl.cpp:177.)
return _nested.nested_tensor(
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [02:15<00:00, 2.12s/it]
vit_h,16,14463,17,19.76190752324237,50.602402567863464,0.5875653903095147,max-autotune,torch.bfloat16,auto_quant,False,False,True,True,True,32,64,1024,None,None

Test Plan: Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
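To put the three result rows in perspective, the img_s(avg) values can be compared directly. The snippet below is only a small arithmetic sketch: the throughput numbers are copied from the log, while the helper name `relative_gain` and the dictionary layout are assumptions made for illustration.

```python
def relative_gain(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


# img_s(avg) for vit_h, batch size 16, keyed by the `compress` column of the log.
img_s = {
    "None": 18.861125832244333,
    "dynamic_quant": 19.70834741975898,
    "auto_quant": 19.76190752324237,
}

for baseline in ("None", "dynamic_quant"):
    gain = relative_gain(img_s["auto_quant"], img_s[baseline])
    print(f"auto_quant vs {baseline}: {gain:+.2f}% img/s")
```

The gain over dynamic_quant is a fraction of a percent on a single run, so it may be within run-to-run noise; the gain over the unquantized row is several percent.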

Summary: Adds autoquantization functionality. Using the do_quant API, we can test kernel speeds and pick the best quantization type (or no quantization) for each layer.

Test Plan: python test/test.py -k "autoquant"

Also tested on SAM and SDXL: pytorch-labs/segment-anything-fast#114, HDCharles/sdxl-fast@8d9942a

Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]
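The per-layer idea in the summary (time each candidate, keep the fastest) can be sketched in plain Python. This is not the torchao implementation: `benchmark_candidates`, `select_best`, and the stand-in workloads are hypothetical names chosen for illustration, and the candidates are ordinary callables rather than real kernels. The log also prints a second number for the dynamically quantized candidate, which suggests the real selection uses more than one raw timing; this sketch does not attempt to reproduce that.

```python
import timeit
from typing import Callable, Dict


def select_best(timings: Dict[str, float]) -> str:
    """Return the candidate name with the smallest measured time."""
    return min(timings, key=timings.get)


def benchmark_candidates(
    candidates: Dict[str, Callable[[], object]],
    repeats: int = 3,
    number: int = 10,
) -> Dict[str, float]:
    """Measure seconds per call for each candidate (best of `repeats`)."""
    timings = {}
    for name, fn in candidates.items():
        fn()  # warm-up call so one-time setup cost is not measured
        # timeit.repeat returns the total seconds for `number` calls, once per repeat
        best_total = min(timeit.repeat(fn, repeat=repeats, number=number))
        timings[name] = best_total / number
    return timings


if __name__ == "__main__":
    data = list(range(10_000))
    # Hypothetical stand-ins for "no quantization" and one quantized variant.
    candidates = {
        "float": lambda: sum(x * 1.0 for x in data),
        "int8_dynamic": lambda: sum(x & 0xFF for x in data),
    }
    timings = benchmark_candidates(candidates)
    print(timings, "best:", select_best(timings))
```

Taking the minimum over several repeats reduces the influence of transient system noise, which matters when candidates differ by only a few percent.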

facebook-github-bot added the CLA Signed label Mar 1, 2024

HDCharles mentioned this pull request Mar 5, 2024

Autoquant pytorch/ao#38

Merged

HDCharles added a commit that referenced this pull request Mar 19, 2024

testing autoquant

8f28b42

Summary: improves runtime by 19.70 -> 19.76 img/sec Test Plan: sh run.sh Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 9c6098e Pull Request resolved: #114

Update on "testing autoquant"

bbd94ac

Summary: improves runtime by 19.70 -> 19.76 img/sec Test Plan: sh run.sh Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]

HDCharles added a commit that referenced this pull request Mar 19, 2024

testing autoquant

1ae9a9a

Summary: improves runtime by 19.70 -> 19.76 img/sec Test Plan: sh run.sh Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 9c6098e Pull Request resolved: #114

HDCharles mentioned this pull request Mar 25, 2024

Autoquant pytorch/ao#81

Merged

HDCharles mentioned this pull request Mar 25, 2024

Autoquant pytorch/ao#82

Merged
