| Name | Supported values | Description |
| --- | --- | --- |
| backend | all, torch, torch_tensorrt, tensorrt | Supported backends for inference |
| input | - | Input binding names. Expected to list the shape of each input binding |
| model | - | Configures the model filename and name |
| filename | - | Model file name to load from disk |
| name | - | Model name |
| runtime | - | Runtime configurations |
| device | 0 | Target device ID to run inference on. Range depends on available GPUs |
| precision | fp32, fp16 or half, int8 | Target precision to run inference in. int8 cannot be used with the 'all' backend |
| calibration_cache | - | Calibration cache file, expected for the torch_tensorrt runtime in int8 precision |
Additional sample use case:
```
runtime:
  device: 0
  precision:
    - fp32
    - fp16
```
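
For reference, here is what a complete configuration pulling together the fields from the table above might look like. This is a minimal sketch: the binding name `input0`, the `model.plan` filename, the `vgg16` model name, and the input shape are illustrative assumptions, not values taken from this document.

```
backend: torch_tensorrt
input:
  input0:                # assumed binding name; lists the shape of this input
    - 3
    - 224
    - 224
model:
  filename: model.plan   # a `.plan` file is treated as a TensorRT engine (see the notes below)
  name: vgg16            # assumed model name
runtime:
  device: 0
  precision:
    - fp32
    - fp16
```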
Note:
1. Measuring INT8 performance is only supported via a `calibration_cache` file or QAT mode for the `torch_tensorrt` backend (see the sketch after these notes).
2. A TensorRT engine filename should end with `.plan`; otherwise it will be treated as a TorchScript module.
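
To illustrate note 1, an INT8 run with the `torch_tensorrt` backend might be configured as in the sketch below, assuming `calibration_cache` sits under `runtime` alongside `precision` as the table above suggests; the cache path is a hypothetical example.

```
backend: torch_tensorrt
runtime:
  device: 0
  precision:
    - int8
  calibration_cache: ./vgg16.cache   # hypothetical path to a pre-generated calibration cache
```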
### Using CompileSpec options via CLI
Here is the list of `CompileSpec` options that can be provided directly via the CLI to compile the PyTorch module:
* `--backends` : Comma-separated string of backends. Eg: `torch`, `torch_tensorrt`, `tensorrt` or `fx2trt`
* `--model` : Name of the model file (can be a TorchScript module or a TensorRT engine ending in the `.plan` extension). If the backend is `fx2trt`, the input should be a PyTorch module (instead of a TorchScript module) and the options for the model are (`vgg16` | `resnet50` | `efficientnet_b0`)
* `--inputs` : List of input shapes & dtypes. Eg: `(1, 3, 224, 224)@fp32` for ResNet or `(1, 128)@int32;(1, 128)@int32` for BERT
* `--batch_size` : Batch size
* `--precision` : Comma-separated list of precisions to build TensorRT engines with. Eg: `fp32,fp16`
* `--device` : Device ID
* `--truncate` : Truncate long and double weights in the network in Torch-TensorRT
* `--is_trt_engine` : Boolean flag to be enabled if the model file provided is a TensorRT engine
* `--report` : Path of the output file where the performance summary is written
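
Putting several of these flags together, an invocation might look like the sketch below. The `perf_run.py` entry-point name and the file names are assumptions for illustration, not taken from this section.

```
# Hypothetical invocation; perf_run.py and the file names are assumed for illustration.
python perf_run.py --backends torch,torch_tensorrt \
                   --model vgg16_scripted.jit.pt \
                   --inputs "(1, 3, 224, 224)@fp32" \
                   --batch_size 1 \
                   --precision fp32,fp16 \
                   --device 0 \
                   --report vgg16_perf.txt
```
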
This tool benchmarks any PyTorch model or TorchScript module. As examples, we provide VGG16, ResNet50, EfficientNet-B0, ViT, and HF-BERT models in `hub.py` that we internally test for performance.
The TorchScript modules for these models can be generated by running:
```
python hub.py
```
You can refer to `benchmark.sh` for how we run/benchmark these models.
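
In the same spirit, a simple loop over the `hub.py` models could look like the following sketch; the `perf_run.py` entry point and the generated file names are assumptions, not taken from this document.

```
#!/bin/bash
# Hypothetical sketch: generate TorchScript modules via hub.py, then benchmark each one.
python hub.py
for model in vgg16 resnet50 efficientnet_b0; do
  python perf_run.py --backends torch,torch_tensorrt \
                     --model ${model}_scripted.jit.pt \
                     --inputs "(1, 3, 224, 224)@fp32" \
                     --batch_size 1 \
                     --precision fp32,fp16 \
                     --report ${model}_bs1.txt
done
```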