| Name | Supported values | Description |
| --- | --- | --- |
| backend | all, torch, torch_tensorrt, tensorrt | Supported backends for inference |
| input | - | Input binding names. Expected to list the shape of each input binding |
| model | - | Configures the model filename and name |
| filename | - | Model file name to load from disk |
| name | - | Model name |
| runtime | - | Runtime configurations |
| device | 0 | Target device ID to run inference on. Range depends on available GPUs |
| precision | fp32, fp16 or half, int8 | Target precision to run inference in. int8 cannot be used with the 'all' backend |
| calibration_cache | - | Calibration cache file, expected for the torch_tensorrt runtime in int8 precision |
Additional sample use case:
```
runtime:
  device: 0
  precision:
    - fp32
    - fp16
```
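
For reference, here is what a complete configuration pulling together the fields from the table above might look like. This is a minimal sketch: the binding name `input0`, the `model.plan` filename, the `vgg16` model name, and the input shape are illustrative assumptions, not values taken from this document.

```
backend: torch_tensorrt
input:
  input0:                # assumed binding name; lists the shape of this input
    - 3
    - 224
    - 224
model:
  filename: model.plan   # a `.plan` file is treated as a TensorRT engine (see the notes below)
  name: vgg16            # assumed model name
runtime:
  device: 0
  precision:
    - fp32
    - fp16
```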
Note:
1. Measuring INT8 performance is only supported via a `calibration_cache` file or QAT mode for the `torch_tensorrt` backend (see the sketch after these notes).
2. A TensorRT engine filename should end with `.plan`; otherwise it will be treated as a TorchScript module.
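
To illustrate note 1, an INT8 run with the `torch_tensorrt` backend might be configured as in the sketch below, assuming `calibration_cache` sits under `runtime` alongside `precision` as the table above suggests; the cache path is a hypothetical example.

```
backend: torch_tensorrt
runtime:
  device: 0
  precision:
    - int8
  calibration_cache: ./vgg16.cache   # hypothetical path to a pre-generated calibration cache
```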
### Using CompileSpec options via CLI
Here is the list of `CompileSpec` options that can be provided directly via the CLI to compile the PyTorch module:
* `--backends` : Comma-separated string of backends. Eg: `torch`, `torch_tensorrt`, `tensorrt` or `fx2trt`
* `--model` : Name of the model file (can be a TorchScript module or a TensorRT engine ending in the `.plan` extension). If the backend is `fx2trt`, the input should be a PyTorch module (instead of a TorchScript module) and the options for the model are (`vgg16` | `resnet50` | `efficientnet_b0`)
* `--inputs` : List of input shapes & dtypes. Eg: `(1, 3, 224, 224)@fp32` for ResNet or `(1, 128)@int32;(1, 128)@int32` for BERT
* `--batch_size` : Batch size
* `--precision` : Comma-separated list of precisions to build TensorRT engines with. Eg: `fp32,fp16`
* `--device` : Device ID
* `--truncate` : Truncate long and double weights in the network in Torch-TensorRT
* `--is_trt_engine` : Boolean flag to be enabled if the model file provided is a TensorRT engine
* `--report` : Path of the output file where the performance summary is written
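
Putting several of these flags together, an invocation might look like the sketch below. The `perf_run.py` entry-point name and the file names are assumptions for illustration, not taken from this section.

```
# Hypothetical invocation; perf_run.py and the file names are assumed for illustration.
python perf_run.py --backends torch,torch_tensorrt \
                   --model vgg16_scripted.jit.pt \
                   --inputs "(1, 3, 224, 224)@fp32" \
                   --batch_size 1 \
                   --precision fp32,fp16 \
                   --device 0 \
                   --report vgg16_perf.txt
```
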
This tool benchmarks any PyTorch model or TorchScript module. As examples, we provide VGG16, ResNet50, EfficientNet-B0, ViT, and HF-BERT models in `hub.py` that we internally test for performance.
The TorchScript modules for these models can be generated by running:
```
python hub.py
```
You can refer to `benchmark.sh` for how we run/benchmark these models.
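
In the same spirit, a simple loop over the `hub.py` models could look like the following sketch; the `perf_run.py` entry point and the generated file names are assumptions, not taken from this document.

```
#!/bin/bash
# Hypothetical sketch: generate TorchScript modules via hub.py, then benchmark each one.
python hub.py
for model in vgg16 resnet50 efficientnet_b0; do
  python perf_run.py --backends torch,torch_tensorrt \
                     --model ${model}_scripted.jit.pt \
                     --inputs "(1, 3, 224, 224)@fp32" \
                     --batch_size 1 \
                     --precision fp32,fp16 \
                     --report ${model}_bs1.txt
done
```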