This sample contains code and a notebook that convert a TensorFlow Lite detection model to an ONNX model and perform TensorRT inference on Jetson.
- Export TensorFlow Lite Detection Model.
- Convert to ONNX Model.
- Add TensorRT TFLiteNMS Plugin to ONNX Model.
- Build TensorRT Plugins.
- Convert the ONNX Model to a serialized engine and run inference on Jetson.
tf2onnx converts TensorFlow Lite models to ONNX models, which makes it possible to run a TensorFlow Lite model in TensorRT. However, TensorRT 7 does not support NonMaxSuppression, so the detection model cannot be run as-is. TensorRT's batchedNMSPlugin and nmsPlugin are not compatible with TensorFlow Lite's TFLite_Detection_PostProcess either. Therefore, this sample creates a TFLiteNMS_TRT plugin to run the TensorFlow Lite detection model.
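For reference, the core of what TFLite_Detection_PostProcess performs (and what the TFLiteNMS_TRT plugin has to reproduce) is a greedy, score-thresholded NMS. The following is a simplified plain-Python sketch of that algorithm, not the plugin's actual CUDA implementation; the function names are illustrative:

```python
def iou(a, b):
    """Intersection-over-union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(a[0], b[0]); xmin = max(a[1], b[1])
    ymax = min(a[2], b[2]); xmax = min(a[3], b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.6,
               score_threshold=0.3, max_detections=10):
    """Keep the highest-scoring boxes, suppressing any box whose overlap
    with an already-kept box exceeds iou_threshold."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
        if len(keep) == max_detections:
            break
    return keep
```

Note how every surviving candidate must be compared against every kept box; this is why a very low nms_score_threshold (more candidates) increases inference time.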
- Host PC
- Linux (Ubuntu 18.04) or Google Colab
- Jetson
- JetPack 4.5.1
The following is executed on the Host PC.
The Add_TFLiteNMS_Plugin notebook contains all the steps to convert a TensorFlow Lite model to an ONNX model with the TFLite NMS plugin.
Install TensorFlow 1 and the Object Detection API.
sudo apt update
sudo apt install protobuf-compiler
pip3 install tensorflow==1.15 tensorflow-addons
git clone https://github.com/tensorflow/models.git
export PYTHONPATH=`pwd`/models:$PYTHONPATH
cd models/research
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf1/setup.py .
python3 -m pip install .
Download SSDLite MobileNet V2 checkpoint.
wget http://download.tensorflow.org/models/object_detection/ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz
tar xf ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz
Export TF-Lite FP32 Model.
Note: Specify FLOAT for inference_input_type and TFLite_Detection_PostProcess for output_arrays.
python3 object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path="./ssdlite_mobilenet_v2_coco_2018_05_09/pipeline.config" \
--trained_checkpoint_prefix="./ssdlite_mobilenet_v2_coco_2018_05_09/model.ckpt" \
--output_directory="./ssdlite_mobilenet_v2_coco_2018_05_09/tflite" \
--add_postprocessing_op=true
tflite_convert \
--enable_v1_converter \
--graph_def_file="./ssdlite_mobilenet_v2_coco_2018_05_09/tflite/tflite_graph.pb" \
--output_file="./ssdlite_mobilenet_v2_coco_2018_05_09/tflite/ssdlite_mobilenet_v2_300x300.tflite" \
--inference_input_type=FLOAT \
--inference_type=FLOAT \
--input_arrays="normalized_input_image_tensor" \
--output_arrays="TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3" \
--input_shapes=1,300,300,3 \
--allow_nudging_weights_to_use_fast_gemm_kernel=true \
--allow_custom_ops
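Because inference_input_type is FLOAT, the exported model's normalized_input_image_tensor expects normalized pixel values rather than raw uint8. SSD MobileNet models commonly normalize to [-1.0, 1.0] with (x - 127.5) / 127.5; a minimal sketch of that assumed preprocessing (the helper name is illustrative):

```python
def normalize(pixels):
    """Map uint8 pixel values [0, 255] to [-1.0, 1.0], the range the
    SSD input tensor is assumed to expect ((x - 127.5) / 127.5)."""
    return [p / 127.5 - 1.0 for p in pixels]
```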
Install onnxruntime and tf2onnx.
pip3 install onnxruntime tf2onnx
Convert TensorFlow Lite Model to ONNX Model.
Note: TensorRT 7.2 supports operators up to Opset 11.
python3 -m tf2onnx.convert --opset 11 \
--tflite ./ssdlite_mobilenet_v2_coco_2018_05_09/tflite/ssdlite_mobilenet_v2_300x300.tflite \
--output ./ssdlite_mobilenet_v2_coco_2018_05_09/onnx/ssdlite_mobilenet_v2_300x300.onnx
Install onnx_graphsurgeon.
python3 -m pip install onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com
Clone this repository.
git clone https://github.com/NobuoTsukamoto/tensorrt-examples
cd ./tensorrt-examples/python/detection
Add TFLiteNMSPlugin.
python3 add_tensorrt_tflitenms_plugin.py \
--input ../../../ssdlite_mobilenet_v2_coco_2018_05_09/onnx/ssdlite_mobilenet_v2_300x300.onnx \
--output ../../../ssdlite_mobilenet_v2_coco_2018_05_09/onnx/ssdlite_mobilenet_v2_300x300_gs.onnx
If you check the converted ssdlite_mobilenet_v2_300x300_gs.onnx in Netron, you can see that NonMaxSuppression has been replaced by TFLiteNMS_TRT.
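The actual script does this with onnx_graphsurgeon, but the transformation it applies can be illustrated schematically with plain dictionaries (the structure and attribute names here are illustrative, not the script's real data model):

```python
def replace_nms_node(nodes, attrs):
    """Schematic of the graph-surgery step: swap every NonMaxSuppression
    node for a TFLiteNMS_TRT custom op that carries the
    TFLite_Detection_PostProcess attributes, leaving other nodes intact."""
    out = []
    for node in nodes:
        if node["op"] == "NonMaxSuppression":
            out.append({"op": "TFLiteNMS_TRT",
                        "inputs": node["inputs"],
                        "outputs": node["outputs"],
                        "attrs": dict(attrs)})
        else:
            out.append(node)
    return out
```

TensorRT later matches the TFLiteNMS_TRT op name against the registered plugin when the engine is built.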
The full options for add_tensorrt_tflitenms_plugin.py are:
python add_tensorrt_tflitenms_plugin.py --help
usage: add_tensorrt_tflitenms_plugin.py [-h] --input INPUT --output OUTPUT [--max_classes_per_detection MAX_CLASSES_PER_DETECTION]
[--max_detections MAX_DETECTIONS] [--background_label_id BACKGROUND_LABEL_ID] [--nms_iou_threshold NMS_IOU_THRESHOLD]
[--nms_score_threshold NMS_SCORE_THRESHOLD] [--num_classes NUM_CLASSES] [--y_scale Y_SCALE] [--x_scale X_SCALE]
[--h_scale H_SCALE] [--w_scale W_SCALE] [--efficientdet]
optional arguments:
-h, --help show this help message and exit
--input INPUT Input ONNX model path.
--output OUTPUT Output ONNX (with TFLiteNMS_TRT) model path.
--max_classes_per_detection MAX_CLASSES_PER_DETECTION
TFLite_Detection_PostProcess Attributes "detections_per_class".
--max_detections MAX_DETECTIONS
TFLite_Detection_PostProcess Attributes ""
--background_label_id BACKGROUND_LABEL_ID
Background Label ID(TF 1 Detection Model is 0).
--nms_iou_threshold NMS_IOU_THRESHOLD
TFLite_Detection_PostProcess Attributes "nms_iou_threshold"
--nms_score_threshold NMS_SCORE_THRESHOLD
TFLite_Detection_PostProcess Attributes "nms_score_threshold"
--num_classes NUM_CLASSES
TFLite_Detection_PostProcess Attributes "num_classes"
--y_scale Y_SCALE TFLite_Detection_PostProcess Attributes "y_scale"
--x_scale X_SCALE TFLite_Detection_PostProcess Attributes "x_scale"
--h_scale H_SCALE TFLite_Detection_PostProcess Attributes "h_scale"
--w_scale W_SCALE TFLite_Detection_PostProcess Attributes "w_scale"
--efficientdet Currently not supported.
The parameters of TFLiteNMS_TRT are similar to those of TFLite_Detection_PostProcess. However, keep in mind that making nms_score_threshold too small will significantly increase the inference time.
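The y_scale, x_scale, h_scale, and w_scale attributes correspond to the standard SSD box decoding that TFLite_Detection_PostProcess applies before NMS. A plain-Python sketch of that decoding, using the typical values y_scale = x_scale = 10 and h_scale = w_scale = 5 (the helper name is illustrative):

```python
import math

def decode_box(encoding, anchor,
               y_scale=10.0, x_scale=10.0, h_scale=5.0, w_scale=5.0):
    """Decode one SSD box encoding [ty, tx, th, tw] against an anchor
    [ycenter, xcenter, h, w]; returns [ymin, xmin, ymax, xmax]."""
    ty, tx, th, tw = encoding
    ya, xa, ha, wa = anchor
    ycenter = ty / y_scale * ha + ya
    xcenter = tx / x_scale * wa + xa
    h = math.exp(th / h_scale) * ha
    w = math.exp(tw / w_scale) * wa
    return [ycenter - h / 2, xcenter - w / 2,
            ycenter + h / 2, xcenter + w / 2]
```

A zero encoding decodes back to the anchor itself, which is a quick sanity check when comparing plugin output against the TFLite interpreter.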
The following is executed on Jetson (JetPack 4.5.1).
Install the software needed to build TensorRT.
Note: Jetson's pre-installed CMake is 3.10.2, but TensorRT requires 3.13 or higher, so install it from snap.
sudo apt remove cmake
sudo snap install cmake --classic
sudo reboot
Clone the repository and initialize its submodules.
cd ~
git clone https://github.com/NobuoTsukamoto/tensorrt-examples
cd ./tensorrt-examples
git submodule update --init --recursive
Now build TensorRT.
export TRT_LIBPATH=`pwd`/TensorRT
export PATH=${PATH}:/usr/local/cuda/bin
cd $TRT_LIBPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=10.2 -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=/usr/bin/gcc
make -j$(nproc)
At this time, TensorRT 7.1.3 in JetPack 4.5.1 does not support FP16 for NMS. Therefore, copy only the plugin library.
sudo cp out/libnvinfer_plugin.so.7.2.3 /usr/lib/aarch64-linux-gnu/
sudo rm /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
sudo ln -s /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7.2.3 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.7
Copy ssdlite_mobilenet_v2_300x300_gs.onnx to the Jetson and check the model.
/usr/src/tensorrt/bin/trtexec --onnx=/home/jetson/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_gs.onnx
Install pycuda.
For details, see the pycuda installation documentation.
sudo apt install python3-dev
pip3 install --global-option=build_ext --global-option="-I/usr/local/cuda/include" --global-option="-L/usr/local/cuda/lib64" pycuda
Convert the model to a serialized engine file.
If you want an FP16 engine, add --fp16 to the arguments of convert_onnxgs2trt.py.
cd ~/tensorrt-examples/python/detection/
python3 convert_onnxgs2trt.py \
--model /home/jetson/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_gs.onnx \
--output /home/jetson/tensorrt-examples/models/ssdlite_mobilenet_v2_300x300_fp16.trt \
--fp16
Finally, you can run the demo (the width and height options are the model's input size).
python3 trt_detection.py \
--model ../../models/ssdlite_mobilenet_v2_300x300_fp16.trt \
--label ../../models/coco_labels.txt \
--width 300 \
--height 300
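The demo reads back the four TFLite_Detection_PostProcess-style output tensors (boxes, classes, scores, and the number of detections). A hedged sketch of turning those outputs into labeled results; this helper is illustrative and not the actual code of trt_detection.py:

```python
def format_detections(boxes, classes, scores, num_detections,
                      labels, score_threshold=0.5):
    """Convert the four detection output tensors into a readable list of
    (label, score, box) tuples, dropping low-scoring detections."""
    results = []
    for i in range(int(num_detections)):
        if scores[i] < score_threshold:
            continue
        results.append((labels[int(classes[i])], scores[i], boxes[i]))
    return results
```

The label file passed with --label supplies the index-to-name mapping used here.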