opencv
diff --git a/‎.DS_Store
-6 KB b/‎.DS_Store
diff --git a/‎models/.DS_Store
-6 KB b/‎models/.DS_Store
diff --git a/‎models/object_detection_nanodet/README.md
+38-59 b/‎models/object_detection_nanodet/README.md
diff --git a/‎models/object_detection_nanodet/demo.py
+127-99 b/‎models/object_detection_nanodet/demo.py
diff --git a/‎models/object_detection_nanodet/examples/results/TestResult1.png
-893 KB b/‎models/object_detection_nanodet/examples/results/TestResult1.png
diff --git a/‎models/object_detection_nanodet/examples/results/TestResult2.png
-1.03 MB b/‎models/object_detection_nanodet/examples/results/TestResult2.png
diff --git a/‎models/object_detection_nanodet/examples/results/test1_res.jpg
489 KB b/‎models/object_detection_nanodet/examples/results/test1_res.jpg
diff --git a/‎models/object_detection_nanodet/examples/results/test2_res.jpg
126 KB b/‎models/object_detection_nanodet/examples/results/test2_res.jpg
@@ -2,8 +2,35 @@
 
 Nanodet: NanoDet is a FCOS-style one-stage anchor-free object detection model which using Generalized Focal Loss as classification and regression loss.In NanoDet-Plus, we propose a novel label assignment strategy with a simple assign guidance module (AGM) and a dynamic soft label assigner (DSLA) to solve the optimal label assignment problem in lightweight model training.
 
-#### Model metrics:
-Average Precision and Recall values observed for COCO dataset classes are showed below 
+Note:
+- This version of nanodet: Nanodet-m-plus-1.5x_416
+
+## Demo
+
+Run the following command to try the demo: 
+```shell
+# detect on camera input
+python demo.py
+# detect on an image
+python demo.py --input /path/to/image
+```
+Note: 
+- With `--save true`, the image result is saved as "result.jpg".
+
+
+## Results
+
+Here are some sample results observed using the model:
+
+![test1_res.jpg](./examples/results/test1_res.jpg)
+![test2_res.jpg](./examples/results/test2_res.jpg)
+  
+Video inference result:
+![WebCamR.gif](./examples/results/WebCamR.gif)
+
+## Model metrics:
+
+The model is evaluated on [COCO 2017 val](https://cocodataset.org/#download). Results are shown below:
 
 <table>
 <tr><th>Average Precision </th><th>Average Recall</th></tr>
@@ -30,63 +57,6 @@ Average Precision and Recall values observed for COCO dataset classes are showed
 |  large  |  0.50:0.95  |  0.702  |
 </td></tr> </table>
 
-
-## Demo
-
-Run the following command to try the demo: 
-```shell
-# Nanodet inference on image input
-python demo.py --model /path/to/model/ --input_type image --image_path /path/to/image/
-
-# Nanodet inference on video input
-python demo.py --model /path/to/model/ --input_type video 
-
-#Saving outputs 
-#Image output
-python demo.py --model /path/to/model/ --input_type image --image_path /path/to/image/ --save True
-
-#Video output
-python demo.py --model /path/to/model/ --input_type video --save True
-```
-Note: 
-- By default input_type: image
-- image result saved as "result.jpg"
-- webcam result saved as "Webcam_result.mp4"
-
-
-## Results
-
-Here are some of the sample results that were observed using the model,
-
-<p float="left">
-  <img src="./examples/results/TestResult1.png" width="450" height="450">
-  <img src="./examples/results/TestResult2.png" width="450" height="450">
-</p>
-  
-Video inference result,
-<p align="center">
-  <img src="https://github.com/Sidd1609/opencv_zoo/blob/master/models/object_detection_nanodet/examples/results/WebCamR.gif" width="650" height="450">
-</p>
-  
-
-## License
-
-All files in this directory are licensed under [Apache 2.0 License](./LICENSE).
-
-
-## Reference
-
-- Nanodet: https://zhuanlan.zhihu.com/p/306530300
-- Nanodet Plus: https://zhuanlan.zhihu.com/p/449912627
-- Nanodet weight and scripts for training: https://github.com/RangiLyu/nanodet
-
-
-#### Note:
-
-- This version of nanodet: Nanodet-m-plus-1.5x_416
-- The model was trained on COCO 2017 dataset, link to dataset: https://cocodataset.org/#download
-- Below, we have results of COCO data inference
-
 | class         | AP50   | mAP   | class          | AP50   | mAP   |
 |:--------------|:-------|:------|:---------------|:-------|:------|
 | person        | 67.5   | 41.8  | bicycle        | 35.4   | 18.8  |
@@ -130,6 +100,9 @@ All files in this directory are licensed under [Apache 2.0 License](./LICENSE).
 | scissors      | 27.8   | 17.8  | teddy bear     | 54.1   | 35.4  |
 | hair drier    | 2.9    | 1.1   | toothbrush     | 13.1   | 8.2   |
 
+## License
+
+All files in this directory are licensed under [Apache 2.0 License](./LICENSE).
 
 #### Contributor Details
 
@@ -138,3 +111,9 @@ All files in this directory are licensed under [Apache 2.0 License](./LICENSE).
 - Github Profile: https://github.com/Sidd1609
 - Organisation: OpenCV
 - Project: Lightweight object detection models using OpenCV 
+
+## Reference
+
+- Nanodet: https://zhuanlan.zhihu.com/p/306530300
+- Nanodet Plus: https://zhuanlan.zhihu.com/p/449912627
+- Nanodet weight and scripts for training: https://github.com/RangiLyu/nanodet
@@ -1,8 +1,16 @@
-import cv2
 import numpy as np
+import cv2
 import argparse
-import time
-from NanodetPlus import NanoDet
+
+from nanodet import NanoDet
+
+def str2bool(v):
+    if v.lower() in ['on', 'yes', 'true', 'y', 't']:
+        return True
+    elif v.lower() in ['off', 'no', 'false', 'n', 'f']:
+        return False
+    else:
+        raise NotImplementedError
 
 backends = [cv2.dnn.DNN_BACKEND_OPENCV, cv2.dnn.DNN_BACKEND_CUDA]
 targets = [cv2.dnn.DNN_TARGET_CPU, cv2.dnn.DNN_TARGET_CUDA, cv2.dnn.DNN_TARGET_CUDA_FP16]
@@ -15,131 +23,151 @@
     help_msg_backends += "; {:d}: TIMVX"
     help_msg_targets += "; {:d}: NPU"
 except:
-    print('This version of OpenCV does not support TIM-VX and NPU. Visit https://gist.github.com/Sidd1609/5bb321c8733110ed613ec120c7c02e41 for more information.')
-
-classes = (    'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
-               'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
-               'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
-               'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
-               'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
-               'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
-               'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
-               'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
-               'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
-               'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
-               'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
-               'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
-               'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
-               'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
-            )
-
-def vis(preds, res_img):
-    if preds is not None:
-        image_shape = (416, 416)
-        top, left, newh, neww = 0, 0, image_shape[0], image_shape[1]
-        hw_scale = res_img.shape[0] / res_img.shape[1]
+    print('This version of OpenCV does not support TIM-VX and NPU. Visit https://github.com/opencv/opencv/wiki/TIM-VX-Backend-For-Running-OpenCV-On-NPU for more information.')
+
+classes = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
+           'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
+           'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
+           'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
+           'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
+           'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
+           'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
+           'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
+           'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
+           'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
+           'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
+           'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
+           'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
+           'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush')
+
+def letterbox(srcimg, target_size=(416, 416)):
+    img = srcimg.copy()
+
+    top, left, newh, neww = 0, 0, target_size[0], target_size[1]
+    if img.shape[0] != img.shape[1]:
+        hw_scale = img.shape[0] / img.shape[1]
         if hw_scale > 1:
-            newh, neww = image_shape[0], int(image_shape[1] / hw_scale)
-            left = int((image_shape[1] - neww) * 0.5)
+            newh, neww = target_size[0], int(target_size[1] / hw_scale)
+            img = cv2.resize(img, (neww, newh), interpolation=cv2.INTER_AREA)
+            left = int((target_size[1] - neww) * 0.5)
+            img = cv2.copyMakeBorder(img, 0, 0, left, target_size[1] - neww - left, cv2.BORDER_CONSTANT, value=0)  # add border
         else:
-            newh, neww = int(image_shape[0] * hw_scale), image_shape[1]
-            top = int((image_shape[0] - newh) * 0.5)
-
-        ratioh,ratiow = res_img.shape[0]/newh,res_img.shape[1]/neww
-
-        det_bboxes = preds[0]
-        det_conf = preds[1]
-        det_classid = preds[2]
-
-        for i in range(det_bboxes.shape[0]):
-            xmin, ymin, xmax, ymax = max(int((det_bboxes[i,0] - left) * ratiow), 0), max(int((det_bboxes[i,1] - top) * ratioh), 0), min(
-            int((det_bboxes[i,2] - left) * ratiow), res_img.shape[1]), min(int((det_bboxes[i,3] - top) * ratioh), res_img.shape[0])
-            cv2.rectangle(res_img, (xmin, ymin), (xmax, ymax), (0, 0, 0), thickness=2)
-            #label = '%.2f' % det_conf[i]
-            label=''
-            label = '%s%s' % (classes[det_classid[i]], label)
-            labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1)
-            top = max(top, labelSize[1])
-            # cv.rectangle(frame, (left, top - round(1.5 * labelSize[1])), (left + round(1.5 * labelSize[0]), top + baseLine), (255,255,255), cv.FILLED)
-            cv2.putText(res_img, label, (xmin, ymin - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), thickness=2)
-
+            newh, neww = int(target_size[0] * hw_scale), target_size[1]
+            img = cv2.resize(img, (neww, newh), interpolation=cv2.INTER_AREA)
+            top = int((target_size[0] - newh) * 0.5)
+            img = cv2.copyMakeBorder(img, top, target_size[0] - newh - top, 0, 0, cv2.BORDER_CONSTANT, value=0)
     else:
-        print('No detections')
+        img = cv2.resize(img, target_size, interpolation=cv2.INTER_AREA)
+
+    letterbox_scale = [top, left, newh, neww]
+    return img, letterbox_scale
+
+def unletterbox(bbox, original_image_shape, letterbox_scale):
+    ret = bbox.copy()
+
+    h, w = original_image_shape
+    top, left, newh, neww = letterbox_scale
+
+    if h == w:
+        ratio = h / newh
+        ret = ret * ratio
+        return ret.astype(np.int32)  # cast so cv2.rectangle/cv2.putText get integer coordinates
+
+    ratioh, ratiow = h / newh, w / neww
+    ret[0] = max((ret[0] - left) * ratiow, 0)
+    ret[1] = max((ret[1] - top) * ratioh, 0)
+    ret[2] = min((ret[2] - left) * ratiow, w)
+    ret[3] = min((ret[3] - top) * ratioh, h)
+
+    return ret.astype(np.int32)
+
+def vis(preds, res_img, letterbox_scale, fps=None):
+    ret = res_img.copy()
 
-    return res_img
+    # draw FPS
+    if fps is not None:
+        fps_label = "FPS: %.2f" % fps
+        cv2.putText(ret, fps_label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
+
+    # draw bboxes and labels
+    for pred in preds:
+        bbox = pred[:4]
+        conf = pred[-2]
+        classid = pred[-1].astype(np.int32)
+
+        # bbox
+        xmin, ymin, xmax, ymax = unletterbox(bbox, ret.shape[:2], letterbox_scale)
+        cv2.rectangle(ret, (xmin, ymin), (xmax, ymax), (0, 255, 0), thickness=2)
+
+        # label
+        label = "{:s}: {:.2f}".format(classes[classid], conf)
+        cv2.putText(ret, label, (xmin, ymin - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), thickness=2)
+
+    return ret
 
 if __name__=='__main__':
     parser = argparse.ArgumentParser(description='Nanodet inference using OpenCV an contribution by Sri Siddarth Chakaravarthy part of GSOC_2022')
-    parser.add_argument('--model', type=str, default='object_detection_nanodet-plus-m-1.5x-416.onnx', help="Path to the model")
-    parser.add_argument('--input_type', type=str, default='image', help="Input types: image or video")
-    parser.add_argument('--image_path', type=str, default='test2.jpg', help="Image path")
+    parser.add_argument('--input', '-i', type=str, help='Path to the input image. Omit for using default camera.')
+    parser.add_argument('--model', '-m', type=str, default='object_detection_nanodet_2022nov.onnx', help="Path to the model")
+    parser.add_argument('--backend', '-b', type=int, default=backends[0], help=help_msg_backends.format(*backends))
+    parser.add_argument('--target', '-t', type=int, default=targets[0], help=help_msg_targets.format(*targets))
     parser.add_argument('--confidence', default=0.35, type=float, help='Class confidence')
     parser.add_argument('--nms', default=0.6, type=float, help='Enter nms IOU threshold')
-    parser.add_argument('--save', '-s', type=str, default=False, help='Set true to save results. This flag is invalid when using camera.')
+    parser.add_argument('--save', '-s', type=str2bool, default=False, help='Set true to save results. This flag is invalid when using camera.')
+    parser.add_argument('--vis', '-v', type=str2bool, default=True, help='Set true to open a window for result visualization. This flag is invalid when using camera.')
     args = parser.parse_args()
-    model_net = NanoDet(modelPath= args.model ,prob_threshold=args.confidence, iou_threshold=args.nms)
 
-    if (args.input_type=="image"):
-        image = cv2.imread(args.image_path)
-        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+    model = NanoDet(modelPath=args.model,
+                    prob_threshold=args.confidence,
+                    iou_threshold=args.nms,
+                    backend_id=args.backend,
+                    target_id=args.target)
+
+    tm = cv2.TickMeter()
+    tm.reset()
+    if args.input is not None:
+        image = cv2.imread(args.input)
+        input_blob = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
 
-        a = time.time()
-        preds = model_net.infer(image)
-        b = time.time()
-        print('Inference_Time:'+str(b-a)+' secs')
+        # Letterbox transformation
+        input_blob, letterbox_scale = letterbox(input_blob)
 
-        srcimg = vis(preds, image)
+        # Inference
+        tm.start()
+        preds = model.infer(input_blob)
+        tm.stop()
+        print("Inference time: {:.2f} ms".format(tm.getTimeMilli()))
 
-        srcimg = cv2.cvtColor(srcimg, cv2.COLOR_BGR2RGB)
-        cv2.namedWindow(args.image_path, cv2.WINDOW_AUTOSIZE)
-        cv2.imshow(args.image_path, srcimg)
-        cv2.waitKey(0)
+        img = vis(preds, image, letterbox_scale)
 
         if args.save:
             print('Resutls saved to result.jpg\n')
-            cv2.imwrite('result.jpg', srcimg)
+            cv2.imwrite('result.jpg', img)
 
-    else:
-        print("Press 1 to stop video capture")
-        cap = cv2.VideoCapture(0)
-        tm = cv2.TickMeter()
-        frame_width = int(cap.get(3))
-        frame_height = int(cap.get(4))
-        size = (frame_width, frame_height)
-        total_frames = 0
+        if args.vis:
+            cv2.namedWindow(args.input, cv2.WINDOW_AUTOSIZE)
+            cv2.imshow(args.input, img)
+            cv2.waitKey(0)
 
-        if(args.save):
-            result = cv2.VideoWriter('Webcam_result.avi', cv2.VideoWriter_fourcc(*'MJPG'),10, size)
+    else:
+        print("Press any key to stop video capture")
+        deviceId = 0
+        cap = cv2.VideoCapture(deviceId)
 
         while cv2.waitKey(1) < 0:
             hasFrame, frame = cap.read()
             if not hasFrame:
                 print('No frames grabbed!')
                 break
 
-            frame = cv2.flip(frame, 1)
-            #frame = cv2.resize(frame, [args.width, args.height])
+            input_blob, letterbox_scale = letterbox(frame)
             # Inference
             tm.start()
-            preds = model_net.infer(frame)
+            preds = model.infer(input_blob)
             tm.stop()
 
-            srcimg = vis(preds, frame)
-
-            total_frames += 1
-            fps=tm.getFPS()
-
-            if fps > 0:
-                fps_label = "FPS: %.2f" % fps
-                cv2.putText(srcimg, fps_label, (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
-
-            cv2.imshow("output", srcimg)
-
-            if cv2.waitKey(1) < 0:
-                print("Stream terminated")
-                break
+            img = vis(preds, frame, letterbox_scale, fps=tm.getFPS())
 
-            if(args.save):
-                result.write(frame)
+            cv2.imshow("NanoDet Demo", img)
 
-        print("Total frames: " + str(total_frames))
+            tm.reset()
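
The `letterbox()` / `unletterbox()` pair added to `demo.py` above can be checked without OpenCV by reproducing only its coordinate arithmetic. The following is a minimal sketch, not part of the PR: the names `letterbox_geometry` and `unletterbox_box` are hypothetical, the functions work on plain Python numbers rather than NumPy arrays, and truncation with `int()` is assumed to stand in for `astype(np.int32)`. For square inputs this sketch uses the general branch (with clamping) instead of the separate `h == w` shortcut in the diff.

```python
def letterbox_geometry(h, w, target_size=(416, 416)):
    """Return (top, left, newh, neww) for an h x w image.

    Mirrors the padding arithmetic of letterbox() in the diff,
    without resizing or padding any pixels.
    """
    top, left, newh, neww = 0, 0, target_size[0], target_size[1]
    if h != w:
        hw_scale = h / w
        if hw_scale > 1:
            # Portrait image: keep full height, pad left and right.
            neww = int(target_size[1] / hw_scale)
            left = int((target_size[1] - neww) * 0.5)
        else:
            # Landscape image: keep full width, pad top and bottom.
            newh = int(target_size[0] * hw_scale)
            top = int((target_size[0] - newh) * 0.5)
    return top, left, newh, neww


def unletterbox_box(box, original_shape, letterbox_scale):
    """Map (xmin, ymin, xmax, ymax) from letterboxed coordinates
    back to the original image, clamped to the image bounds."""
    h, w = original_shape
    top, left, newh, neww = letterbox_scale
    ratioh, ratiow = h / newh, w / neww
    xmin = max((box[0] - left) * ratiow, 0)
    ymin = max((box[1] - top) * ratioh, 0)
    xmax = min((box[2] - left) * ratiow, w)
    ymax = min((box[3] - top) * ratioh, h)
    # Truncate to integers, as drawing functions expect integer pixels.
    return [int(v) for v in (xmin, ymin, xmax, ymax)]


# A 416 x 832 (height x width) landscape frame is scaled to 208 x 416
# and padded with 104 rows at the top.
scale = letterbox_geometry(416, 832)
print(scale)                                                      # (104, 0, 208, 416)
print(unletterbox_box((100, 154, 200, 254), (416, 832), scale))   # [200, 100, 400, 300]
```

The example dimensions are chosen so that both scale ratios are exactly 2.0, which keeps the integer truncation free of floating-point rounding effects; with arbitrary frame sizes the mapped coordinates may differ by one pixel from a rounded result.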