docs/toolchain/manual_4_bie.md (+6, -5)
@@ -26,11 +26,12 @@ Args:
 * percentile (float, optional): used under 'mmse' mode. The range to search. The larger the value, the larger the search range, which improves performance but lengthens simulation time. Defaults to 0.001.
 * outlier_factor (float, optional): used under 'mmse' mode. The factor applied to outliers. For example, if your model is sensitive to clamped data, set outlier_factor to 2 or higher. A higher outlier_factor reduces outlier removal by widening the range. Defaults to 1.0.
 * percentage (float, optional): used under 'percentage' mode. A value between 0.999 and 1.0 is suggested; use 1.0 for detection models. Defaults to 0.999.
-* datapath_bitwidth_mode: choose from "int8"/"int16"/"mix balance"/"mix light". ("int16" is not supported in kdp520. "mix balance" and "mix light" are combines of int8 and int16 mode. "mix balance" prefers int16 while "mix light" prefers int8.)
-* weight_bitwidth_mode: choose from "int8"/"int16"/"int4"/"mix balance"/"mix light". ("int16" is not supported in kdp520. "int4" is not supported in kdp720. "mix balance" and "mix light" are combines of int8 and int16 mode. "mix balance" prefers int16 while "mix light" prefers int8.)
-* model_in_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520.)
-* model_out_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520.)
-* cpu_node_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520.)
+* datapath_bitwidth_mode: choose from "int8"/"int16"/"mix balance"/"mix light"/"mixbw". ("int16" is not supported in kdp520. "mixbw", "mix balance", and "mix light" are combinations of the int8 and int16 modes: "mix balance" prefers int16, "mix light" prefers int8, and "mixbw" automatically selects the best bitwidth for each layer.)
+* weight_bitwidth_mode: choose from "int8"/"int16"/"int4"/"mix balance"/"mix light". ("int16" is not supported in kdp520. "int4" is not supported in kdp720. "mixbw", "mix balance", and "mix light" are combinations of the int8 and int16 modes: "mix balance" prefers int16, "mix light" prefers int8, and "mixbw" automatically selects the best bitwidth for each layer.)
+* model_in_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
+* model_out_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
+* cpu_node_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
+* flops_ratio (float, optional): the ratio of the model's FLOPs. The larger the value, the better the performance but the longer the simulation time. Defaults to 0.2.
 * compiler_tiling (str, optional): either "default" or "deep_search". "deep_search" finds a better image-cut method through a deeper search, improving NPU efficiency. Defaults to "default".
 * mode (int, optional): running mode for the analysis. Defaults to 1.
   - 0: run ip_evaluator only. This mode does not output a bie file.
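
For orientation, here is a minimal usage sketch of `km.analysis` with the arguments documented above. It assumes `km` is an already-configured toolchain model object and `input_images` is a prepared calibration input mapping; both come from earlier toolchain steps and are assumptions here, not part of this diff.

```python
# Minimal sketch, not a definitive invocation: `km` (a configured toolchain
# model object) and `input_images` (a calibration input mapping) are assumed
# to exist from earlier toolchain steps. Only the keyword arguments below are
# taken from the Args list in this section.
bie_path = km.analysis(
    input_images,
    percentile=0.001,                    # 'mmse' search range (default)
    outlier_factor=1.0,                  # raise to 2+ if clamping hurts the model
    datapath_bitwidth_mode="mixbw",      # per-layer bitwidth chosen automatically
    weight_bitwidth_mode="mix balance",  # prefers int16 for weights
    flops_ratio=0.2,                     # larger: better performance, longer simulation
    compiler_tiling="default",           # or "deep_search" for a deeper image-cut search
    mode=1,                              # 0 runs ip_evaluator only and outputs no bie file
)
```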
docs/toolchain/quantization/1.3_Optimizing_Quantization_Modes.md (+1, -4)
@@ -66,7 +66,7 @@ bie_path = km.analysis(
 ### 3.2.3 Use `mixbw` for Sensitivity-Guided Quantization
-If `mix light` precision is insufficient, use `mixbw`. This mode analyzes Conv node sensitivity and automatically prioritizes 16-bit quantization for sensitive Conv layers. Control compute overhead with flops_ratio (default=0.2). `mixbw` mode may need more time and disk space to evaluate quant sensitivity, but its fps is still faster than all int16.
+If `mix light` precision is insufficient, use `mixbw`. This mode analyzes Conv-node sensitivity and automatically prioritizes 16-bit quantization for the sensitive Conv layers. Control the compute overhead with `flops_ratio` (default 0.2). `mixbw` mode may need more time and disk space to evaluate quantization sensitivity, but its fps is still higher than running everything in int16. When `mixbw` is used, `model_in_bitwidth_mode`, `model_out_bitwidth_mode`, and `cpu_node_bitwidth_mode` are always `int16` and cannot be changed.
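
As a quick sketch under the same assumptions as the earlier example (hypothetical `km` and `input_images` from earlier toolchain steps), enabling `mixbw` needs only the datapath mode and, optionally, `flops_ratio`; the input/output and CPU-node bitwidth modes are omitted because `mixbw` fixes them to `int16`.

```python
# Sketch under the same assumptions as the earlier example: `km` and
# `input_images` are hypothetical stand-ins from earlier toolchain steps.
# model_in/model_out/cpu_node bitwidth modes are not passed: with "mixbw"
# they are fixed to int16 and would be ignored anyway.
bie_path = km.analysis(
    input_images,
    datapath_bitwidth_mode="mixbw",
    flops_ratio=0.3,  # above the 0.2 default: better performance, longer simulation
)
```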