Commit 72e9eea

MrWhoami and Copilot authored
Added description for limitation of the mixbw mode. (#62)
Co-authored-by: Copilot <[email protected]>
1 parent 20526d3 commit 72e9eea

File tree

2 files changed: +7 −9 lines changed

docs/toolchain/manual_4_bie.md

Lines changed: 6 additions & 5 deletions

@@ -26,11 +26,12 @@ Args:
   * percentile (float, optional): used under 'mmse' mode. The range to search. The larger the value, the larger the search range; the better the performance, but the longer the simulation time. Defaults to 0.001.
   * outlier_factor (float, optional): used under 'mmse' mode. The factor applied to outliers. For example, if clamping data is sensitive for your model, set outlier_factor to 2 or higher. A higher outlier_factor reduces outlier removal by increasing the range. Defaults to 1.0.
   * percentage (float, optional): used under 'percentage' mode. A value between 0.999 and 1.0 is suggested. Use 1.0 for detection models. Defaults to 0.999.
-  * datapath_bitwidth_mode: choose from "int8"/"int16"/"mix balance"/"mix light". ("int16" is not supported on kdp520. "mix balance" and "mix light" are combinations of the int8 and int16 modes; "mix balance" prefers int16 while "mix light" prefers int8.)
-  * weight_bitwidth_mode: choose from "int8"/"int16"/"int4"/"mix balance"/"mix light". ("int16" is not supported on kdp520. "int4" is not supported on kdp720. "mix balance" and "mix light" are combinations of the int8 and int16 modes; "mix balance" prefers int16 while "mix light" prefers int8.)
-  * model_in_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported on kdp520.)
-  * model_out_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported on kdp520.)
-  * cpu_node_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported on kdp520.)
+  * datapath_bitwidth_mode: choose from "int8"/"int16"/"mix balance"/"mix light"/"mixbw". ("int16" is not supported on kdp520. "mixbw", "mix balance", and "mix light" are combinations of the int8 and int16 modes; "mix balance" prefers int16, "mix light" prefers int8, and "mixbw" automatically selects the best bitwidth for each layer.)
+  * weight_bitwidth_mode: choose from "int8"/"int16"/"int4"/"mix balance"/"mix light"/"mixbw". ("int16" is not supported on kdp520. "int4" is not supported on kdp720. "mixbw", "mix balance", and "mix light" are combinations of the int8 and int16 modes; "mix balance" prefers int16, "mix light" prefers int8, and "mixbw" automatically selects the best bitwidth for each layer.)
+  * model_in_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported on kdp520. When "mixbw" is set, this parameter is ignored.)
+  * model_out_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported on kdp520. When "mixbw" is set, this parameter is ignored.)
+  * cpu_node_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported on kdp520. When "mixbw" is set, this parameter is ignored.)
+  * flops_ratio (float, optional): used under "mixbw" mode; the ratio of the model's flops allowed for the search. The larger the value, the better the performance, but the longer the simulation time. Defaults to 0.2.
   * compiler_tiling (str, optional): either "default" or "deep_search". "deep_search" finds a better image-cut method through a deeper search, improving NPU efficiency. Defaults to "default".
   * mode (int, optional): running mode for the analysis. Defaults to 1.
     - 0: run ip_evaluator only. This mode will not output a bie file.
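For reference, the bitwidth-mode arguments above slot into a single `km.analysis` call. A minimal sketch, assuming a `km` model object prepared as elsewhere in the toolchain docs; the calibration inputs and remaining arguments are elided with `...`:

```python
# Sketch only; `km` and the elided arguments follow the surrounding toolchain docs.
bie_path = km.analysis(
    ...,                                   # calibration inputs, prepared as usual
    output_dir="/data1/kneron_flow",
    datapath_bitwidth_mode="mix balance",  # prefer int16 on the activation datapath
    weight_bitwidth_mode="mix light",      # prefer int8 for weights
    model_in_bitwidth_mode="int8",
    model_out_bitwidth_mode="int8",
    cpu_node_bitwidth_mode="int8",
)
```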

docs/toolchain/quantization/1.3_Optimizing_Quantization_Modes.md

Lines changed: 1 addition & 4 deletions

@@ -66,7 +66,7 @@ bie_path = km.analysis(

 ### 3.2.3 Use `mixbw` for Sensitivity-Guided Quantization

-If `mix light` precision is insufficient, use `mixbw`. This mode analyzes Conv node sensitivity and automatically prioritizes 16-bit quantization for sensitive Conv layers. Control compute overhead with flops_ratio (default=0.2). `mixbw` mode may need more time and disk space to evaluate quant sensitivity, but its fps is still faster than all int16.
+If `mix light` precision is insufficient, use `mixbw`. This mode analyzes Conv node sensitivity and automatically prioritizes 16-bit quantization for sensitive Conv layers. Control compute overhead with flops_ratio (default=0.2). `mixbw` mode may need more time and disk space to evaluate quant sensitivity, but its fps is still faster than all int16. When `mixbw` is used, `model_in_bitwidth_mode`, `model_out_bitwidth_mode`, and `cpu_node_bitwidth_mode` are always `int16` and cannot be changed.

 ```python
@@ -76,9 +76,6 @@ bie_path = km.analysis(
     output_dir="/data1/kneron_flow",
     datapath_bitwidth_mode="mixbw",
     weight_bitwidth_mode="mixbw",
-    model_in_bitwidth_mode="int16",
-    model_out_bitwidth_mode="int16",
-    cpu_node_bitwidth_mode="int16",
     flops_ratio=0.2
 )
 ```
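The sensitivity-guided selection that `mixbw` performs can be pictured with a small self-contained sketch. The layer data and the greedy rule below are illustrative assumptions, not the toolchain's actual algorithm: promote the most quantization-sensitive layers to int16 until the extra compute would exceed a `flops_ratio` share of the model's total flops.

```python
def pick_int16_layers(layers, flops_ratio=0.2):
    """Greedy sketch: promote the most sensitive layers to int16 within a flops budget.

    `layers` is a list of (name, sensitivity, flops) tuples. We assume
    (illustratively) that int16 roughly doubles a layer's flops, so promoting
    a layer costs one extra copy of its int8 flops.
    """
    total_flops = sum(flops for _, _, flops in layers)
    budget = flops_ratio * total_flops  # extra flops allowed overall
    promoted, extra = [], 0.0
    # Consider the most quantization-sensitive layers first.
    for name, _sensitivity, flops in sorted(layers, key=lambda t: -t[1]):
        if extra + flops <= budget:
            promoted.append(name)
            extra += flops
    return promoted

layers = [
    ("conv1", 0.9, 10.0),  # very sensitive, cheap
    ("conv2", 0.2, 50.0),  # insensitive, expensive
    ("conv3", 0.7, 15.0),  # sensitive, moderate cost
]
print(pick_int16_layers(layers, flops_ratio=0.2))  # → ['conv1']
```

With `flops_ratio=0.2` the budget is 15.0 extra flops out of 75.0 total, so only `conv1` fits; a larger ratio would admit `conv3` as well, mirroring how a higher `flops_ratio` trades fps for precision.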
