Commit 05acc02

Merge pull request #68 from kneron/rel_0.31.0

Toolchain v0.31.0 release.

2 parents 8fe1ad0 + 88ee503, commit 05acc02

5 files changed: +139 additions, −52 deletions

docs/toolchain/appendix/app_flow_manual.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Kneron End to End Simulator v0.30.0
+# Kneron End to End Simulator v0.31.0
 
 This project allows users to perform image inference using Kneron's built-in simulator. We encourage users to simply use the `kneron_inference` function to run tests on your inputs.
```
docs/toolchain/appendix/history.md

Lines changed: 61 additions & 41 deletions

```diff
@@ -24,55 +24,75 @@
 ## Toolchain Change log
 
+* **[v0.31.0]**
+    * **Introduce `quan_config` for `ModelConfig.analysis` for more detailed quantization configuration.**
+    * **Add `ktc.opt_and_eval` command for quick onnx optimization and evaluation.**
+    * **Remove deprecated `compilerIpevaluator_730.sh` and add warning messages to other deprecated scripts.**
+    * Add `compiler_tiling` option for the IP evaluator.
+    * Add `--clear-shapes` and `--replace-avgpool-with-conv` flags to kneronnxopt.
+    * Add `--seperate` flag to kneronnxopt.onnx_vs_onnx for detailed output comparison.
+    * Update knerex shared weight combination logic.
+    * Update knerex and dynasty to support empty Constant nodes.
+    * Update compiler for better message logging.
+    * Update dynasty and compiler for `softmax` support.
+    * Update regression for a longer timeout setting.
+    * Improve `model_fx_report.html` readability.
+    * Speed up the compiler for large models.
+    * Fix the ktc error message for unsupported special characters in the model path.
+    * Fix the ktc bug that the logging module was not imported.
+    * Fix the kneronnxopt bug that flip nodes were eliminated incorrectly.
+    * Fix the kneronnxopt bug that Add/Sub/Mul/Div nodes were incorrectly replaced with BatchNormalization nodes.
+    * Fix the nef utility bug that 520 nef combination generated invalid nef files.
+    * Other bug fixes and performance improvements.
 * **[v0.30.0]**
     * **Introduce `input_fmt` for `ModelConfig` to specify the input format of the model.**
     * **`bie` files may not be compatible with previous versions.**
     * Fix kneronnxopt to duplicate shared weights for unsupported cases.
     * Update knerex to support alpha&beta hardsigmoid.
     * Update webgui to support conda environment selection.
     * Bug fixes and performance improvements.
 * **[v0.29.0]**
     * **Introduce `mixbw` for fixed-point analysis, an automated quantization mode that optimizes 8/16-bit configurations for Conv nodes, balancing accuracy (SNR) and speed (FPS).**
     * Add an onnx_vs_onnx command line entrance for kneronnxopt to compare two onnx models.
     * Optimize log printing for ktc.
     * Optimize compiler runtime based on partial graph comparison.
     * Fix the bug that knerex could not handle last nodes properly in some cases.
     * Fix the bug that knerex could not handle an Add constant node input properly.
     * Fix other known bugs.
 * **[v0.28.2]**
     * Fix the batch compiler bug that nef files did not contain version information.
     * Optimize kneronnxopt for processing large models.
     * Fix other bugs.
 * **[v0.28.1]**
     * Change the default miniconda channel due to a license issue.
 * **[v0.28.0]**
     * **Change the conda environment due to a license issue.**
     * **Remove caffe support.**
     * Add `--opt-matmul` flag to kneronnxopt for kneron hardware matmul optimization.
     * Add `--overwrite-input-shapes` and `--skip-fuse-qkv` flags to kneronnxopt for large model processing.
     * Support GRU, LSTM, and RNN operator defusion in kneronnxopt.
     * Fix bugs.
 * **[v0.27.0]**
     * Adjust batch compiler internal behavior to improve robustness.
     * Optimize compiler to improve feature map cut search speed.
     * Optimize data converter to improve speed.
     * Fix bugs.
 * **[v0.26.0]**
     * Optimize compiler for 730 graph cutting.
     * Support the flash attention model.
     * Add producer name in kneronnxopt.
     * Fix bugs.
 * **[v0.25.1]**
     * ktc supports non-str platform conversion.
     * Fix kneronnxopt argument name.
     * Update conda environment:
         * base:
             - numpy-1.21.0
             - pandas-1.2.0
         * onnx1.13:
             - numpy-1.26.4
             - pandas-2.2.2
     * Fix bugs.
 * **[v0.25.0]**
     * **IP evaluator adds arguments `weight_bandwidth` and `dma_bandwidth`.**
     * 730 toolchain full upgrade.
```

docs/toolchain/manual_1_overview.md

Lines changed: 22 additions & 9 deletions

```diff
@@ -4,8 +4,8 @@
 # 1. Toolchain Overview
 
-**2025 Jun**
-**Toolchain v0.30.0**
+**2025 Sep**
+**Toolchain v0.31.0**
 
 ## 1.1. Introduction
@@ -19,13 +19,26 @@ In this document, you'll learn:
 3. How to utilize the tools through Python API.
 
 **Major changes of the current version**
-* **[v0.30.0]**
-    * **Introduce `input_fmt` for `ModelConfig` to specify the input format of the model.**
-    * **`bie` files may not be compatible with previous versions.**
-    * Fix kneronnxopt to duplicate shared weights for unsupported cases.
-    * Update knerex to support alpha&beta hardsigmoid.
-    * Update webgui to support conda environment selection.
-    * Bug fixes and performance improvements.
+* **[v0.31.0]**
+    * **Introduce `quan_config` for `ModelConfig.analysis` for more detailed quantization configuration.**
+    * **Add `ktc.opt_and_eval` command for quick onnx optimization and evaluation.**
+    * **Remove deprecated `compilerIpevaluator_730.sh` and add warning messages to other deprecated scripts.**
+    * Add `compiler_tiling` option for the IP evaluator.
+    * Add `--clear-shapes` and `--replace-avgpool-with-conv` flags to kneronnxopt.
+    * Add `--seperate` flag to kneronnxopt.onnx_vs_onnx for detailed output comparison.
+    * Update knerex shared weight combination logic.
+    * Update knerex and dynasty to support empty Constant nodes.
+    * Update compiler for better message logging.
+    * Update dynasty and compiler for `softmax` support.
+    * Update regression for a longer timeout setting.
+    * Improve `model_fx_report.html` readability.
+    * Speed up the compiler for large models.
+    * Fix the ktc error message for unsupported special characters in the model path.
+    * Fix the ktc bug that the logging module was not imported.
+    * Fix the kneronnxopt bug that flip nodes were eliminated incorrectly.
+    * Fix the kneronnxopt bug that Add/Sub/Mul/Div nodes were incorrectly replaced with BatchNormalization nodes.
+    * Fix the nef utility bug that 520 nef combination generated invalid nef files.
+    * Other bug fixes and performance improvements.
 
 ## 1.2. Workflow Overview
```

docs/toolchain/manual_3_onnx.md

Lines changed: 43 additions & 0 deletions
````diff
@@ -78,6 +78,8 @@ By the way, to save the model, you can use the following function from the onnx
 onnx.save(optimized_m, '/data1/optimized.onnx')
 ```
 
+We also provide a command line tool for both model optimization and evaluation. Please check FAQ 3.4.4 for details.
+
 ### 3.1.3. ONNX Editing
 
 KL520/KL720/KL530 NPU supports most of the compute-intensive OPs, such as Conv, BatchNormalization, and Fully Connected/GEMM, in order to speed up model inference run time. On the other hand, there are some OPs that the KL520 NPU cannot support well, such as `Softmax` or `Sigmoid`. However, these OPs are usually not compute-intensive, and they are better executed on the CPU.
````
```diff
@@ -187,6 +189,8 @@ Please check the report to see if the performance meets your expectation.
 > You can find the profiling configuration under `/workspace/scripts/res`. The configuration files are named like `ip_config_<platform>.json`. You can change the
 > bandwidth according to your scenario.
 
+We also provide a command line tool for both model optimization and evaluation. Please check FAQ 3.4.4 for details.
+
 ## 3.3. E2E Simulator Check (Floating Point)
 
 Before going into the next section of quantization, we need to ensure the optimized onnx file can produce the same result as the originally designed model.
```
````diff
@@ -275,3 +279,42 @@
 ASSERT img_data.shape == (224, 224, 3)
 new_img_data = ktc.convert_channel_last_to_first(img_data)
 ASSERT new_img_data.shape == (1, 3, 224, 224)
 ```
+
+### 3.4.4 Is there any command line tool instead of the Python API for quick model optimization or evaluation?
+
+Yes, we provide a command line tool for both model optimization and evaluation. The tool is called `ktc.opt_and_eval`.
+
+Here is an example command line usage for model optimization and evaluation:
+
+```bash
+python -m ktc.opt_and_eval 730 /workspace/examples/mobilenetv2/mobilenetv2_zeroq.origin.onnx
+```
+
+The first parameter is the platform, which should be one of "520", "720", "530", "630", "730". The second parameter is the path to the input onnx file. By default, the optimized onnx file is saved in the same folder as the input onnx file with the suffix `.opt.onnx`, and the evaluation report is saved in `/data/kneron_flow` with the filename `model_fx_report.html`.
+
+You can use `-o` or `--optimizer-only` to run only the optimization step without evaluation, or `-e` or `--evaluator-only` to run only the evaluation step without optimization. A bie file is also accepted as input, in which case only the evaluation step runs.
+
+**This script is only a quick tool for simple model optimization and evaluation. For advanced usage, please use the Python API.**
+
+You can use `-h` or `--help` to see all the options.
+
+```
+usage: python -m ktc.opt_and_eval [-h] [-e] [-E EVALUATOR_REPORT_PATH] [-o] [-O OPTIMIZED_PATH] [--deep-search] {520,720,530,630,730} path
+
+Optimize ONNX model and run IP Evaluator
+
+positional arguments:
+  {520,720,530,630,730}
+                        Target hardware platform.
+  path                  Path to the ONNX/BIE model file.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -e, --evaluator-only  Evaluator only, skip optimization step.
+  -E EVALUATOR_REPORT_PATH, --evaluator-report-path EVALUATOR_REPORT_PATH
+                        Path to the directory to save the evaluator report.
+  -o, --optimizer-only  Optimizer only, skip evaluator step.
+  -O OPTIMIZED_PATH, --optimized-path OPTIMIZED_PATH
+                        Path to save the optimized ONNX model.
+  --deep-search         Use deep search for optimization, which may take longer but can yield better performance.
+```
````
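The channel conversion in the FAQ context above amounts to a transpose plus a new batch axis. A minimal numpy sketch of the same shape transformation (our own illustration of the layout change, not the actual `ktc.convert_channel_last_to_first` implementation):

```python
import numpy as np

def channel_last_to_first(img):
    # HWC -> CHW: move the channel axis to the front,
    # then add a leading batch axis to get NCHW.
    return np.expand_dims(np.transpose(img, (2, 0, 1)), axis=0)

img_data = np.zeros((224, 224, 3), dtype=np.float32)
new_img_data = channel_last_to_first(img_data)
assert new_img_data.shape == (1, 3, 224, 224)
```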

docs/toolchain/manual_4_bie.md

Lines changed: 12 additions & 1 deletion

```diff
@@ -31,7 +31,18 @@ Args:
 * model_in_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
 * model_out_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
 * cpu_node_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
-* flops_ratio (float, optional): the ratio of the flops of the model. The larger the value, the better the performance but the longer the simulation time. Defaults to 0.2.
+* flops_ratio (float, optional): used under "mixbw" mode to set the ratio of the flops of the model. The larger the value, the better the performance but the longer the simulation time. Defaults to 0.2.
+* quan_config (Dict, optional): Supported on KDP730. Allows manually setting the output bitwidth for specific nodes by name to override automatic selection, e.g.
+  {"523_kn": {"bitwidth": {"all": 8}},
+   "510_kn": {"bitwidth": {"all": 15}}}
 * compiler_tiling (str, optional): could be "default" or "deep_search". Gets a better image cut method through deep search, so as to improve the efficiency of our NPU. Defaults to "default".
 * mode (int, optional): running mode for the analysis. Defaults to 1.
     - 0: run ip_evaluator only. This mode does not output a bie file.
```
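The `quan_config` argument described above is a plain nested mapping from node name to bitwidth settings. A small sketch that builds and sanity-checks such a mapping (the node names `523_kn`/`510_kn` are the placeholders from the docs' example, and the validation helper is our own illustration, not part of the ktc API):

```python
# Illustrative quan_config mirroring the documented example.
# "523_kn"/"510_kn" are placeholder node names; substitute the
# node names from your own model before passing the dict to
# ModelConfig.analysis(..., quan_config=quan_config).
quan_config = {
    "523_kn": {"bitwidth": {"all": 8}},
    "510_kn": {"bitwidth": {"all": 15}},
}

def validate_quan_config(cfg):
    """Check the node-name -> {"bitwidth": {...}} structure."""
    for node, settings in cfg.items():
        assert isinstance(node, str), "keys are node names"
        assert "bitwidth" in settings, "each entry needs a bitwidth block"
        for scope, bits in settings["bitwidth"].items():
            assert isinstance(bits, int), "bitwidth values are integers"

validate_quan_config(quan_config)
```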
