Commit 05acc02

Merge pull request #68 from kneron/rel_0.31.0

Toolchain v0.31.0 release.

2 parents 8fe1ad0 + 88ee503, commit 05acc02

5 files changed: +139 additions, −52 deletions

docs/toolchain/appendix/app_flow_manual.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Kneron End to End Simulator v0.30.0
+# Kneron End to End Simulator v0.31.0
 
 This project allows users to perform image inference using Kneron's built-in simulator. We encourage users to simply use the `kneron_inference` function to run tests on your inputs.
```
docs/toolchain/appendix/history.md

Lines changed: 61 additions & 41 deletions

```diff
@@ -24,55 +24,75 @@
 ## Toolchain Change log
 
+* **[v0.31.0]**
+    * **Introduce `quan_config` for `ModelConfig.analysis` for more detailed quantization configuration.**
+    * **Add `ktc.opt_and_eval` command for quick onnx optimization and evaluation.**
+    * **Remove deprecated `compilerIpevaluator_730.sh` and add warning messages to other deprecated scripts.**
+    * Add `compiler_tiling` option for the IP evaluator.
+    * Add `--clear-shapes` and `--replace-avgpool-with-conv` flags to kneronnxopt.
+    * Add `--seperate` flag to kneronnxopt.onnx_vs_onnx for detailed output comparison.
+    * Update knerex shared weight combination logic.
+    * Update knerex and dynasty to support empty Constant nodes.
+    * Update compiler for better message logging.
+    * Update dynasty and compiler for `softmax` support.
+    * Update regression for a longer timeout setting.
+    * Improve `model_fx_report.html` readability.
+    * Speed up the compiler for large models.
+    * Fix the ktc error message for unsupported special characters in the model path.
+    * Fix the ktc bug that the logging module was not imported.
+    * Fix the kneronnxopt bug that flip nodes were eliminated incorrectly.
+    * Fix the kneronnxopt bug that Add/Sub/Mul/Div nodes were incorrectly replaced with BatchNormalization nodes.
+    * Fix the nef utility bug that 520 nef combination generated invalid nef files.
+    * Other bug fixes and performance improvements.
 * **[v0.30.0]**
     * **Introduce `input_fmt` for `ModelConfig` to specify the input format of the model.**
     * **`bie` files may not be compatible with previous versions.**
     * Fix kneronnxopt to duplicate shared weights for unsupported cases.
     * Update knerex to support alpha&beta hardsigmoid.
     * Update webgui to support conda environment selection.
     * Bug fixes and performance improvements.
 * **[v0.29.0]**
     * **Introduce `mixbw` for fixed-point analysis, an automated quantization mode that optimizes 8/16-bit configurations for Conv nodes, balancing accuracy (SNR) and speed (FPS).**
     * Add an onnx_vs_onnx command line entrance for kneronnxopt to compare two onnx models.
     * Optimize log printing for ktc.
     * Optimize compiler runtime based on partial graph comparison.
     * Fix the bug that knerex could not handle last nodes properly in some cases.
     * Fix the bug that knerex could not handle an Add constant node input properly.
     * Fix other known bugs.
 * **[v0.28.2]**
     * Fix the batch compiler bug that nef files did not contain version information.
     * Optimize kneronnxopt for processing large models.
     * Fix other bugs.
 * **[v0.28.1]**
     * Change the default miniconda channel due to a license issue.
 * **[v0.28.0]**
     * **Change the conda environment due to a license issue.**
     * **Remove caffe support.**
     * Add `--opt-matmul` flag to kneronnxopt for kneron hardware matmul optimization.
     * Add `--overwrite-input-shapes` and `--skip-fuse-qkv` flags to kneronnxopt for large model processing.
     * Support GRU, LSTM, and RNN operator defusion in kneronnxopt.
     * Fix bugs.
 * **[v0.27.0]**
     * Adjust batch compiler internal behavior to improve robustness.
     * Optimize compiler to improve feature map cut search speed.
     * Optimize data converter to improve speed.
     * Fix bugs.
 * **[v0.26.0]**
     * Optimize compiler for 730 graph cutting.
     * Support the flash attention model.
     * Add producer name in kneronnxopt.
     * Fix bugs.
 * **[v0.25.1]**
     * ktc supports non-str platform conversion.
     * Fix kneronnxopt argument name.
     * Update conda environment:
         * base:
             - numpy-1.21.0
             - pandas-1.2.0
         * onnx1.13:
             - numpy-1.26.4
             - pandas-2.2.2
     * Fix bugs.
 * **[v0.25.0]**
     * **IP evaluator adds arguments `weight_bandwidth` and `dma_bandwidth`.**
     * 730 toolchain full upgrade.
```

docs/toolchain/manual_1_overview.md

Lines changed: 22 additions & 9 deletions

```diff
@@ -4,8 +4,8 @@
 # 1. Toolchain Overview
 
-**2025 Jun**
-**Toolchain v0.30.0**
+**2025 Sep**
+**Toolchain v0.31.0**
 
 ## 1.1. Introduction
@@ -19,13 +19,26 @@ In this document, you'll learn:
 3. How to utilize the tools through Python API.
 
 **Major changes of the current version**
-* **[v0.30.0]**
-    * **Introduce `input_fmt` for `ModelConfig` to specify the input format of the model.**
-    * **`bie` files may not be compatible with previous versions.**
-    * Fix kneronnxopt to duplicate shared weights for unsupported cases.
-    * Update knerex to support alpha&beta hardsigmoid.
-    * Update webgui to support conda environment selection.
-    * Bug fixes and performance improvements.
+* **[v0.31.0]**
+    * **Introduce `quan_config` for `ModelConfig.analysis` for more detailed quantization configuration.**
+    * **Add `ktc.opt_and_eval` command for quick onnx optimization and evaluation.**
+    * **Remove deprecated `compilerIpevaluator_730.sh` and add warning messages to other deprecated scripts.**
+    * Add `compiler_tiling` option for the IP evaluator.
+    * Add `--clear-shapes` and `--replace-avgpool-with-conv` flags to kneronnxopt.
+    * Add `--seperate` flag to kneronnxopt.onnx_vs_onnx for detailed output comparison.
+    * Update knerex shared weight combination logic.
+    * Update knerex and dynasty to support empty Constant nodes.
+    * Update compiler for better message logging.
+    * Update dynasty and compiler for `softmax` support.
+    * Update regression for a longer timeout setting.
+    * Improve `model_fx_report.html` readability.
+    * Speed up the compiler for large models.
+    * Fix the ktc error message for unsupported special characters in the model path.
+    * Fix the ktc bug that the logging module was not imported.
+    * Fix the kneronnxopt bug that flip nodes were eliminated incorrectly.
+    * Fix the kneronnxopt bug that Add/Sub/Mul/Div nodes were incorrectly replaced with BatchNormalization nodes.
+    * Fix the nef utility bug that 520 nef combination generated invalid nef files.
+    * Other bug fixes and performance improvements.
 
 ## 1.2. Workflow Overview
```

docs/toolchain/manual_3_onnx.md

Lines changed: 43 additions & 0 deletions
````diff
@@ -78,6 +78,8 @@ By the way, to save the model, you can use the following function from the onnx
 onnx.save(optimized_m, '/data1/optimized.onnx')
 ```
 
+We also provide a command line tool for both model optimization and evaluation. Please check FAQ 3.4.4 for details.
+
 ### 3.1.3. ONNX Editing
 
 KL520/KL720/KL530 NPU supports most of the compute-intensive OPs, such as Conv, BatchNormalization, and Fully Connected/GEMM, in order to speed up model inference run time. On the other hand, there are some OPs that the KL520 NPU cannot support well, such as `Softmax` or `Sigmoid`. However, these OPs are usually not compute-intensive, and they are better executed on the CPU.
````
```diff
@@ -187,6 +189,8 @@ Please check the report to see if the performance meets your expectation.
 > You can find the profiling configuration under `/workspace/scripts/res`. The configuration files are named like `ip_config_<platform>.json`. You can change the
 > bandwidth according to your scenario.
 
+We also provide a command line tool for both model optimization and evaluation. Please check FAQ 3.4.4 for details.
+
 ## 3.3. E2E Simulator Check (Floating Point)
 
 Before going into the next section of quantization, we need to ensure the optimized onnx file can produce the same result as the originally designed model.
```
````diff
@@ -275,3 +279,42 @@
 ASSERT img_data.shape == (224, 224, 3)
 new_img_data = ktc.convert_channel_last_to_first(img_data)
 ASSERT new_img_data.shape == (1, 3, 224, 224)
 ```
+
+### 3.4.4 Is there any command line tool instead of the Python API for quick model optimization or evaluation?
+
+Yes, we provide a command line tool for both model optimization and evaluation. The tool is called `ktc.opt_and_eval`.
+
+Here is an example command line usage for model optimization and evaluation:
+
+```bash
+python -m ktc.opt_and_eval 730 /workspace/examples/mobilenetv2/mobilenetv2_zeroq.origin.onnx
+```
+
+The first parameter is the platform, which should be one of "520", "720", "530", "630", "730". The second parameter is the path to the input onnx file. By default, the optimized onnx file is saved in the same folder as the input onnx file with the suffix `.opt.onnx`, and the evaluation report is saved in `/data/kneron_flow` with the filename `model_fx_report.html`.
+
+You can use `-o` or `--optimizer-only` to run only the optimization step without evaluation, or `-e` or `--evaluator-only` to run only the evaluation step without optimization. A bie file is also accepted as input, in which case only the evaluation step runs.
+
+**This script is only a quick tool for simple model optimization and evaluation. For advanced usage, please use the Python API.**
+
+You can use `-h` or `--help` to see all the options.
+
+```
+usage: python -m ktc.opt_and_eval [-h] [-e] [-E EVALUATOR_REPORT_PATH] [-o] [-O OPTIMIZED_PATH] [--deep-search] {520,720,530,630,730} path
+
+Optimize ONNX model and run IP Evaluator
+
+positional arguments:
+  {520,720,530,630,730}
+                        Target hardware platform.
+  path                  Path to the ONNX/BIE model file.
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -e, --evaluator-only  Evaluator only, skip optimization step.
+  -E EVALUATOR_REPORT_PATH, --evaluator-report-path EVALUATOR_REPORT_PATH
+                        Path to the directory to save the evaluator report.
+  -o, --optimizer-only  Optimizer only, skip evaluator step.
+  -O OPTIMIZED_PATH, --optimized-path OPTIMIZED_PATH
+                        Path to save the optimized ONNX model.
+  --deep-search         Use deep search for optimization, which may take longer but can yield better performance.
+```
````
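The channel conversion in the FAQ context above amounts to a transpose plus a new batch axis. A minimal numpy sketch of the same shape transformation (our own illustration of the layout change, not the actual `ktc.convert_channel_last_to_first` implementation):

```python
import numpy as np

def channel_last_to_first(img):
    # HWC -> CHW: move the channel axis to the front,
    # then add a leading batch axis to get NCHW.
    return np.expand_dims(np.transpose(img, (2, 0, 1)), axis=0)

img_data = np.zeros((224, 224, 3), dtype=np.float32)
new_img_data = channel_last_to_first(img_data)
assert new_img_data.shape == (1, 3, 224, 224)
```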

docs/toolchain/manual_4_bie.md

Lines changed: 12 additions & 1 deletion

```diff
@@ -31,7 +31,18 @@ Args:
 * model_in_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
 * model_out_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
 * cpu_node_bitwidth_mode: choose from "int8"/"int16". ("int16" is not supported in kdp520. When "mixbw" is set, this parameter is ignored.)
-* flops_ratio (float, optional): the ratio of the flops of the model. The larger the value, the better the performance but the longer the simulation time. Defaults to 0.2.
+* flops_ratio (float, optional): used under "mixbw" mode to set the ratio of the flops of the model. The larger the value, the better the performance but the longer the simulation time. Defaults to 0.2.
+* quan_config (Dict, optional): Supported on KDP730. Allows manually setting the output bitwidth for specific nodes by name to override automatic selection, e.g.
+  {"523_kn": {"bitwidth": {"all": 8}},
+   "510_kn": {"bitwidth": {"all": 15}}}
 * compiler_tiling (str, optional): could be "default" or "deep_search". Gets a better image cut method through deep search, so as to improve the efficiency of our NPU. Defaults to "default".
 * mode (int, optional): running mode for the analysis. Defaults to 1.
     - 0: run ip_evaluator only. This mode does not output a bie file.
```
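The `quan_config` argument described above is a plain nested mapping from node name to bitwidth settings. A small sketch that builds and sanity-checks such a mapping (the node names `523_kn`/`510_kn` are the placeholders from the docs' example, and the validation helper is our own illustration, not part of the ktc API):

```python
# Illustrative quan_config mirroring the documented example.
# "523_kn"/"510_kn" are placeholder node names; substitute the
# node names from your own model before passing the dict to
# ModelConfig.analysis(..., quan_config=quan_config).
quan_config = {
    "523_kn": {"bitwidth": {"all": 8}},
    "510_kn": {"bitwidth": {"all": 15}},
}

def validate_quan_config(cfg):
    """Check the node-name -> {"bitwidth": {...}} structure."""
    for node, settings in cfg.items():
        assert isinstance(node, str), "keys are node names"
        assert "bitwidth" in settings, "each entry needs a bitwidth block"
        for scope, bits in settings["bitwidth"].items():
            assert isinstance(bits, int), "bitwidth values are integers"

validate_quan_config(quan_config)
```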
