Commit 773e566

Fix the grammatical.
1 parent 882e6f4 commit 773e566

2 files changed: +23, -28 lines changed

doc/fluid/design/quantization/fixed_point_quantization.md

Lines changed: 23 additions & 28 deletions
@@ -1,13 +1,13 @@
-Fixed-point quantization is to use lower bit, for example, 2 bit, 3 bit or 8 bit fixed-point to represent weights and activations, which usually are singe float point with 32 bit. The fixed-point representation has advantages in reducing memory bandwidth, lowering power consumption and computational resources as well as the model storage requirements. It is especially import for the inference in embedded device deployment.
+Fixed-point quantization uses lower bits, for example, 2-bit, 3-bit or 8-bit fixed point, to represent weights and activations, which are usually single-precision floating-point numbers with 32 bits. The fixed-point representation has advantages in reducing memory bandwidth, lowering power consumption and computational resources, as well as reducing the model storage requirements. It is especially important for inference in embedded-device deployment.

-According some experiments, the apporach to quantize the model trained in float point directly works sufficiently on the large model, like the over-parameterized VGG model. But the accuracy drops a lot for the small model. In order to improve the tradeoff be-tween accuracy and latency, many quantized training apporaches are proposed.
+According to some experiments, directly quantizing a model trained in floating point works effectively for large models, like the VGG model, which has many parameters. But the accuracy drops a lot for small models. In order to improve the tradeoff between accuracy and latency, many quantized training approaches have been proposed.

-This document is to design a quantized training framework on Fluid. The first part will introduce how to quantize, The second part will describe the quantized training framework. The last part will describe how to the quantization range.
+This document describes the design of a quantized training framework on Fluid. The first part will introduce how to quantize, the second part will describe the quantized training framework, and the last part will illustrate how to calculate the quantization scale.


### How to quantize

-There are many ways to quantizate the float value to fixed-point value. For example:
+There are many ways to quantize a float value to a fixed-point value. For example:

$$ r = min(max(x, a), b)$$
$$ s = \frac{b - a}{n - 1} $$
@@ -16,7 +16,7 @@ $$ q = \left \lfloor \frac{r - a}{s} \right \rceil $$
where $x$ is the float value to be quantized, $[a, b]$ is the quantization range, $a$ is the minimum value and $b$ is the maximum value. $\left \lfloor \right \rceil$ denotes rounding to the nearest integer. If the quantization level is $k$, then $n$ is $2^k$; for example, when $k$ is 8, $n$ is 256. $q$ is the quantized integer.


-The quantization we apllied is parameterized by the number of quantization levels and maximum absolute value:
+The quantization we applied is parameterized by the number of quantization levels and maximum absolute value:

$$ M = max(abs(x)) $$
$$ q = \left \lfloor \frac{x}{M} * (n - 1) \right \rceil $$
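
For concreteness, here is a minimal NumPy sketch of the two schemes above, the *min-max* and *max-abs* quantizers, following the formulas in this section. The function names and the default `k = 8` are illustrative assumptions, not part of the Fluid API.

```python
import numpy as np

def quantize_min_max(x, a, b, k=8):
    """Min-max quantization: clamp x to [a, b], then map onto n = 2^k levels."""
    n = 2 ** k
    r = np.clip(x, a, b)            # r = min(max(x, a), b)
    s = (b - a) / (n - 1)           # step size
    return np.rint((r - a) / s)     # round to the nearest integer

def quantize_max_abs(x, k=8):
    """Max-abs quantization: scale by the maximum absolute value M."""
    n = 2 ** k
    m = float(np.max(np.abs(x)))    # M = max(abs(x))
    q = np.rint(x / m * (n - 1))    # q = round(x / M * (n - 1))
    return q, m                     # keep M to recover the value: x ~= q * M / (n - 1)
```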
@@ -31,29 +31,28 @@ $q = scale * r + b$
We call *min-max* and *max-abs* the quantization arguments; they are also called the quantization scale or quantization range.


-How to calculate the quantization range (or maximum absolute value) for inference will be described in the last part.
+How to calculate the quantization scale (or maximum absolute value) for inference will be described in the last part.


### Training Framework

#### Forward pass

-The forward pass is simulated quantization, see the figure 1.
+The forward pass is simulated quantization, see Figure 1.

The training framework is shown in the following figure.

<p align="center">
-<img src="quantization_forward.png" width="300" height="340" /><br/>
-
-Fig 1. Forward in training with simulated quantization.
+<img src="quantization_forward.png" width="300" height="340"><br/>
+Figure 1. Forward in training with simulated quantization.
</p>

-- At first, both input and weight will be quantized to 8 bit.
-- Then, do the multiplication (or convolution) operation with integers.
-- Then, dequantize the multiplication (or convolution) results to 32 bit float point.
-- At last, do bias-addition in float type of 32 bit. Here, the bias is not quantized.
+- First, both input and weight will be quantized to 8-bit integers.
+- Second, do the multiplication (or convolution) operation with integers.
+- Third, dequantize the multiplication (or convolution) results to 32-bit floating point.
+- Finally, do the bias addition in 32-bit floating point. Here, the bias is not quantized.

-For general matrix to matrix multiplication (GEMM), quantize for $X$ and $W$:
+For general matrix multiplication (GEMM), quantize $X$ and $W$:

$$ X_q = \left \lfloor \frac{X}{X_m} * (n - 1) \right \rceil $$
$$ W_q = \left \lfloor \frac{W}{W_m} * (n - 1) \right \rceil $$
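
The four forward-pass steps listed above can be combined into a short NumPy sketch for a fully connected layer. This only illustrates the math under the max-abs scheme; `simulated_quant_fc` is a made-up name, and the real implementation uses Fluid operators inserted by the transpiler.

```python
import numpy as np

def simulated_quant_fc(X, W, b, k=8):
    """Simulated-quantization forward pass for a fully connected layer (sketch)."""
    n = 2 ** k
    X_m = float(np.max(np.abs(X)))           # max-abs scale of the input
    W_m = float(np.max(np.abs(W)))           # max-abs scale of the weight
    X_q = np.rint(X / X_m * (n - 1))         # quantize input
    W_q = np.rint(W / W_m * (n - 1))         # quantize weight
    Y_q = X_q @ W_q                          # multiplication with integers (GEMM)
    Y = Y_q * (X_m * W_m) / (n - 1) ** 2     # dequantize the product back to float32
    return Y + b                             # bias addition in float; bias is not quantized
```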
@@ -76,34 +75,30 @@ $$
From these formulas, dequantization can also be moved before GEMM: do dequantization for $X_q$ and $W_q$ first, then do GEMM. The forward workflow in training is equivalent to the following framework.

<p align="center">
-<img src="quantization_forward.png" width="300" height="330" /><br/>
-
-Fig 2. Equitvalent forward in training with simulated quantization.
-
+<img src="quantization_equivalent_forward.png" width="300" height="330"><br/>
+Figure 2. Equivalent forward in training with simulated quantization.
</p>

-We use this equivalent workflow in the training. In our desigin, there is a quantization transipler to insert the quantization operator and the de-quantization operator in the Fluid `ProgramDesc`.
+We use this equivalent workflow in the training. In our design, there is a quantization transpiler that inserts the quantization operator and the de-quantization operator into the Fluid `ProgramDesc`. Since the outputs of the quantization and de-quantization operators are still in floating point, they are called fake quantization and de-quantization operators, and the training framework is called simulated quantization.

#### Backward pass

-See the figure 3. The gradients are calculated by dequantized weights and activations. All inputs and outputs are float point with 32 bit. And in the weight updating process, the gradients will be added to the original weight, not the quantized or dequantized weights.
+See Figure 3. The gradients are calculated from the dequantized weights and activations. All inputs and outputs are 32-bit floating point. And in the weight updating process, the gradients are added to the original weights, not the quantized or dequantized weights.

<p align="center">
-<img src="quantization_backward_and_optimization.png" /><br/>
-
-Fig 3. Backward and weight updating in training with simulated quantization.
-
+<img src="quantization_backward_and_optimization.png"><br/>
+Figure 3. Backward and weight updating in training with simulated quantization.
</p>

So the quantization transpiler will change some inputs of the corresponding backward operators.
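
As a rough sketch of the weight-updating rule described above, gradients are computed with the fake-quantized (quantize-then-dequantize) weights, while the update is applied to the original full-precision weights. Plain SGD and the `grad_fn` callback are assumptions made only for this example, not the Fluid transpiler logic.

```python
import numpy as np

def fake_quantize(w, k=8):
    """Quantize with max-abs and immediately dequantize back to float."""
    n = 2 ** k
    m = float(np.max(np.abs(w)))
    return np.rint(w / m * (n - 1)) * m / (n - 1)

def train_step(w_fp32, x, grad_fn, lr=0.01):
    """One training step: forward/backward see the fake-quantized weight,
    but the gradient is added to the original float32 weight."""
    w_deq = fake_quantize(w_fp32)
    grad = grad_fn(w_deq, x)        # gradient computed from dequantized weights
    return w_fp32 - lr * grad       # update the original weight, not w_deq
```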

### How to calculate quantization scale

-There are two strategies to calculate quantization scale, we call them dynamic and static strategy. The dynamic strategy is to calculate the quantization scale value each iteration. The static strategy is to fix the quantization scale for different inputs.
+There are two strategies to calculate the quantization scale; we call them the dynamic and static strategies. The dynamic strategy calculates the quantization scale value in each iteration. The static strategy keeps the quantization scale fixed for different inputs.

-For weights, we apply the dynamic strategy for weights in the training, that is to say, the quantization scale will recalculate during each iteration until the traning is finished.
+For weights, we apply the dynamic strategy in the training, that is to say, the quantization scale will be recalculated during each iteration until the training is finished.

-For activations, the quantization scales are estimated during training, then use them in inference. There are several different ways to estimat:
+For activations, the quantization scales are estimated during training, then used in inference. There are several different ways to estimate them:


1. Calculate the mean of the maximum absolute value during a window.
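
A minimal sketch of this first strategy: keep the max-abs values of recent batches in a sliding window and use their mean as the activation scale. The class name and default window size are assumptions for illustration, not the Fluid implementation.

```python
import numpy as np
from collections import deque

class WindowMaxAbsEstimator:
    """Estimate an activation scale as the mean of max-abs values over a window."""
    def __init__(self, window_size=100):
        self.window = deque(maxlen=window_size)

    def update(self, activation):
        # record the max-abs value of the current batch
        self.window.append(float(np.max(np.abs(activation))))

    def scale(self):
        # mean of the maximum absolute values during the window
        return sum(self.window) / len(self.window)
```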
[image file, 32.2 KB — diff not rendered]
