NNIE
NNIE is the Neural Network Inference Engine of HiSilicon. It supports INT8/INT16 quantization.
Quantization Scheme
8/16 bit per-layer logarithmic quantization.
The specific quantization formulation (reconstructed here for the 8-bit case, consistent with the table below) is:

\[
z = 16 \log_2 c - 127, \qquad q(x) = \mathrm{sign}(x) \cdot \mathrm{clip}\Big(\mathrm{round}\big(16 \log_2 \lvert x \rvert - z\big),\ 0,\ 127\Big)
\]

where \(c\) is the clipping range (the largest representable magnitude) and \(2 ^ {\dfrac{z}{16}}\) is the smallest positive value that can be represented after quantization.
Integers are stored in True Form (sign-magnitude) format: the highest bit holds the sign and the remaining bits hold the absolute value of the number.
| Floating Number | Integer Number | Hexadecimal | Dequantized Floating Number |
|---|---|---|---|
| \(\bigg(- \infty, - 2 ^ {\dfrac{z + 126.5}{16}}\bigg]\) | -127 | 0xFF | \(- 2 ^ {\dfrac{z+127}{16}}\) |
| … | … | … | … |
| \(\bigg(- 2 ^ {\dfrac{z + 2.5}{16}}, - 2 ^ {\dfrac{z + 1.5}{16}}\bigg]\) | -2 | 0x82 | \(- 2 ^ {\dfrac{z+2}{16}}\) |
| \(\bigg(- 2 ^ {\dfrac{z + 1.5}{16}}, - 2 ^ {\dfrac{z + 1}{16} - 1}\bigg)\) | -1 | 0x81 | \(- 2 ^ {\dfrac{z+1}{16}}\) |
| \(\bigg[- 2 ^ {\dfrac{z + 1}{16} - 1}, 2 ^ {\dfrac{z}{16} - 1}\bigg)\) | -0 | 0x80 | 0 |
| \(\bigg[2 ^ {\dfrac{z}{16} - 1}, 2 ^ {\dfrac{z + 0.5}{16}}\bigg)\) | 0 | 0x00 | \(2 ^ {\dfrac{z}{16}}\) |
| \(\bigg[2 ^ {\dfrac{z + 0.5}{16}}, 2 ^ {\dfrac{z + 1.5}{16}}\bigg)\) | 1 | 0x01 | \(2 ^ {\dfrac{z+1}{16}}\) |
| … | … | … | … |
| \(\bigg[2 ^ {\dfrac{z + 126.5}{16}}, + \infty\bigg)\) | 127 | 0x7F | \(2 ^ {\dfrac{z+127}{16}}\) |
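The table's mapping can be sketched in Python. This is an illustrative sketch, not NNIE's actual implementation: it assumes \(z = 16 \log_2 c - 127\) (so the largest 8-bit code decodes to the clipping range \(c\)) and ignores the table's special handling of the \(\pm 0\) codes near zero.

```python
import math

def nnie_quantize(x, z, bits=8):
    """Map a float to an NNIE-style logarithmic integer code (sketch).

    Code k (k > 0) represents 2**((z + k) / 16); the sign is stored
    separately (True Form), so codes run from -127 to 127 for 8 bit.
    Note: the -0 code of the real scheme is not modeled here.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bit
    if x == 0.0:
        return 0
    mag = min(max(round(16 * math.log2(abs(x)) - z), 0), qmax)
    return mag if x > 0 else -mag

def nnie_dequantize(q, z):
    """Decode a code back to a float: q -> sign(q) * 2**((z + |q|) / 16)."""
    if q == 0:
        # per the table, code 0 decodes to the smallest positive
        # representable value 2**(z/16), not to 0
        return 2 ** (z / 16)
    return math.copysign(2 ** ((z + abs(q)) / 16), q)
```

Because the grid is logarithmic, quantization rounds \(16 \log_2 \lvert x \rvert - z\) to the nearest integer, which places the decision boundaries at the half-integer exponents shown in the table.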
NNIE performs per-layer quantization: the inputs of a layer share the same \(z_a\), and the weights of that layer share the same \(z_w\).
In fact, when building an engine with the official NNIE tool, you supply the clipping value \(c\) rather than \(z\). \(c\) must be one of the values listed in 'gfpq_param_table_8bit.txt', which guarantees that \(16 \log_2 c\) is an integer.
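As an illustration of that constraint, the helper below derives \(z\) from a clipping value \(c\) and rejects values for which \(16 \log_2 c\) is not an integer. It is a sketch under the assumption \(z = 16 \log_2 c - 127\) (so that the largest 8-bit code decodes to \(c\)); the authoritative list of valid \(c\) values is the table file shipped with the NNIE tool.

```python
import math

def z_from_clip(c, bits=8):
    """Derive the exponent offset z from a clipping value c (sketch).

    Valid clipping values make 16 * log2(c) an integer, so every grid
    point is an exact power of 2**(1/16).
    """
    t = 16 * math.log2(c)
    if abs(t - round(t)) > 1e-9:
        raise ValueError("c must satisfy: 16 * log2(c) is an integer")
    # the largest code (127 for 8 bit) decodes to 2**((z + 127) / 16) == c
    return round(t) - (2 ** (bits - 1) - 1)
```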
Attention
- Pooling: use `ceil_mode = True`.
- Avoid depthwise convolution.
- Only 2x nearest-neighbor upsampling is supported.
- For detection tasks, a RetinaNet-style structure is recommended.