Tengine is a lightweight, high-performance, modular inference engine for embedded devices, developed by OPEN AI LAB.

Quantization Scheme

  • Full 8-bit integer representation of model weights and computation.

  • Per-tensor quantization of all weights and activations.

  • Asymmetric quantization of all weights and activations in Tengine_u8 mode.

  • Quantization of the model input and de-quantization of the output must be done manually (see the sketch below).
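
For reference, below is a minimal NumPy sketch of the manual input quantization and output de-quantization under the asymmetric uint8 scheme above. The scale and zero-point values are hypothetical placeholders standing in for the parameters dumped during export; they are not part of any Tengine API.

    import numpy as np

    def quantize_uint8(x, scale, zero_point):
        # Asymmetric affine quantization: q = clamp(round(x / scale) + zp, 0, 255)
        q = np.round(x / scale) + zero_point
        return np.clip(q, 0, 255).astype(np.uint8)

    def dequantize_uint8(q, scale, zero_point):
        # Inverse mapping back to float: x = (q - zp) * scale
        return (q.astype(np.float32) - zero_point) * scale

    # Hypothetical parameters; in practice read them from the dumped scale file
    in_scale, in_zp = 0.018, 114
    out_scale, out_zp = 0.11, 121

    x = np.random.randn(1, 3, 224, 224).astype(np.float32)
    q_in = quantize_uint8(x, in_scale, in_zp)         # feed this to the uint8 graph
    # q_out = <uint8 output of the graph>
    # y = dequantize_uint8(q_out, out_scale, out_zp)  # recover float predictions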

More details can be found at https://github.com/OAID/Tengine

Deploy on Tengine


  • Compile and install the Tengine toolkit from Tengine (https://github.com/OAID/Tengine)

  • Install the Tengine Python API from Pytengine


We provide an example of deploying the quantized model to Tengine with asymmetric quantization.

  • First, export the quantized model to ONNX [mqbench_qmodel_for_tengine.onnx] and dump the activation quantization parameters [mqbench_qmodel_for_tengine.scale].

    python main.py -a [model_name] --resume [model_save_path] --deploy --backend tengine_u8
  • Second, convert the .onnx file into the .tmfile format supported by Tengine (https://tengine-docs.readthedocs.io/en/latest/user_guides/convert_tool.html).

    tm_convert_tool -f onnx -m [mqbench_qmodel_for_tengine.onnx] -o [xxxx.tmfile]
  • Third, quantize the .tmfile with mqbench_qmodel_for_tengine.scale (ref: https://tengine-docs.readthedocs.io/en/latest/user_guides/quant_tool_uint8.html).

    quant_tool_uint8 -m [xxx.tmfile] -o [xxxx_u8.tmfile] -i ./ -f [mqbench_qmodel_for_tengine.scale]
  • Finally, validate with pytengine (optional); see the sketch after this list.

    python eval_tengine.py --dataset [path to dataset] -m [xxxx_u8.tmfile]
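
For reference, below is a minimal sketch of uint8 inference through pytengine, assuming the Graph/Tensor interface used in the Pytengine examples; the model path, input shape, and quantization parameters are illustrative assumptions, and the manual quantize/de-quantize steps mirror the sketch in the Quantization Scheme section.

    import numpy as np
    from tengine import tg

    graph = tg.Graph(None, 'tengine', 'xxxx_u8.tmfile')  # quantized model file
    input_tensor = graph.getInputTensor(0, 0)
    input_tensor.shape = [1, 3, 224, 224]                # assumed NCHW input shape
    graph.preRun()

    # Manually quantize the float input (hypothetical scale/zero-point values)
    x = np.random.randn(1, 3, 224, 224).astype(np.float32)
    in_scale, in_zp = 0.018, 114
    q_in = np.clip(np.round(x / in_scale) + in_zp, 0, 255).astype(np.uint8)

    input_tensor.buf = q_in
    graph.run(1)                                         # 1 = blocking run

    # Manually de-quantize the uint8 output (hypothetical scale/zero-point values)
    q_out = graph.getOutputTensor(0, 0).getNumpyData()
    out_scale, out_zp = 0.11, 121
    y = (q_out.astype(np.float32) - out_zp) * out_scale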