A Brief Overview of How MQBench Supports Hardware and Software
MQBench can quantize models for many different backends with many quantization algorithms, which relies on keeping hardware (backends) and software (algorithms) independent. To support a new backend, both the h/w and s/w parts must be added. A typical work-flow:
1. Add FakeQuantizeAffine to simulate the hardware behavior.
2. Add an Observer to provide FakeQuantizeAffine with the needed information.
3. Add a ModelQuantizer to insert quantize nodes into the model.
4. Add an ONNX-Backend translator to deploy the output graph.
FakeQuantize, Observer and Quantization Scheme
When the model is in calib mode, the Observer collects the needed statistics and FakeQuantize does not quantize the input; in quant mode, FakeQuantize performs the quantized forward with the qparams calculated by the Observer.
To add a FakeQuantizeAffine, follow these steps:
1. Function: Add a fake quantization function that describes the quantized forward. If the function is not differentiable, also describe the backward.
2. Autograd and Symbolic: Wrap the quantization forward/backward functions in a torch.autograd.Function. To enable ONNX export, a symbolic function is also needed.
3. Inherit QuantizeBase: Define the class that holds the qparams and performs the quantized forward; it inherits from QuantizeBase.
4. Add an Observer if needed: Inherit ObserverBase, collect statistics in forward, and compute the qparams from them.
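In MQBench the forward/backward pair from steps 1-2 would live inside a torch.autograd.Function; here is a framework-free sketch of a fake-quantize forward and the usual straight-through-estimator (STE) backward. Names and the scalar interface are illustrative, not MQBench's actual API.

```python
def fake_quant_forward(x, scale, zero_point, qmin, qmax):
    """Quantize-dequantize one value with an affine scheme."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))          # clamp to the integer range
    return (q - zero_point) * scale

def fake_quant_backward(x, grad_out, scale, zero_point, qmin, qmax):
    """Straight-through estimator: round() has zero gradient almost
    everywhere, so approximate it by the identity inside the representable
    range and zero the gradient outside it."""
    q = round(x / scale) + zero_point
    return grad_out if qmin <= q <= qmax else 0.0
```

Wrapping these two in a torch.autograd.Function (plus a symbolic method) is what makes the op both trainable and exportable to ONNX.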
Use it in prepare_by_platform:
1. Import your class so that prepare_by_platform can find it.
2. Add your backend type to the BackendType enumeration.
3. Define its default quantization scheme.
4. Add the corresponding mappings.
For now, we already provide several linear quantization affines/observers, which should save much of this work.
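As an illustration of steps 2 and 3 above, extending a backend enumeration and its default-scheme table could look like the following hypothetical sketch; the enum members and scheme fields are made up and do not mirror MQBench's actual tables.

```python
from enum import Enum

class BackendType(Enum):
    Tensorrt = "tensorrt"
    MyBackend = "my_backend"     # <- new backend registered here

# Hypothetical default-scheme table keyed by backend type.
DEFAULT_SCHEME = {
    BackendType.Tensorrt:  {"bit": 8, "symmetric": True,  "per_channel": True},
    BackendType.MyBackend: {"bit": 8, "symmetric": False, "per_channel": False},
}

def default_scheme(backend: BackendType) -> dict:
    """Look up the default quantization scheme for a backend."""
    return DEFAULT_SCHEME[backend]
```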
To prepare the model for a platform, quantization nodes should be inserted by, and only by, the ModelQuantizer, which hides all the details of building the needed graph. After preparation, the model can be used for QAT or PTQ.
The ModelQuantizer will:
1. Do op fusion, e.g. Conv2d+BN2d+ReLU -> ConvBnReLU2d.
2. Do the QAT swap: fuse the modules into QAT modules and insert quantize nodes for the weights.
3. Insert quantize nodes after activations, which requires finding which outputs will be quantized by the backend.
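The Conv2d+BN2d part of step 1 is standard batch-norm folding: the BN affine transform gamma * (y - mean) / sqrt(var + eps) + beta is absorbed into the conv weight and bias. A per-channel sketch with scalar values (real code rescales whole weight tensors per output channel):

```python
import math

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the
    conv itself; w and b are one output channel's weight and bias."""
    factor = gamma / math.sqrt(var + eps)
    return w * factor, (b - mean) * factor + beta
```

The folded conv reproduces conv-then-BN exactly (up to floating-point error), which is why this step needs no retraining.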
Fusion goes through a torch API, and you need to edit torch's default fuser method mappings and patterns, because the torch fuser does not accept any other arguments.
If you add new fusion patterns:
1. Add the fusion patterns and the related fusion methods to mqbench.fuser_method_mapping. This turns certain op patterns into intrinsic fused modules. If you create new intrinsic modules, add them to mqbench.nn.intrinsic; if you add new fusion methods, register them as well.
2. Add the mapping from intrinsic modules to QAT modules, and define those QAT modules.
3. Register everything mentioned above. NOTE: If a pattern applies only to a certain backend, update the default dicts of its ModelQuantizer instead.
4. Swap the intrinsic modules for QAT ones. Add the mappings to mqbench.fuser_method_mapping if the mapping is universal; otherwise, add them to the ModelQuantizer's additional_qat_module_mappings.
We ship bias-quant intrinsic and QAT modules, so refer to mqbench.nn to see how to add one yourself.
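The intrinsic-to-QAT swap in step 4 boils down to a class-to-class mapping plus a module walk. A hypothetical sketch; the class and dict names are illustrative stand-ins, not MQBench's actual modules:

```python
class ConvBnReLU2d:            # stand-in for an intrinsic fused module
    pass

class QATConvBnReLU2d:         # stand-in for its QAT counterpart
    def __init__(self, fused):
        self.fused = fused     # the QAT module wraps the fused one

# Universal mapping: intrinsic module class -> QAT module class.
QAT_MODULE_MAPPINGS = {ConvBnReLU2d: QATConvBnReLU2d}

def swap_to_qat(module):
    """Swap a fused intrinsic module for its QAT version if one is mapped."""
    qat_cls = QAT_MODULE_MAPPINGS.get(type(module))
    return qat_cls(module) if qat_cls else module
```

Backend-specific mappings would simply be merged into this dict before the walk, which is the role additional_qat_module_mappings plays.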
Insert Quantize nodes
TensorRT’s ModelQuantizer is usually a good example: it quantizes all quantizable input tensors of modules. If you need to quantize other nodes, just add them to the set with your own logic.
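The selection logic amounts to walking the graph and marking the input tensors of quantizable ops. A toy sketch; the Node class and op set are illustrative, while MQBench does this over a torch.fx graph:

```python
# Extend this set with your own logic for ops your backend quantizes.
QUANTIZABLE_OPS = {"conv2d", "linear"}

class Node:
    def __init__(self, name, op, inputs=()):
        self.name, self.op, self.inputs = name, op, tuple(inputs)

def tensors_to_quantize(graph):
    """Collect the names of tensors feeding quantizable ops."""
    to_quantize = set()
    for node in graph:
        if node.op in QUANTIZABLE_OPS:
            to_quantize.update(node.inputs)   # quantize every input tensor
    return to_quantize

g = [Node("x", "input"),
     Node("c1", "conv2d", ["x"]),
     Node("r1", "relu", ["c1"]),
     Node("f1", "linear", ["r1"])]
```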
The deploy stage turns our torch-based model into ONNX and then into backend-specific graphs. Typically, we merge BN into conv/deconv/linear, convert the quantized model to ONNX, and finally remove all fake-quantize ops and deploy the result. The functions are defined and registered in mqbench.convert_deploy.
The deploy stage might be the hardest part of the flow, since it takes care of all the h/w details. The normal flow looks like this:
No extra work is needed for merging BN and converting to ONNX.
To remove the fake-quantize nodes and collect the qparams, check mqbench.deploy; it contains examples for linear/logarithmic quantization and for a self-defined ONNX runtime.
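Conceptually, this step records each fake-quantize node's qparams in a backend-readable table and deletes the node from the graph. A hypothetical sketch; the graph and record formats are illustrative, not the formats used in mqbench.deploy:

```python
import json

def remove_fake_quantize(graph):
    """Strip fake-quantize entries from a toy op list while recording
    their qparams for the backend."""
    qparams, kept = {}, []
    for node in graph:
        if node["op"] == "fake_quantize":
            # Record the params under the tensor this node quantized...
            qparams[node["input"]] = {"scale": node["scale"],
                                      "zero_point": node["zero_point"]}
        else:
            kept.append(node)   # ...and drop the node itself.
    return kept, qparams

graph = [{"op": "conv2d", "name": "c1"},
         {"op": "fake_quantize", "input": "c1", "scale": 0.02, "zero_point": 128}]
kept, qparams = remove_fake_quantize(graph)
table = json.dumps(qparams)   # e.g. serialized for a backend-specific tool
```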
Some platforms support standard ONNX quantization. If yours does, export FakeQuantizeAffine into ONNX Runtime QDQ operators.
If extra tools are needed to complete the translation, integrate them into mqbench.deploy; there is also an example for Vitis-AI there.