OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference on a range of Intel® platforms from edge to cloud.

Quantization Scheme

  • Support for mixed-precision models, where some layers can be kept in floating-point precision.

  • Per-channel quantization of weights of Convolutional and Fully-Connected layers.

  • Per-channel quantization of activations for channel-wise and element-wise operations, e.g. Depthwise Convolution, Eltwise Add/Mul, ScaleShift.

  • Symmetric and asymmetric quantization of weights and activations, with support for per-channel scales and zero-points.

  • Non-unified quantization parameters for Eltwise and Concat operations.

  • Non-quantized network output, i.e. there are no quantization parameters for it.

More details can be found at
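To make the symmetric/asymmetric and per-channel terms in the list above concrete, here is a minimal NumPy sketch of per-channel fake quantization (quantize, then dequantize). It is a generic illustration and not the implementation used by OpenVINO or MQBench; the function name `quantize_per_channel`, the 8-bit default, and the choice of axis 0 as the channel axis are all assumptions made for this example.

```python
import numpy as np

def quantize_per_channel(w, num_bits=8, symmetric=True):
    """Fake-quantize `w` with one scale and zero-point per slice along axis 0.

    Hypothetical helper for illustration only; not an OpenVINO or MQBench API.
    """
    flat = w.reshape(w.shape[0], -1)  # one row per channel
    if symmetric:
        # Signed grid centred on zero; the zero-point is always 0.
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        scale = np.maximum(np.abs(flat).max(axis=1), 1e-8) / qmax
        zero_point = np.zeros_like(scale)
    else:
        # Unsigned grid; the zero-point shifts the range so that 0.0 is representable.
        qmin, qmax = 0, 2 ** num_bits - 1
        lo = np.minimum(flat.min(axis=1), 0.0)
        hi = np.maximum(flat.max(axis=1), 0.0)
        scale = np.maximum(hi - lo, 1e-8) / (qmax - qmin)
        zero_point = np.round(qmin - lo / scale)
    # Reshape so the per-channel parameters broadcast over the remaining axes.
    shape = (-1,) + (1,) * (w.ndim - 1)
    s, z = scale.reshape(shape), zero_point.reshape(shape)
    q = np.clip(np.round(w / s) + z, qmin, qmax)  # integer grid values
    return (q - z) * s, scale, zero_point         # dequantized tensor + parameters

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 3, 3, 3))  # (out_channels, in_channels, kH, kW)
deq_sym, scale_sym, zp_sym = quantize_per_channel(weights, symmetric=True)
deq_asym, scale_asym, zp_asym = quantize_per_channel(weights, symmetric=False)
```

For a convolution weight of shape (out_channels, in_channels, kH, kW), this yields one scale and one zero-point per output channel, which is the per-channel case described above; a per-tensor scheme would instead share a single pair across the whole tensor.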

Deploy on OpenVINO


  • Install the OpenVINO C++ SDK from Intel.

  • (Optional) Install the OpenVINO Python SDK with the command: pip install openvino openvino-dev


  • Python tutorials (see application/openvino_example.ipynb in the MQBench GitHub repository) are written as Jupyter notebooks and cover the PTQ process and accuracy evaluation.

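  • Convert the PyTorch checkpoint to openvino_deploy_model.onnx:

    from mqbench.convert_deploy import convert_deploy
    from mqbench.prepare_by_platform import BackendType

    # `model` is the quantized model produced by the PTQ process.
    input_dict = {'x': [1, 3, 224, 224]}  # input name -> input shape
    convert_deploy(model, BackendType.OPENVINO, input_dict, model_name='openvino')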
  • Convert the .onnx file to the .xml and .bin formats supported by OpenVINO:

     # Run `mo --help` or check the OpenVINO docs for more information.
     mo --input_model ./openvino_deploy_model.onnx
     # The previous command produces openvino_deploy_model.xml and openvino_deploy_model.bin.
     # Run a benchmark using a single stream on the CPU.
     benchmark_app -m ./openvino_deploy_model.xml -nstreams 1
     # Test results on Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, model: ResNet-18
     # Original top-1 accuracy: 69.758
     # PTQ top-1 accuracy: 69.334
     # Top-1 accuracy when deployed with OpenVINO: 69.312
     # Cosine similarity between the torch model and the OpenVINO IR, measured on the last output: 0.9975
     # Benchmark results
     # Original ResNet-18 >> Count: 6959 iterations, Duration: 60009.54 ms, Latency: 8.71 ms, Throughput: 115.96 FPS
     # Quantized version  >> Count: 13094 iterations, Duration: 60004.75 ms, Latency: 4.44 ms, Throughput: 218.22 FPS
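
The cosine figure and the benchmark numbers reported above can be checked with a few lines of plain Python. The sketch below is illustrative only: `cosine_similarity` is a hypothetical helper written for this example (not part of MQBench or OpenVINO), and it assumes the two model outputs have already been flattened into equal-length sequences of floats.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Figures reported above for ResNet-18.
fp32_fps, int8_fps = 115.96, 218.22
fp32_top1, deployed_top1 = 69.758, 69.312

speedup = int8_fps / fp32_fps                # ratio of throughputs
accuracy_drop = fp32_top1 - deployed_top1    # in percentage points

print(f"throughput speedup: {speedup:.2f}x")
print(f"top-1 drop: {accuracy_drop:.3f} points")
```

With the reported figures, the quantized model runs at roughly 1.88 times the throughput of the original, at a cost of about 0.45 points of top-1 accuracy.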