Naive PTQ
MQBench provides a simple API for naive post-training quantization (PTQ). Follow the step-by-step instructions below to quantize your model.
1. To begin with, import MQBench and prepare an FP32 model.
import torchvision.models as models # example model source
from mqbench.prepare_by_platform import prepare_by_platform # insert quant nodes for a specific backend
from mqbench.prepare_by_platform import BackendType # enumerates the supported backends, such as TensorRT and NNIE
from mqbench.utils.state import enable_calibration # turn on calibration to determine scale, zero_point, etc.
from mqbench.utils.state import enable_quantization # turn on actual quantization, e.g. FP32 -> INT8
from mqbench.convert_deploy import convert_deploy # remove quant nodes for deployment
model = models.__dict__["resnet18"](pretrained=True) # use a predefined torchvision model
model.eval()
2. Choose your backend.
# backend options
backend = BackendType.Tensorrt
# backend = BackendType.SNPE
# backend = BackendType.PPLW8A16
# backend = BackendType.NNIE
# backend = BackendType.Vitis
# backend = BackendType.ONNX_QNN
# backend = BackendType.PPLCUDA
# backend = BackendType.OPENVINO
# backend = BackendType.Tengine_u8
# backend = BackendType.Tensorrt_NLP
3. Prepare the model for quantization, then calibrate and evaluate it.
model = prepare_by_platform(model, backend) #! line 1. trace the model and add quant nodes for the chosen backend
enable_calibration(model) #! line 2. turn on calibration, ready for gathering data
# calibration loop (`data` stands for your own iterable of input batches)
for i, batch in enumerate(data):
    # do forward procedures
    ...
enable_quantization(model) #! line 3. turn on actual quantization, ready for simulating backend inference
# evaluation loop
for i, batch in enumerate(data):
    # do forward procedures
    ...
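To make the two switches above more concrete, here is a minimal, framework-free sketch of the general idea: observe value ranges during calibration, then simulate integer arithmetic by quantizing and dequantizing. It is not MQBench's implementation. The class ToyFakeQuantize, its observing and quantizing flags, and the plain min/max observer are all assumptions made for illustration; the observers and quantization scheme MQBench actually uses may differ.

```python
# Toy illustration only; this is NOT MQBench's implementation.
class ToyFakeQuantize:
    """Hypothetical per-tensor affine quantizer with an observe/quantize switch."""

    def __init__(self, qmin=-128, qmax=127):
        self.qmin, self.qmax = qmin, qmax
        self.min_val, self.max_val = float("inf"), float("-inf")
        self.observing = False   # loosely analogous to the state after enable_calibration
        self.quantizing = False  # loosely analogous to the state after enable_quantization

    def params(self):
        # Extend the observed range to include 0.0 so zero is exactly representable.
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        scale = (hi - lo) / (self.qmax - self.qmin) or 1.0  # guard against an empty range
        zero_point = round(self.qmin - lo / scale)
        return scale, zero_point

    def __call__(self, values):
        if self.observing:
            # Calibration: only record the running min/max of the data.
            self.min_val = min(self.min_val, min(values))
            self.max_val = max(self.max_val, max(values))
        if not self.quantizing:
            return list(values)  # pass values through unchanged
        scale, zp = self.params()
        out = []
        for v in values:
            # Quantize to the integer grid, clamp, then dequantize back to float.
            q = max(self.qmin, min(self.qmax, round(v / scale) + zp))
            out.append((q - zp) * scale)
        return out


fq = ToyFakeQuantize()

# Calibration phase: feed a few batches so the observer sees the value range.
fq.observing = True
for batch in ([-1.0, 0.5], [0.25, 1.55]):
    fq(batch)

# Quantization phase: freeze the range and simulate INT8 behaviour.
fq.observing, fq.quantizing = False, True
scale, zero_point = fq.params()
simulated = fq([0.3, 2.0])  # 2.0 lies outside the calibrated range and is clamped
```

In this toy, values that fall outside the range seen during calibration are clamped to its edge, which is one reason the calibration data should be representative of the data used at inference time.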
4. Export the quantized model.
# define the dummy input shape used for model export
input_shape = {'data': [10, 3, 224, 224]}
convert_deploy(model, backend, input_shape) #! line 4. remove quant nodes, ready for deployment on real-world hardware
Now you know how to conduct naive PTQ with MQBench. To learn more about customizing a backend, see Learn MQBench configuration.