Object detection with MQBench
In this part, we introduce how to quantize an object detection model using MQBench.
Getting Started
1. Clone the repositories.
git clone https://github.com/ModelTC/MQBench.git
git clone https://github.com/ModelTC/EOD.git
2. Quantization-aware training.
# Prepare your float pretrained model.
cd eod/scripts
# Follow the prompts to set the config in train_qat.sh.
sh train_qat.sh
We provide several QAT config examples in the EOD repository:
- For retinanet-tensorrt:
float pretrained config file: retinanet-r18-improve.yaml
qat config file: retinanet-r18-improve_quant_trt_qat.yaml
- For yolox-tensorrt:
float pretrained config file: yolox_s_ret_a1_comloc.yaml
qat config file: yolox_s_ret_a1_comloc_quant_trt_qat.yaml
- For yolox-vitis:
float pretrained config file: yolox_fpga.yaml
qat config file: yolox_fpga_quant_vitis_qat.yaml
Important fields in the config file:
deploy_backend: Choose a deploy backend supported by MQBench.
ptq_only: If True, only PTQ calibration is executed. If False, QAT runs after PTQ calibration.
extra_qconfig_dict: Choose a quantization config supported by MQBench.
leaf_module: Prevent torch.fx from tracing into the listed modules.
extra_quantizer_dict: Register extra QAT modules.
resume_model: The path to your float pretrained model.
tocaffe_friendly: It is recommended to set this to True; it affects the exported ONNX model.
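Putting these fields together, the quantization-related part of a QAT config might look like the sketch below. The exact nesting and the observer names are illustrative assumptions; consult the shipped example configs (e.g. retinanet-r18-improve_quant_trt_qat.yaml) for the real schema.

```yaml
# Hypothetical sketch of the quantization-related config fields;
# see the EOD example QAT configs for the authoritative schema.
quant:
  deploy_backend: tensorrt        # a backend supported by MQBench
  ptq_only: False                 # run QAT after PTQ calibration
  extra_qconfig_dict:             # quantization settings supported by MQBench
    w_observer: MinMaxObserver    # illustrative observer choices
    a_observer: EMAMinMaxObserver
  leaf_module: []                 # modules torch.fx should not trace into
  extra_quantizer_dict: {}        # extra QAT modules, if any
resume_model: /path/to/float_pretrained.pth
tocaffe_friendly: True            # affects the exported ONNX model
```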
3. Resume training during QAT.
cd eod/scripts
# Just set resume_model in the config file to your checkpoint; the rest is handled automatically.
sh train_qat.sh
4. Evaluate your quantized model.
cd eod/scripts
# Set resume_model in the config file to your checkpoint.
# Add -e to train_qat.sh to run evaluation only.
sh train_qat.sh
5. Deploy.
cd eod/scripts
# Follow the prompts to set config in quant_deploy.sh.
sh quant_deploy.sh
Introduction of EOD-MQBench Project
Code related to quantization is in eod/tasks/quant.
When you set the runner type to quant in the config file, QuantRunner (eod/tasks/quant/runner/quant_runner.py) is executed.
First, build the float model in self.build_model().
Load the float pretrained or quantized checkpoint in self.load_ckpt().
Trace the model with torch.fx in self.quantize_model().
Set the optimizer and lr scheduler in self.build_trainer().
Run PTQ calibration and evaluation in self.calibrate().
Train in self.train().
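The call order above can be sketched as a minimal mock. The method names mirror quant_runner.py, but the bodies here are placeholders that only record the sequence, not EOD's actual implementation:

```python
# Minimal sketch of the QuantRunner control flow; method names follow
# eod/tasks/quant/runner/quant_runner.py, bodies are placeholders.
class QuantRunnerSketch:
    def __init__(self):
        self.steps = []

    def build_model(self):      # build the float model
        self.steps.append("build_model")

    def load_ckpt(self):        # load float pretrained / quantized weights
        self.steps.append("load_ckpt")

    def quantize_model(self):   # trace the network with torch.fx
        self.steps.append("quantize_model")

    def build_trainer(self):    # optimizer and lr scheduler
        self.steps.append("build_trainer")

    def calibrate(self):        # PTQ calibration + evaluation
        self.steps.append("calibrate")

    def train(self):            # QAT training loop
        self.steps.append("train")

    def run(self):
        for step in (self.build_model, self.load_ckpt, self.quantize_model,
                     self.build_trainer, self.calibrate, self.train):
            step()
        return self.steps
```

Calling run() executes the six stages in the order listed above.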
Important notes:
Your model should be split into the network and post-processing; torch.fx should only trace the network.
A quantized model should be saved under the key qat, as shown in self.save(). This key is used by self.resume_model_from_fp() and self.resume_model_from_quant() to tell the two cases apart.
We disable EMA during QAT. If your checkpoint contains an EMA state, it is loaded into the model, as shown in self.load_ckpt().
Be careful when your quantized model has extra learnable parameters: make sure the optimizer covers them, as in eod/tasks/det/plugins/yolov5/utils/optimizer_helper.py. LSQ has already been checked.
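The checkpoint convention from the notes above can be illustrated with a small stdlib-only sketch. The helper names and dict layout are hypothetical; EOD's self.save() and resume logic are the authoritative versions:

```python
# Hypothetical sketch of the checkpoint convention described above:
# quantized checkpoints carry a "qat" key, so the loader can tell whether
# it is resuming from a float or a quantized model.
def save_quant_ckpt(state_dict):
    """Save a quantized model under the 'qat' key."""
    return {"qat": True, "model": state_dict}

def load_ckpt(ckpt):
    """Dispatch on the 'qat' key, mirroring the two resume paths."""
    if ckpt.get("qat"):
        return "resume_model_from_quant", ckpt["model"]
    # Float checkpoint: prefer the EMA state if present
    # (EMA itself is disabled during QAT).
    weights = ckpt.get("ema", ckpt["model"])
    return "resume_model_from_fp", weights
```

A float checkpoint with an EMA state resumes from the EMA weights via the float path; a checkpoint produced by save_quant_ckpt() resumes via the quantized path.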