PyTorch Transformer model MobileBERT-uncased for Natural Language Classification and Question Answering
This document describes how to evaluate optimized checkpoints of the MobileBERT-uncased transformer model on Natural Language (NL) Classification and Question Answering tasks.
Please install and set up AIMET (Torch GPU variant) before proceeding further.

NOTE
- All AIMET releases are available here: https://github.com/quic/aimet/releases
- This model has been tested using AIMET version 1.27.0 (i.e. set `release_tag="1.27.0"` in the above instructions).
- This model is compatible with the PyTorch GPU variant of AIMET (i.e. set `AIMET_VARIANT="torch_gpu"` in the above instructions).
Install the additional Python dependencies and add the parent directory of the model zoo package to `PYTHONPATH`:

```bash
pip install datasets==2.4.0
pip install transformers==4.11.3
export PYTHONPATH=$PYTHONPATH:<path to parent of aimet_model_zoo_path>
```
- Original full-precision checkpoints without downstream training were downloaded from Hugging Face (see the loading sketch after this list).
- Full-precision model weight files with downstream training are downloaded automatically by the evaluation script.
- Quantization-optimized model weight files are downloaded automatically by the evaluation script.
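For reference, the original full-precision checkpoint can be loaded directly from Hugging Face. This is a minimal sketch, assuming the public `google/mobilebert-uncased` hub id and a binary classification head; the evaluation script downloads the task-specific and quantization-optimized weights on its own.

```python
# Minimal sketch (not part of the evaluation script): load the original
# full-precision MobileBERT checkpoint from Hugging Face.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
# num_labels depends on the downstream GLUE task (e.g. 2 for RTE, MRPC, SST-2)
model = AutoModelForSequenceClassification.from_pretrained(
    "google/mobilebert-uncased", num_labels=2
)
model.eval()
```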
- For NL Classification tasks, we use the General Language Understanding Evaluation (GLUE) benchmark for evaluation.
- For Question Answering tasks, we use the Stanford Question Answering Dataset (SQuAD) benchmark for evaluation.
- Dataset downloading is handled by the evaluation script (see the sketch after this list).
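As a reference for what that download step involves, here is a minimal sketch using the pinned `datasets` library; the GLUE sub-config names (e.g. "rte") and the "squad" dataset id are the standard Hugging Face hub names, used here as an assumption about what the script loads.

```python
# Minimal sketch: the evaluation script downloads these datasets itself.
from datasets import load_dataset

# GLUE tasks are selected by sub-config name, e.g. "rte", "mrpc", "sst2", ...
glue_rte = load_dataset("glue", "rte")
print(glue_rte["validation"][0])

# SQuAD for the question-answering configurations
squad = load_dataset("squad")
print(squad["validation"][0])
```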
Run the evaluation script as follows:

```bash
python mobilebert_quanteval.py \
    --model_config <MODEL_CONFIGURATION> \
    --per_device_eval_batch_size 4 \
    --output_dir <OUT_DIR>
```

- Example:

```bash
python mobilebert_quanteval.py --model_config mobilebert_w8a8_rte --per_device_eval_batch_size 4 --output_dir ./evaluation_result
```

- Supported values of `model_config` are "mobilebert_w8a8_rte", "mobilebert_w8a8_stsb", "mobilebert_w8a8_mrpc", "mobilebert_w8a8_cola", "mobilebert_w8a8_sst2", "mobilebert_w8a8_qnli", "mobilebert_w8a8_qqp", "mobilebert_w8a8_mnli", "mobilebert_w8a8_squad", "mobilebert_w4a8_rte", "mobilebert_w4a8_stsb", "mobilebert_w4a8_mrpc", "mobilebert_w4a8_cola", "mobilebert_w4a8_sst2", "mobilebert_w4a8_qnli", "mobilebert_w4a8_qqp", "mobilebert_w4a8_mnli", and "mobilebert_w4a8_squad".
The following configuration has been used for the above models for INT4/INT8 quantization:
- Weight quantization: 4/8 bits, symmetric quantization
- Bias parameters are not quantized
- Activation quantization: 8 bits, asymmetric quantization
- Model inputs are quantized
- Different quantization schemes are used for different weight bitwidths and downstream tasks (see the tables below)
- A mask value of -6 was applied in the attention layers
- Quantization-aware training (QAT) was used to obtain the optimized quantized weights; detailed hyperparameters are listed in Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort, "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP 2021.
| QAT Configuration | CoLA | SST-2 | MRPC | STS-B | QQP | MNLI | QNLI | RTE |
|---|---|---|---|---|---|---|---|---|
| W8A8 | per-tensor, tf | per-channel, tf | per-channel, tf | per-channel, tf | per-channel, tf | per-tensor, tf_enhanced | per-tensor, tf_enhanced | per-channel, tf |
| W4A8 | per-tensor, tf_enhanced | per-channel, tf_enhanced | per-channel, tf_enhanced | per-channel, tf_enhanced | per-tensor, tf_enhanced | per-tensor, tf_enhanced | per-tensor, tf_enhanced | per-channel, tf_enhanced |
| QAT Configuration | SQuAD |
|---|---|
| W8A8 | per-channel, tf |
| W4A8 | per-channel, range_learning_with_tf_enhanced_init |
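As a rough illustration of how the settings above map onto AIMET's API, the sketch below builds a `QuantizationSimModel` with 8-bit weights and activations and a `tf` quantization scheme; the `tf_enhanced` and `range_learning_with_tf_enhanced_init` entries in the tables correspond to other `QuantScheme` members. This is an assumption-laden sketch, not the pipeline used to produce the optimized checkpoints: the checkpoint id, sequence length, and calibration callback are placeholders, and the per-channel settings and the -6 attention mask are handled by the quantsim configuration and the adapted model definition in the model zoo, which are not shown here.

```python
# Illustrative sketch only: maps the bitwidths and quantization schemes listed
# above onto the AIMET QuantizationSimModel API. Not the code used to produce
# the released checkpoints.
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel
from transformers import AutoModelForSequenceClassification

# Placeholder model: the model zoo uses its own task-specific model definitions.
model = AutoModelForSequenceClassification.from_pretrained(
    "google/mobilebert-uncased", num_labels=2
).eval()

seq_len = 128  # assumed maximum sequence length
dummy_input = (
    torch.zeros(1, seq_len, dtype=torch.long),  # input_ids
    torch.ones(1, seq_len, dtype=torch.long),   # attention_mask
)

sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf,  # "tf" entry in the tables above
    default_param_bw=8,   # weight bitwidth: 8 for W8A8, 4 for W4A8
    default_output_bw=8,  # activation bitwidth: 8 bits in both settings
)

# Compute quantization ranges from a (placeholder) calibration pass before
# evaluation or QAT fine-tuning.
sim.compute_encodings(
    forward_pass_callback=lambda m, _: m(*dummy_input),
    forward_pass_callback_args=None,
)
```

Whether the stock Hugging Face model definition traces cleanly through AIMET without the model zoo's adaptations is not guaranteed; the evaluation script performs its own model and quantsim setup.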
Below are the results of the PyTorch transformer model MobileBERT on the GLUE dataset:

| Configuration | CoLA (corr) | SST-2 (acc) | MRPC (f1) | STS-B (corr) | QQP (acc) | MNLI (acc) | QNLI (acc) | RTE (acc) | GLUE |
|---|---|---|---|---|---|---|---|---|---|
| FP32 | 51.48 | 91.60 | 85.86 | 88.22 | 90.66 | 83.54 | 91.18 | 68.60 | 81.27 |
| W8A8 | 52.51 | 91.63 | 90.81 | 88.19 | 90.80 | 83.46 | 91.12 | 68.95 | 82.18 |
| W4A8 | 50.34 | 91.28 | 87.61 | 87.30 | 90.48 | 82.90 | 89.42 | 68.23 | 80.95 |
Below are the results on the SQuAD dataset:

| Configuration | EM | F1 |
|---|---|---|
| FP32 | 82.75 | 90.11 |
| W8A8 | 81.96 | 89.41 |
| W4A8 | 81.88 | 89.33 |
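For reference, EM (exact match) and F1 are the standard SQuAD metrics. With the pinned `datasets` version they can be computed as in the sketch below; the prediction and reference entries are placeholders, and the evaluation script computes these metrics itself.

```python
# Illustrative sketch: computing SQuAD EM/F1 with the pinned datasets library.
# The id, prediction text, and reference answer below are placeholders.
from datasets import load_metric

squad_metric = load_metric("squad")
predictions = [{"id": "0", "prediction_text": "Denver Broncos"}]
references = [
    {"id": "0", "answers": {"text": ["Denver Broncos"], "answer_start": [177]}}
]
print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}
```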