# PyTorch Transformer model MobileBERT-uncased for Natural Language Classification and Question Answering

This document describes evaluation of optimized checkpoints for the transformer model MobileBERT-uncased on Natural Language Classification and Question Answering tasks.

## AIMET installation and setup

Please install and set up AIMET (Torch GPU variant) before proceeding further.

**NOTE:**

- All AIMET releases are available here: https://github.com/quic/aimet/releases
- This model has been tested using AIMET version 1.27.0 (i.e. set `release_tag="1.27.0"` in the above instructions).
- This model is compatible with the PyTorch GPU variant of AIMET (i.e. set `AIMET_VARIANT="torch_gpu"` in the above instructions).

## Additional Setup Dependencies

```bash
pip install datasets==2.4.0
pip install transformers==4.11.3
```

## Add AIMET Model Zoo to the PYTHONPATH

```bash
export PYTHONPATH=$PYTHONPATH:<path to parent of aimet_model_zoo_path>
```

## Model checkpoint

- Original full precision checkpoints without downstream training were downloaded through Hugging Face (see the loading sketch after this list).
- Full precision model weight files with downstream training are downloaded automatically by the evaluation script.
- Quantization-optimized model weight files are downloaded automatically by the evaluation script.
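
For reference, the original FP32 checkpoint can be pulled straight from the Hugging Face hub. The snippet below is only an illustration under the pinned `transformers` version; the `google/mobilebert-uncased` model ID and the two-class head are assumptions for this sketch, and it is not the evaluation script, which fetches its own task-specific and quantization-optimized weights.

```python
# Illustrative only: load the original FP32 MobileBERT checkpoint from Hugging Face.
# The "google/mobilebert-uncased" model ID and the two-class head are assumptions for
# this sketch; the evaluation script downloads its own task-specific weights.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mobilebert-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "google/mobilebert-uncased", num_labels=2  # e.g. a two-class GLUE task such as RTE
)

inputs = tokenizer("AIMET evaluates quantized transformer models.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2])
```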

## Dataset

The models are evaluated on the GLUE benchmark tasks (CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE) for Natural Language Classification and on SQuAD for Question Answering; both are available through the Hugging Face `datasets` package pinned above.
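
As a quick sanity check (separate from the evaluation script), both datasets can be loaded with the pinned `datasets` package; the task and split names below are the standard Hugging Face hub identifiers.

```python
# Illustrative only: fetch one GLUE task and SQuAD with the pinned datasets package.
from datasets import load_dataset

rte = load_dataset("glue", "rte")   # used by the mobilebert_*_rte configurations
squad = load_dataset("squad")       # used by the mobilebert_*_squad configurations

print(rte["validation"][0])               # sentence pair plus label
print(squad["validation"][0]["question"])
```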

## Usage

To run evaluation with QuantSim in AIMET for the Natural Language Classification and Question Answering tasks, use the following command:

```bash
python mobilebert_quanteval.py \
        --model_config <MODEL_CONFIGURATION> \
        --per_device_eval_batch_size 4 \
        --output_dir <OUT_DIR>
```

- Example:

  ```bash
  python mobilebert_quanteval.py --model_config mobilebert_w8a8_rte --per_device_eval_batch_size 4 --output_dir ./evaluation_result
  ```

- Supported values of `model_config` are "mobilebert_w8a8_rte", "mobilebert_w8a8_stsb", "mobilebert_w8a8_mrpc", "mobilebert_w8a8_cola", "mobilebert_w8a8_sst2", "mobilebert_w8a8_qnli", "mobilebert_w8a8_qqp", "mobilebert_w8a8_mnli", "mobilebert_w8a8_squad", "mobilebert_w4a8_rte", "mobilebert_w4a8_stsb", "mobilebert_w4a8_mrpc", "mobilebert_w4a8_cola", "mobilebert_w4a8_sst2", "mobilebert_w4a8_qnli", "mobilebert_w4a8_qqp", "mobilebert_w4a8_mnli", "mobilebert_w4a8_squad" (a sketch that sweeps several of these values follows this list).
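
To evaluate several configurations in one go, a minimal sketch along the lines below can wrap the command shown above; the chosen configurations and output directory layout are arbitrary choices for illustration.

```python
# Illustrative only: sweep a few model_config values by invoking the evaluation
# script documented above; the chosen configs and output directories are arbitrary.
import subprocess

for config in ["mobilebert_w8a8_rte", "mobilebert_w4a8_rte", "mobilebert_w8a8_squad"]:
    subprocess.run(
        [
            "python", "mobilebert_quanteval.py",
            "--model_config", config,
            "--per_device_eval_batch_size", "4",
            "--output_dir", f"./evaluation_result/{config}",
        ],
        check=True,
    )
```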

## Quantization Configuration

The following configurations were used for the above models for INT4/INT8 quantization (an illustrative AIMET sketch follows the tables below):

| QAT Configuration | CoLA | SST-2 | MRPC | STS-B | QQP | MNLI | QNLI | RTE |
|---|---|---|---|---|---|---|---|---|
| W8A8 | per-tensor, tf | per-channel, tf | per-channel, tf | per-channel, tf | per-channel, tf | per-tensor, tf_enhanced | per-tensor, tf_enhanced | per-channel, tf |
| W4A8 | per-tensor, tf_enhanced | per-channel, tf_enhanced | per-channel, tf_enhanced | per-channel, tf_enhanced | per-tensor, tf_enhanced | per-tensor, tf_enhanced | per-tensor, tf_enhanced | per-channel, tf_enhanced |

| QAT Configuration | SQuAD |
|---|---|
| W8A8 | per-channel, tf |
| W4A8 | per-channel, range_learning_with_tf_enhanced_init |
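
For readers unfamiliar with how these settings map onto AIMET, the sketch below shows one way a W4A8 `QuantizationSimModel` could be set up with the `tf_enhanced` scheme. It is an assumption-laden illustration, not the code that produced the released checkpoints: the dummy input, calibration callback, and any model-preparation steps the real pipeline needs are placeholders, and per-channel quantization would additionally be enabled through a `config_file`.

```python
# Illustrative only: a W4A8 QuantizationSimModel with the tf_enhanced scheme.
# Model, dummy input, and calibration pass are placeholders; the released checkpoints
# come from the evaluation script, which may require extra model-preparation steps.
import torch
from transformers import AutoModelForSequenceClassification
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = AutoModelForSequenceClassification.from_pretrained("google/mobilebert-uncased").eval()

# Token-id and attention-mask tensors matching the model's forward signature.
dummy_input = (torch.zeros(1, 128, dtype=torch.long), torch.ones(1, 128, dtype=torch.long))

sim = QuantizationSimModel(
    model=model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,  # the "tf_enhanced" rows above
    default_param_bw=4,   # W4
    default_output_bw=8,  # A8
    # config_file=...     # per-channel quantization is switched on through this JSON
)

def calibrate(sim_model, _):
    # Placeholder calibration pass; the real script runs representative task data here.
    with torch.no_grad():
        sim_model(*dummy_input)

sim.compute_encodings(calibrate, forward_pass_callback_args=None)
```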

## Results

Below are the results of the PyTorch transformer model MobileBERT on the GLUE benchmark:

| Config | CoLA (corr) | SST-2 (acc) | MRPC (f1) | STS-B (corr) | QQP (acc) | MNLI (acc) | QNLI (acc) | RTE (acc) | GLUE |
|---|---|---|---|---|---|---|---|---|---|
| FP32 | 51.48 | 91.60 | 85.86 | 88.22 | 90.66 | 83.54 | 91.18 | 68.60 | 81.27 |
| W8A8 | 52.51 | 91.63 | 90.81 | 88.19 | 90.80 | 83.46 | 91.12 | 68.95 | 82.18 |
| W4A8 | 50.34 | 91.28 | 87.61 | 87.30 | 90.48 | 82.90 | 89.42 | 68.23 | 80.95 |
Below are the results on SQuAD:

| Config | EM | F1 |
|---|---|---|
| FP32 | 82.75 | 90.11 |
| W8A8 | 81.96 | 89.41 |
| W4A8 | 81.88 | 89.33 |