# Distillation Workflow: Pairwise Supervision

For pairwise supervision you need one or more trained BERT_CAT models to generate the teacher scores. Run matchmaker/distillation/teacher-train-scorer.py once per teacher model; if you use only a single model you can use its output directly as train_tsv, and if you use multiple models you can ensemble their scores with matchmaker/distillation/teacher_scores_ensemble.py.
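
Conceptually, ensembling multiple teachers just means averaging their per-triple scores into a single training file. The following is a rough, illustrative sketch of that idea, not the repository's teacher_scores_ensemble.py script; it assumes a TSV layout of pos_score / neg_score / query / pos_passage / neg_passage with all files listing the same triples in the same order, which may differ from the actual output of teacher-train-scorer.py.

```python
import csv

def ensemble_teacher_scores(score_files, out_file):
    """Average the positive and negative teacher scores across files.

    Assumed (hypothetical) row layout per file:
        pos_score \t neg_score \t query \t pos_passage \t neg_passage
    """
    readers = [csv.reader(open(f, encoding="utf8"), delimiter="\t", quoting=csv.QUOTE_NONE)
               for f in score_files]
    with open(out_file, "w", encoding="utf8", newline="") as out:
        writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_NONE)
        for rows in zip(*readers):
            pos = sum(float(r[0]) for r in rows) / len(rows)
            neg = sum(float(r[1]) for r in rows) / len(rows)
            # Keep the text columns (query and passages) from the first file.
            writer.writerow([pos, neg, *rows[0][2:]])
```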

Then set the following config to activate the knowledge distillation loss:

```yaml
train_pairwise_distillation: True
loss: "margin-mse"
```
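
The margin-mse loss trains the student to reproduce the teacher's score margin between the positive and the negative passage, rather than its absolute scores. A minimal PyTorch sketch of this idea (tensor names are illustrative; this is not the repository's exact implementation):

```python
import torch

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Margin-MSE: match the student's margin (pos - neg) to the teacher's
    margin for the same query/passage pair."""
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return torch.nn.functional.mse_loss(student_margin, teacher_margin)

# Illustrative usage with per-batch score tensors of shape (batch_size,):
# loss = margin_mse_loss(s_pos, s_neg, t_pos, t_neg)
# loss.backward()
```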

Please cite our work as:

```bibtex
@article{hofstaetter2020_crossarchitecture_kd,
      title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
      author={Sebastian Hofst{\"a}tter and Sophia Althammer and Michael Schr{\"o}der and Mete Sertkan and Allan Hanbury},
      year={2020},
      journal={arXiv:2010.02666},
}
```