Covers a couple of approaches to training an early ranker with knowledge distillation (KD) from the final ranker.
Files:
- baseline_early_ranker.py : Shows how the early ranker is normally trained, without distillation.
- kd_aux_early_ranker.py : Shows how to train with auxiliary tasks whose targets are the teacher's (final ranker's) labels.
- kd_shared_early_ranker.py : Trains with the shared-logits approach to KD.
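The sketch below shows one way the three training objectives might differ. It is an illustration under our own assumptions, not code taken from the files above. It assumes binary engagement labels (e.g. clicks) and a teacher (final ranker) that supplies a predicted probability per candidate. The function names and the `aux_weight` / `kd_weight` parameters are hypothetical. It also assumes one interpretation of the two KD variants. In the auxiliary-task variant, a separate head fits the teacher's scores while the main head fits user labels. In the shared-logits variant, a single logit is trained against both targets.

```python
"""Hypothetical sketch contrasting baseline, auxiliary-head KD and shared-logit KD losses."""
import math
from typing import List


def bce(logit: float, target: float) -> float:
    """Binary cross-entropy between sigmoid(logit) and a target in [0, 1].

    Uses the numerically stable form max(x, 0) - x * t + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_bce(logits: List[float], targets: List[float]) -> float:
    """Average BCE over a batch of candidates."""
    return sum(bce(x, t) for x, t in zip(logits, targets)) / len(targets)


def baseline_loss(logits: List[float], labels: List[float]) -> float:
    """Baseline: the early ranker fits user labels only (no teacher signal)."""
    return mean_bce(logits, labels)


def kd_aux_loss(main_logits: List[float], aux_logits: List[float],
                labels: List[float], teacher_probs: List[float],
                aux_weight: float = 1.0) -> float:
    """Auxiliary-task KD (assumed form): a separate head fits the teacher's
    probabilities, while the main head (the one served) still fits user labels."""
    return mean_bce(main_logits, labels) + aux_weight * mean_bce(aux_logits, teacher_probs)


def kd_shared_loss(logits: List[float], labels: List[float],
                   teacher_probs: List[float], kd_weight: float = 1.0) -> float:
    """Shared-logits KD (assumed form): the same logit is pulled toward both the
    hard user label and the teacher's soft label."""
    return mean_bce(logits, labels) + kd_weight * mean_bce(logits, teacher_probs)


if __name__ == "__main__":
    # Toy batch of three candidates (made-up numbers for illustration only).
    logits = [0.3, -1.2, 2.0]
    aux_logits = [0.1, -0.8, 1.5]
    labels = [1.0, 0.0, 1.0]
    teacher_probs = [0.6, 0.2, 0.9]
    print("baseline :", baseline_loss(logits, labels))
    print("kd_aux   :", kd_aux_loss(logits, aux_logits, labels, teacher_probs))
    print("kd_shared:", kd_shared_loss(logits, labels, teacher_probs))
```

In the auxiliary-head form, the teacher's signal reaches the served score only indirectly, through shared lower layers (in a real model). In the shared-logits form, the served score itself is regularized toward the teacher. Which form works better is an empirical question for a given system.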