songmzhang/DSKD

Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models".

Python

Issues

If use different model structure to full finetune, I find span repeat
#23 opened 3 days ago by testpppppp
3
Is vanilla KD for same vocab equivalent to Minimum Edit Distance for different vocab?
#22 opened 23 days ago by survivebycoding
3
Is the Tinyllama in the description a base model or pretrained model?
#16 opened 22 days ago by survivebycoding
1
The code works only with dev and train set, and not with test set. Right?
#21 opened 22 days ago by survivebycoding
1
load 72B teacher model
#13 opened 3 months ago by ypw-lbj
6
Files for token mapping
#20 opened 2 months ago by ntsw2001
2
Quantify difference in vocabulary
#19 opened 2 months ago by srikhetramohanty
5
Failed to reproduce KD results
#18 opened 2 months ago by cpsu00
4
Reproduction of results
#15 opened 3 months ago by mathamateur
9
GPT2-1.5B Pretrained Teacher on Dolly
#17 opened 3 months ago by cpsu00
2
Evaluation script error with TinyLlama
#12 opened 3 months ago by srikhetramohanty
2
using mistral from
#14 opened 3 months ago by survivebycoding
6
Concern regarding performance
#10 opened 3 months ago by survivebycoding
15
Running inference using evaluation scripts
#9 opened 4 months ago by srikhetramohanty
2
qwen
#11 opened 3 months ago by zjjznw123
3
Getting an error when trying to perform SFT on Tiny Llama
#8 opened 4 months ago by survivebycoding
10
More desctipion on output folder created
#7 opened 4 months ago by survivebycoding
1
Can we use this code for CPU?
#6 opened 4 months ago by survivebycoding
1
need LLama .bin file instead of .pth file
#5 opened 4 months ago by survivebycoding
2
From where should we downloads the models?
#4 opened 4 months ago by survivebycoding
4
Usage with other model combinations
#3 opened 4 months ago by botox-100
4
About SeqKD with different vocabularies
#2 opened 5 months ago by 2018cx
3
关于 AKL 的计算
#1 opened 5 months ago by wutaiqiang
1