asahi417/lm-vocab-trimmer

Support for LLaMA or Baichuan models

Opened this issue · 4 comments

Hi, I was wondering if this method can be used for trimming large vocabulary in LLMs. Can vocab trimmer be extended to LLMs?

It should be possible to apply vocabulary trimming (VT) to monolingual LLMs, but it would have little effect on the model size. For monolingual LLMs, the embedding matrix accounts for only a small fraction of the total parameters, and their vocabulary is already quite small (e.g., LLaMA has 32k tokens).

The reason VT is effective for small multilingual LMs (XLM, mT5, mBART) is that the embedding matrix accounts for a dominant share of the model's parameters (e.g., about 80% for mT5-small), and their vocabulary is extremely large (e.g., mT5 has 250k tokens).
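For a rough comparison, you can check the embedding share of any model directly with the standard transformers API. This is just an illustrative sketch (the checkpoint names are examples; the LLaMA checkpoint is gated and needs access, and exact numbers depend on whether embeddings are tied to the LM head):

```python
from transformers import AutoModel

# Rough check of how much of the model is "just" the embedding matrix.
for name in ["google/mt5-small", "meta-llama/Llama-2-7b-hf"]:
    model = AutoModel.from_pretrained(name)
    total = sum(p.numel() for p in model.parameters())
    embed = model.get_input_embeddings().weight.numel()
    print(f"{name}: embedding share = {embed / total:.1%}")
```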

That said, applying VT to monolingual LLMs sounds like an interesting direction! I might work on it if I become more convinced of the benefit and have some spare time.

I agree it would have little effect on LLaMA. But bilingual LLMs such as Baichuan-2 have around 120k tokens. Many multilingual LLMs are emerging, and this technique would be very useful for making them monolingual, so users can fine-tune them in their target language without worrying about other languages.

I am currently trying to extend the library to Baichuan-2. I am facing many issues and debugging them... I will post it here once I finish. Maybe we can extend this to any LLM on Hugging Face via the AutoModelForCausalLM interface.
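For reference, the general idea for an arbitrary Hugging Face causal LM would look roughly like the sketch below. This is not the library's API, just my guess at the core step; `trim_causal_lm` and `keep_ids` are hypothetical names, and rebuilding the tokenizer so its IDs match the new rows (the hard, model-specific part) is omitted:

```python
import torch
from transformers import AutoModelForCausalLM

def trim_causal_lm(model_name: str, keep_ids: list[int]):
    """Sketch: keep only the `keep_ids` rows of the embedding and LM head.

    Real vocabulary trimming also has to rebuild the tokenizer so that its
    token IDs match the new rows; that step is model-specific and omitted.
    """
    model = AutoModelForCausalLM.from_pretrained(model_name)
    idx = torch.tensor(sorted(keep_ids))

    # Slice the input embedding matrix down to the kept vocabulary.
    old_embed = model.get_input_embeddings()
    new_embed = torch.nn.Embedding(len(idx), old_embed.embedding_dim)
    new_embed.weight.data = old_embed.weight.data[idx].clone()
    model.set_input_embeddings(new_embed)

    # Slice the output projection (LM head) the same way, if it is not tied.
    old_head = model.get_output_embeddings()
    if old_head is not None:
        new_head = torch.nn.Linear(
            old_head.in_features, len(idx), bias=old_head.bias is not None
        )
        new_head.weight.data = old_head.weight.data[idx].clone()
        if old_head.bias is not None:
            new_head.bias.data = old_head.bias.data[idx].clone()
        model.set_output_embeddings(new_head)

    model.config.vocab_size = len(idx)
    return model
```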

Hey @Tyler-Durden-official, were you able to figure this out? I am interested in using this to shrink newer models like Phi-3/LLaMA 3.

Support for MistralForCausalLM would also be great!