princeton-nlp/LLM-Shearing

How much compute will this take?

Closed this issue · 7 comments

Hi,
If I want to make a 1B/3B model for Mistral, do you know approximately how many dollars I'll have to spend in compute, and whether I can do it on a consumer GPU? Thanks!

We used approximately 1,845 A100 GPU hours to get the 1.3B model and 3,310 A100 GPU hours for the 2.7B model. However, the actual wall-clock time also heavily depends on your setup and cluster speed.

Also, are you planning to release a sheared Mistral version?

We use an in-house cluster at Princeton! I think an A100 should be more expensive than $0.5 per hour, though.
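To turn the GPU-hour figures above into a dollar estimate, you can just multiply by an hourly rental rate. A minimal sketch; the $1–$4/hour bounds are my own assumption about typical cloud A100 pricing, not figures from this thread:

```python
# GPU-hour figures quoted earlier in this thread.
GPU_HOURS = {"1.3B": 1845, "2.7B": 3310}

def estimate_cost(model: str, usd_per_a100_hour: float) -> float:
    """Estimated dollar cost = A100 GPU hours * assumed hourly rate."""
    return GPU_HOURS[model] * usd_per_a100_hour

# Assumed price range of $1-$4 per A100 hour (varies by provider).
for model in GPU_HOURS:
    low, high = estimate_cost(model, 1.0), estimate_cost(model, 4.0)
    print(f"{model}: ${low:,.0f} - ${high:,.0f}")
```

So at those assumed rates, the 1.3B run lands somewhere in the low thousands of dollars, well beyond a single consumer GPU in any case, since the runs used multi-node A100s.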

> Also, are you planning to release a sheared Mistral version?

We intend to add support for the Mistral and Pythia models in the upcoming weeks. We are short on compute, though, so I am not sure whether we will end up delivering these models before the next, stronger 7B model comes out.

Hi, when you have full control over all of the finetuning data, does it make the most sense to first shear the base model and then finetune on top? Or is it better to finetune in advance (or a mixture of both)? Completely disregarding cost; I'm asking purely about performance and overfitting.

Hi! Yeah, I think it makes the most sense to prune the base model first and then finetune, as it's widely believed that the abilities of language models are acquired during pre-training. This is the cleanest way to execute.

However, I am not too sure what the performance will be like when mixing pre-training and fine-tuning data for pruning -- it might help the pruning process find a submodel that better follows instructions.
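The two orderings being compared can be written out schematically. This is an illustrative sketch only; `prune`, `finetune`, and `prune_with_mixed_data` are hypothetical stand-ins, not entry points from this repo, and the point is purely the sequencing:

```python
def run_pipeline(stages):
    """Compose hypothetical stage names left-to-right over a base model tag."""
    model = "base"
    for stage in stages:
        model = f"{stage}({model})"
    return model

# Recommended above: prune the pretrained base first, then finetune on top.
recommended = run_pipeline(["prune", "finetune"])
print(recommended)  # finetune(prune(base))

# Alternative raised above: fold fine-tuning data into the pruning step itself.
alternative = run_pipeline(["prune_with_mixed_data"])
print(alternative)  # prune_with_mixed_data(base)
```

The first ordering keeps pruning aligned with the pre-training distribution; the second trades that for a submodel that may already be biased toward instruction data.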

Alright, tysm!