princeton-nlp/LLM-Shearing

ShearedCodeLLama

Closed this issue · 3 comments

Hi! I am working on a copilot backend and, even though I am using a GPTQ quant of CodeLlama-7B, it still eats a lot of VRAM.
DeepSeek Coder seems to have severe issues with fill-in-the-middle.

I wanted to ask if you plan on also shearing CodeLlama? :)

I also want to shear CodeLlama. Did you succeed?

@YanxiZSQ
It would cost around $2k given the estimate in #22

I just fixed my DeepSeek prompt and I have to say these are great models. Closing!
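For context on the prompt fix above: infill-capable code models expect the prompt assembled from special sentinel tokens around the code before and after the cursor, and getting the order or tokens wrong makes fill-in-the-middle output look broken. A minimal sketch, assuming hypothetical StarCoder-style sentinel names; the exact token strings must be taken from the target model's tokenizer config:

```python
def build_fim_prompt(
    prefix: str,
    suffix: str,
    begin: str = "<fim_prefix>",   # placeholder sentinel, model-specific
    hole: str = "<fim_suffix>",    # placeholder sentinel, model-specific
    end: str = "<fim_middle>",     # placeholder sentinel, model-specific
) -> str:
    """Assemble a prefix-suffix-middle (PSM) fill-in-the-middle prompt.

    The model is expected to generate the code that belongs between
    `prefix` and `suffix`, i.e. the text following the `end` sentinel.
    """
    return f"{begin}{prefix}{hole}{suffix}{end}"


# Example: ask the model to fill in a function body.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))\n",
)
print(prompt)
```

This is only a sketch of the general PSM layout, not the exact format any specific model uses; check the model card or tokenizer for the real special tokens.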