A few questions on fine-tuning
Opened this issue · 0 comments
sidharthiimc commented
- What is the max context window that a 7B model can take in? I am looking at a business problem with inputs ranging from a minimum of 4K to a maximum of 32K tokens.
- For the above task, would it be better to fine-tune your 4-bit GPTQ quantized model, or to fine-tune the base model from scratch?
- Will a single A100 GPU be enough for the above task?
- How long will it take for, say, 10K samples and 10 epochs?
- I want to predict in batches. I see that evaluation already happens in batches. Can you add, or point me to, code that would let me use the fine-tuned model to predict on a batch of, say, size 8 or 10?
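For the last question, here is a minimal sketch of batched generation with a fine-tuned causal LM. It assumes a Hugging Face `transformers`-style `model`/`tokenizer` pair (the function names `chunked` and `batch_generate` are my own, not from this repo), and that left-padding is set so batched decoding is correct:

```python
# Hedged sketch: batched prediction with a fine-tuned causal LM.
# `model` and `tokenizer` are assumed to be a HF transformers pair;
# nothing here is the repo's own API.
from typing import Iterator, List


def chunked(items: List, batch_size: int) -> Iterator[List]:
    """Yield successive batches of at most `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def batch_generate(model, tokenizer, prompts: List[str],
                   batch_size: int = 8, max_new_tokens: int = 128) -> List[str]:
    """Run generation over `prompts` in batches of `batch_size`.

    Decoder-only models need left-padding for batched generation,
    and a pad token must exist (GPT-style tokenizers often lack one).
    """
    tokenizer.padding_side = "left"
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    outputs: List[str] = []
    for batch in chunked(prompts, batch_size):
        enc = tokenizer(batch, return_tensors="pt", padding=True).to(model.device)
        out = model.generate(**enc, max_new_tokens=max_new_tokens)
        outputs.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return outputs
```

The chunking is the only essential piece; with a batch size of 8, 10K prompts become 1250 `generate` calls instead of 10K single-prompt calls.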