Can I run llama3.1 70b with rtx4090+64g ddr5 ram?
codeMonkey-shin opened this issue · 1 comments
codeMonkey-shin commented
Can I run llama3.1 70b with rtx4090+64g ddr5 ram?
At what rate per second are tokens generated?
ELigoP commented
This is not an issue; it is a general question.
As far as I know, ktransformers speeds up only MoE models, and llama3.1 70b is a dense model (no routing/expert layers).
The answer to your question depends on your model's quantization level, motherboard, RAM, and CPU.
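For a rough feasibility check, you can estimate the weight footprint from the parameter count and the quantization's bits per parameter. The bytes-per-parameter figures below are approximations (real quant formats add scale/zero-point overhead, and inference also needs room for the KV cache and activations), so treat this as a back-of-envelope sketch, not a guarantee:

```python
# Back-of-envelope weight-memory estimate for a dense 70B model.
# Bytes-per-parameter values are approximate; actual GGUF quants
# carry extra metadata, and runtime use adds KV cache overhead.
PARAMS = 70e9

BYTES_PER_PARAM = {
    "fp16": 2.0,       # full half precision
    "~8-bit": 1.0625,  # roughly 8.5 bits/param (approximate)
    "~4-bit": 0.5625,  # roughly 4.5 bits/param (approximate)
}

def model_size_gib(quant: str) -> float:
    """Approximate in-memory weight size in GiB."""
    return PARAMS * BYTES_PER_PARAM[quant] / 2**30

for q in BYTES_PER_PARAM:
    print(f"{q}: ~{model_size_gib(q):.0f} GiB")
```

Even at ~4-bit, the weights alone land well above the 4090's 24 GB of VRAM, so a large share must sit in system RAM, and token rate then depends heavily on RAM bandwidth and CPU, as noted above.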