kvcache-ai/ktransformers

Can I run llama3.1 70b with rtx4090+64g ddr5 ram?

codeMonkey-shin opened this issue · 1 comment

Can I run llama3.1 70b with rtx4090+64g ddr5 ram?

At what rate per second are tokens generated?

This is not an issue, it is a general question.
As far as I know, ktransformers only speeds up MoE models, and llama3.1 70b is a dense model (no routing/expert layers).
The answer to your question depends on your quantization level, motherboard, RAM speed, and CPU.
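To see why those factors dominate, here is a back-of-envelope sketch (my own assumption, not something ktransformers reports): decoding one token reads every weight once, so generation is roughly memory-bandwidth-bound, and the CPU/RAM portion of an offloaded model is usually the bottleneck. All numbers below are illustrative guesses.

```python
# Rough decode-speed estimate for a partially GPU-offloaded dense model.
# Assumption: tokens/s ~ 1 / (gpu_bytes/gpu_bw + cpu_bytes/cpu_bw),
# i.e. each token requires one full pass over the weights.

def estimate_tokens_per_sec(model_gb, vram_gb, gpu_bw_gbs, cpu_bw_gbs):
    """Estimate decode tokens/sec given model size and memory bandwidths."""
    gpu_part = min(model_gb, vram_gb)      # layers that fit in VRAM
    cpu_part = model_gb - gpu_part          # remainder streamed from system RAM
    seconds_per_token = gpu_part / gpu_bw_gbs + cpu_part / cpu_bw_gbs
    return 1.0 / seconds_per_token

# Hypothetical figures: Llama 3.1 70B at ~4.5 bits/weight is ~40 GB;
# an RTX 4090 has ~1000 GB/s bandwidth and maybe ~20 GB usable VRAM;
# dual-channel DDR5 often sustains ~60 GB/s in practice.
print(round(estimate_tokens_per_sec(40, 20, 1000, 60), 1))  # ~2.8 tok/s
```

Under these assumptions the ~20 GB left in system RAM dominates the per-token time, which is why quantizing harder (so more layers fit in VRAM) or faster RAM helps far more than a faster GPU.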