kvcache-ai/ktransformers

Can I run llama3.1 70b with rtx4090+64g ddr5 ram?

codeMonkey-shin opened this issue · 1 comment

Can I run llama3.1 70b with rtx4090+64g ddr5 ram?

At what rate per second are tokens generated?

This is not an issue, it is a general question.
As far as I know, ktransformers only speeds up MoE models, and llama3.1 70b is a dense model (no routing/expert layers).
The answer to your question depends on your quantization level, motherboard, RAM speed, and CPU.
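To see why those factors dominate, here is a back-of-envelope sketch (my own assumption, not something ktransformers reports): decoding one token reads every weight once, so generation is roughly memory-bandwidth-bound, and the CPU/RAM portion of an offloaded model is usually the bottleneck. All numbers below are illustrative guesses.

```python
# Rough decode-speed estimate for a partially GPU-offloaded dense model.
# Assumption: tokens/s ~ 1 / (gpu_bytes/gpu_bw + cpu_bytes/cpu_bw),
# i.e. each token requires one full pass over the weights.

def estimate_tokens_per_sec(model_gb, vram_gb, gpu_bw_gbs, cpu_bw_gbs):
    """Estimate decode tokens/sec given model size and memory bandwidths."""
    gpu_part = min(model_gb, vram_gb)      # layers that fit in VRAM
    cpu_part = model_gb - gpu_part          # remainder streamed from system RAM
    seconds_per_token = gpu_part / gpu_bw_gbs + cpu_part / cpu_bw_gbs
    return 1.0 / seconds_per_token

# Hypothetical figures: Llama 3.1 70B at ~4.5 bits/weight is ~40 GB;
# an RTX 4090 has ~1000 GB/s bandwidth and maybe ~20 GB usable VRAM;
# dual-channel DDR5 often sustains ~60 GB/s in practice.
print(round(estimate_tokens_per_sec(40, 20, 1000, 60), 1))  # ~2.8 tok/s
```

Under these assumptions the ~20 GB left in system RAM dominates the per-token time, which is why quantizing harder (so more layers fit in VRAM) or faster RAM helps far more than a faster GPU.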