Evaluation run for all "good open weight models" with all available quantizations and different GPUs

Question

zimmski opened this issue 3 months ago · 0 comments

Not sure yet how we should do this. CPU-only inference will break us here, and speed metrics are important as well.
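
One possible starting point, purely as a sketch: enumerate the run matrix (model × quantization × device) up front so its size is visible before committing hardware, and record a simple throughput figure per run. Everything below is hypothetical and not part of any existing tooling: the names `RunConfig`, `build_matrix` and `tokens_per_second`, the `skip_cpu` switch, and the placeholder model, quantization and device labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunConfig:
    """One evaluation run: a model at a given quantization on a given device."""
    model: str
    quantization: str
    device: str


def build_matrix(models, quantizations, devices, skip_cpu=False):
    """Return every model/quantization/device combination.

    skip_cpu is a hypothetical switch that drops CPU-only runs,
    since those are the ones expected to be the bottleneck.
    """
    configs = []
    for model, quant, device in product(models, quantizations, devices):
        if skip_cpu and device == "cpu":
            continue
        configs.append(RunConfig(model, quant, device))
    return configs


def tokens_per_second(generated_tokens, elapsed_seconds):
    """Simple speed metric: generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds


if __name__ == "__main__":
    # Placeholder labels only; real model and GPU names would go here.
    matrix = build_matrix(
        models=["model-a", "model-b"],
        quantizations=["q4", "q8"],
        devices=["cpu", "gpu-x"],
        skip_cpu=True,
    )
    for config in matrix:
        print(config)
```

The matrix grows multiplicatively with each dimension, so counting the combinations first gives a rough idea of how many runs are needed and which ones (for example CPU-only) could be excluded.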