Loading model into RAM at prepare step is redundant

Question

Closed this issue 7 months ago · 0 comments

A user ran out of RAM during the prepare step of the xllm-demo project. This happens because the prepare step both downloads the model and loads it into RAM. Loading the model at this stage is redundant and can cause the same out-of-memory failure for other users. Downloading the model files alone is sufficient.
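A minimal sketch of the download-only idea, assuming the model is hosted on the Hugging Face Hub and that `huggingface_hub` is installed. The names `prepare_model` and `download_fn` are hypothetical and are not part of xllm; this only illustrates fetching files to the local cache without instantiating the model.

```python
from typing import Callable, Optional


def prepare_model(
    model_name_or_path: str,
    download_fn: Optional[Callable[..., str]] = None,
) -> str:
    """Fetch model files to the local cache without loading weights into RAM.

    `prepare_model` and `download_fn` are hypothetical names used for
    illustration; they are not part of xllm.
    """
    if download_fn is None:
        # Imported lazily so the sketch can be exercised without the dependency.
        from huggingface_hub import snapshot_download

        download_fn = snapshot_download

    # snapshot_download writes the repository files to disk and returns the
    # local folder path; it does not build the model or deserialize weights.
    return download_fn(repo_id=model_name_or_path)
```

With this approach, the weights would only be read into memory later, at the step that actually needs the model (for example, training).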

Link: BobaZooba/xllm-demo#1