JBGruber/rollama

`model_params` do not seem to work

JBGruber opened this issue · 0 comments

Turning off the GPU should significantly slow down response generation, but it doesn't:

library(rollama)
res <- bench::mark(
  cpu = query("why is the sky blue?",
              model_params = list(num_gpu = 0)),
  gpu = query("why is the sky blue?"),
  check = FALSE
)

summary(res)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 cpu            4.9s     4.9s     0.204    4.98MB    0    
#> 2 gpu           6.09s    6.09s     0.164  545.33KB    0.164

Created on 2024-01-23 with reprex v2.0.2

Calling the API directly with curl:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "seed": 42,
    "num_gpu": 0
  }
}'

The time Ollama takes goes up to 50s, which makes more sense. I assume rollama does not translate the parameters to JSON correctly.
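
One way to narrow this down is to look at the JSON that R produces from the same parameter list. This is a minimal sketch, assuming jsonlite does the serialization; I have not checked how rollama actually builds the request body, and the `body` list here is a hypothetical stand-in that simply mirrors the curl call:

```r
library(jsonlite)

# Hypothetical request body mirroring the curl call; for Ollama to see
# the parameters they have to end up nested under "options".
model_params <- list(seed = 42, num_gpu = 0)
body <- list(
  model = "llama2",
  prompt = "Why is the sky blue?",
  stream = FALSE,
  options = model_params
)

# auto_unbox = TRUE turns length-1 vectors into JSON scalars
json_ok <- toJSON(body, auto_unbox = TRUE)

# Without it, every value is wrapped in an array, e.g. "num_gpu":[0]
json_boxed <- toJSON(body)

print(json_ok)
print(json_boxed)
```

If the request rollama sends has the boxed form, or has the parameters outside of `options`, that might explain why `num_gpu` appears to be ignored.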