sammcj/gollama

๐Ÿž Bug: Error calculating VRAM: bad status: 401 Unauthorized

Closed this issue · 10 comments

Description

Tried using the VRAM estimator, but I'm getting a 401 Unauthorized error.

How to reproduce

gollama --vram --model gemma2:2b-instruct-q8_0 --quant q8_0 --context 8096

Output

Error calculating VRAM: bad status: 401 Unauthorized

Environment

  • OS and version: macOS 14.6
  • Install source: go install github.com/sammcj/gollama@HEAD
  • Go version: go version go1.22.5 darwin/arm64

Can you contribute?

No

Thanks! Working on a fix now

A new version should be available shortly that supports both the huggingface/id and ollama:model formats.

Let me know / re-open if it doesn't work for you!

go install github.com/sammcj/gollama@v1.26.0
gl --vram llama3.1:8b-instruct-q6_K --fits 14
📊 VRAM Estimation for Model: llama3.1:8b-instruct-q6_K

| QUANT|CTX | BPW  | 2K  | 8K  |       16K       |       32K       |       49K       |       64K       |
|-----------|------|-----|-----|-----------------|-----------------|-----------------|-----------------|
| IQ1_S     | 1.56 | 2.2 | 2.8 | 3.7(3.7,3.7)    | 5.5(5.5,5.5)    | 7.3(7.3,7.3)    | 9.1(9.1,9.1)    |
| IQ2_XXS   | 2.06 | 2.6 | 3.3 | 4.3(4.3,4.3)    | 6.1(6.1,6.1)    | 7.9(7.9,7.9)    | 9.8(9.8,9.8)    |
| IQ2_XS    | 2.31 | 2.9 | 3.6 | 4.5(4.5,4.5)    | 6.4(6.4,6.4)    | 8.2(8.2,8.2)    | 10.1(10.1,10.1) |
| IQ2_S     | 2.50 | 3.1 | 3.8 | 4.7(4.7,4.7)    | 6.6(6.6,6.6)    | 8.5(8.5,8.5)    | 10.4(10.4,10.4) |
| IQ2_M     | 2.70 | 3.2 | 4.0 | 4.9(4.9,4.9)    | 6.8(6.8,6.8)    | 8.7(8.7,8.7)    | 10.6(10.6,10.6) |
| IQ3_XXS   | 3.06 | 3.6 | 4.3 | 5.3(5.3,5.3)    | 7.2(7.2,7.2)    | 9.2(9.2,9.2)    | 11.1(11.1,11.1) |
| IQ3_XS    | 3.30 | 3.8 | 4.5 | 5.5(5.5,5.5)    | 7.5(7.5,7.5)    | 9.5(9.5,9.5)    | 11.4(11.4,11.4) |
| Q2_K      | 3.35 | 3.9 | 4.6 | 5.6(5.6,5.6)    | 7.6(7.6,7.6)    | 9.5(9.5,9.5)    | 11.5(11.5,11.5) |
| Q3_K_S    | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7)    | 7.7(7.7,7.7)    | 9.7(9.7,9.7)    | 11.7(11.7,11.7) |
| IQ3_S     | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7)    | 7.7(7.7,7.7)    | 9.7(9.7,9.7)    | 11.7(11.7,11.7) |
| IQ3_M     | 3.70 | 4.2 | 5.0 | 6.0(6.0,6.0)    | 8.0(8.0,8.0)    | 9.9(9.9,9.9)    | 12.0(12.0,12.0) |
| Q3_K_M    | 3.91 | 4.4 | 5.2 | 6.2(6.2,6.2)    | 8.2(8.2,8.2)    | 10.2(10.2,10.2) | 12.2(12.2,12.2) |
| IQ4_XS    | 4.25 | 4.7 | 5.5 | 6.5(6.5,6.5)    | 8.6(8.6,8.6)    | 10.6(10.6,10.6) | 12.7(12.7,12.7) |
| Q3_K_L    | 4.27 | 4.7 | 5.5 | 6.5(6.5,6.5)    | 8.6(8.6,8.6)    | 10.7(10.7,10.7) | 12.7(12.7,12.7) |
| IQ4_NL    | 4.50 | 5.0 | 5.7 | 6.8(6.8,6.8)    | 8.9(8.9,8.9)    | 10.9(10.9,10.9) | 13.0(13.0,13.0) |
| Q4_0      | 4.55 | 5.0 | 5.8 | 6.8(6.8,6.8)    | 8.9(8.9,8.9)    | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_S    | 4.58 | 5.0 | 5.8 | 6.9(6.9,6.9)    | 8.9(8.9,8.9)    | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_M    | 4.85 | 5.3 | 6.1 | 7.1(7.1,7.1)    | 9.2(9.2,9.2)    | 11.4(11.4,11.4) | 13.5(13.5,13.5) |
| Q4_K_L    | 4.90 | 5.3 | 6.1 | 7.2(7.2,7.2)    | 9.3(9.3,9.3)    | 11.4(11.4,11.4) | 13.6(13.6,13.6) |
| Q5_0      | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8)    | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_S    | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8)    | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_M    | 5.69 | 6.1 | 6.9 | 8.0(8.0,8.0)    | 10.2(10.2,10.2) | 12.4(12.4,12.4) | 14.6(14.6,14.6) |
| Q5_K_L    | 5.75 | 6.1 | 7.0 | 8.1(8.1,8.1)    | 10.3(10.3,10.3) | 12.5(12.5,12.5) | 14.7(14.7,14.7) |
| Q6_K      | 6.59 | 7.0 | 8.0 | 9.4(9.4,9.4)    | 12.2(12.2,12.2) | 15.0(15.0,15.0) | 17.8(17.8,17.8) |
| Q8_0      | 8.50 | 8.8 | 9.9 | 11.4(11.4,11.4) | 14.4(14.4,14.4) | 17.4(17.4,17.4) | 20.3(20.3,20.3) |

and

gl --vram NousResearch/Hermes-2-Theta-Llama-3-8B --fits 20
📊 VRAM Estimation for Model: NousResearch/Hermes-2-Theta-Llama-3-8B

| QUANT|CTX | BPW  | 2K  |  8K  |       16K       |       32K       |       49K       |       64K       |
|-----------|------|-----|------|-----------------|-----------------|-----------------|-----------------|
| IQ1_S     | 1.56 | 2.4 | 3.8  | 5.7(4.7,4.2)    | 9.5(7.5,6.5)    | 13.3(10.3,8.8)  | 17.1(13.1,11.1) |
| IQ2_XXS   | 2.06 | 2.9 | 4.3  | 6.3(5.3,4.8)    | 10.1(8.1,7.1)   | 13.9(10.9,9.4)  | 17.8(13.8,11.8) |
| IQ2_XS    | 2.31 | 3.1 | 4.6  | 6.5(5.5,5.0)    | 10.4(8.4,7.4)   | 14.2(11.2,9.8)  | 18.1(14.1,12.1) |
| IQ2_S     | 2.50 | 3.3 | 4.8  | 6.7(5.7,5.2)    | 10.6(8.6,7.6)   | 14.5(11.5,10.0) | 18.4(14.4,12.4) |
| IQ2_M     | 2.70 | 3.5 | 5.0  | 6.9(5.9,5.4)    | 10.8(8.8,7.8)   | 14.7(11.7,10.2) | 18.6(14.6,12.6) |
| IQ3_XXS   | 3.06 | 3.8 | 5.3  | 7.3(6.3,5.8)    | 11.2(9.2,8.2)   | 15.2(12.2,10.7) | 19.1(15.1,13.1) |
| IQ3_XS    | 3.30 | 4.1 | 5.5  | 7.5(6.5,6.0)    | 11.5(9.5,8.5)   | 15.5(12.5,11.0) | 19.4(15.4,13.4) |
| Q2_K      | 3.35 | 4.1 | 5.6  | 7.6(6.6,6.1)    | 11.6(9.6,8.6)   | 15.5(12.5,11.0) | 19.5(15.5,13.5) |
| IQ3_S     | 3.50 | 4.3 | 5.8  | 7.7(6.7,6.2)    | 11.7(9.7,8.7)   | 15.7(12.7,11.2) | 19.7(15.7,13.7) |
| Q3_K_S    | 3.50 | 4.3 | 5.8  | 7.7(6.7,6.2)    | 11.7(9.7,8.7)   | 15.7(12.7,11.2) | 19.7(15.7,13.7) |
| IQ3_M     | 3.70 | 4.5 | 6.0  | 8.0(7.0,6.5)    | 11.9(9.9,8.9)   | 15.9(12.9,11.4) | 20.0(16.0,14.0) |
| Q3_K_M    | 3.91 | 4.7 | 6.2  | 8.2(7.2,6.7)    | 12.2(10.2,9.2)  | 16.2(13.2,11.7) | 20.2(16.2,14.2) |
| IQ4_XS    | 4.25 | 5.0 | 6.5  | 8.5(7.5,7.0)    | 12.6(10.6,9.6)  | 16.6(13.6,12.1) | 20.7(16.7,14.7) |
| Q3_K_L    | 4.27 | 5.0 | 6.5  | 8.5(7.5,7.0)    | 12.6(10.6,9.6)  | 16.6(13.7,12.2) | 20.7(16.7,14.7) |
| IQ4_NL    | 4.50 | 5.2 | 6.7  | 8.8(7.8,7.3)    | 12.9(10.9,9.9)  | 16.9(13.9,12.4) | 21.0(17.0,15.0) |
| Q4_0      | 4.55 | 5.2 | 6.8  | 8.8(7.8,7.3)    | 12.9(10.9,9.9)  | 17.0(14.0,12.5) | 21.1(17.1,15.1) |
| Q4_K_S    | 4.58 | 5.3 | 6.8  | 8.9(7.9,7.4)    | 12.9(10.9,9.9)  | 17.0(14.0,12.5) | 21.1(17.1,15.1) |
| Q4_K_M    | 4.85 | 5.5 | 7.1  | 9.1(8.1,7.6)    | 13.2(11.2,10.2) | 17.4(14.4,12.9) | 21.5(17.5,15.5) |
| Q4_K_L    | 4.90 | 5.6 | 7.1  | 9.2(8.2,7.7)    | 13.3(11.3,10.3) | 17.4(14.4,12.9) | 21.6(17.6,15.6) |
| Q5_K_S    | 5.54 | 6.2 | 7.8  | 9.8(8.8,8.3)    | 14.0(12.0,11.0) | 18.2(15.2,13.7) | 22.4(18.4,16.4) |
| Q5_0      | 5.54 | 6.2 | 7.8  | 9.8(8.8,8.3)    | 14.0(12.0,11.0) | 18.2(15.2,13.7) | 22.4(18.4,16.4) |
| Q5_K_M    | 5.69 | 6.3 | 7.9  | 10.0(9.0,8.5)   | 14.2(12.2,11.2) | 18.4(15.4,13.9) | 22.6(18.6,16.6) |
| Q5_K_L    | 5.75 | 6.4 | 8.0  | 10.1(9.1,8.6)   | 14.3(12.3,11.3) | 18.5(15.5,14.0) | 22.7(18.7,16.7) |
| Q6_K      | 6.59 | 7.2 | 9.0  | 11.4(10.4,9.9)  | 16.2(14.2,13.2) | 21.0(18.0,16.5) | 25.8(21.8,19.8) |
| Q8_0      | 8.50 | 9.1 | 10.9 | 13.4(12.4,11.9) | 18.4(16.4,15.4) | 23.4(20.4,18.9) | 28.3(24.3,22.3) |

Thank you!

This still shows up on any model that requires you to approve its terms before you can view the repository on HF. This calculator solves it by letting you supply a read-only auth token with only the "Read access to contents of all public gated repos you can access" permission enabled.

It would be nice if we could set said token in gollama's config file so those models work here too.

Thanks @bgiesing, I actually thought I was respecting $HUGGINGFACE_TOKEN - I'll look into this.

There's an environment variable? If all I've got to do is set that on my machine, that would work. I never set any variable since I didn't even know that was a thing.

Maybe that should be mentioned in the readme then?

Yeah, the standard way to set your token for client applications that need HF access is to set HUGGINGFACE_TOKEN in your environment. However, I think I still need to attach it explicitly to the API requests in the code. I'll try to get round to this in the next day or so.

I couldn't reproduce it myself (with $HUGGINGFACE_TOKEN set), but I've improved the handling of this in the latest release.