sammcj/gollama

๐Ÿž Bug: Error calculating VRAM: bad status: 401 Unauthorized

Closed this issue · 10 comments

Description

Tried using the VRAM estimator, but I'm getting a 401 Unauthorized error.

How to reproduce

gollama --vram --model gemma2:2b-instruct-q8_0 --quant q8_0 --context 8096

Output

Error calculating VRAM: bad status: 401 Unauthorized

Environment

  • OS and version: macOS 14.6
  • Install source: go install github.com/sammcj/gollama@HEAD
  • Go version: go version go1.22.5 darwin/arm64

Can you contribute?

No

Thanks! Working on a fix now

A new version should be available shortly that supports both the huggingface/id and ollama:model formats.

Let me know / re-open if it doesn't work for you!

go install github.com/sammcj/gollama@v1.26.0
gl --vram llama3.1:8b-instruct-q6_K --fits 14
📊 VRAM Estimation for Model: llama3.1:8b-instruct-q6_K

| QUANT|CTX | BPW  | 2K  | 8K  |       16K       |       32K       |       49K       |       64K       |
|-----------|------|-----|-----|-----------------|-----------------|-----------------|-----------------|
| IQ1_S     | 1.56 | 2.2 | 2.8 | 3.7(3.7,3.7)    | 5.5(5.5,5.5)    | 7.3(7.3,7.3)    | 9.1(9.1,9.1)    |
| IQ2_XXS   | 2.06 | 2.6 | 3.3 | 4.3(4.3,4.3)    | 6.1(6.1,6.1)    | 7.9(7.9,7.9)    | 9.8(9.8,9.8)    |
| IQ2_XS    | 2.31 | 2.9 | 3.6 | 4.5(4.5,4.5)    | 6.4(6.4,6.4)    | 8.2(8.2,8.2)    | 10.1(10.1,10.1) |
| IQ2_S     | 2.50 | 3.1 | 3.8 | 4.7(4.7,4.7)    | 6.6(6.6,6.6)    | 8.5(8.5,8.5)    | 10.4(10.4,10.4) |
| IQ2_M     | 2.70 | 3.2 | 4.0 | 4.9(4.9,4.9)    | 6.8(6.8,6.8)    | 8.7(8.7,8.7)    | 10.6(10.6,10.6) |
| IQ3_XXS   | 3.06 | 3.6 | 4.3 | 5.3(5.3,5.3)    | 7.2(7.2,7.2)    | 9.2(9.2,9.2)    | 11.1(11.1,11.1) |
| IQ3_XS    | 3.30 | 3.8 | 4.5 | 5.5(5.5,5.5)    | 7.5(7.5,7.5)    | 9.5(9.5,9.5)    | 11.4(11.4,11.4) |
| Q2_K      | 3.35 | 3.9 | 4.6 | 5.6(5.6,5.6)    | 7.6(7.6,7.6)    | 9.5(9.5,9.5)    | 11.5(11.5,11.5) |
| Q3_K_S    | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7)    | 7.7(7.7,7.7)    | 9.7(9.7,9.7)    | 11.7(11.7,11.7) |
| IQ3_S     | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7)    | 7.7(7.7,7.7)    | 9.7(9.7,9.7)    | 11.7(11.7,11.7) |
| IQ3_M     | 3.70 | 4.2 | 5.0 | 6.0(6.0,6.0)    | 8.0(8.0,8.0)    | 9.9(9.9,9.9)    | 12.0(12.0,12.0) |
| Q3_K_M    | 3.91 | 4.4 | 5.2 | 6.2(6.2,6.2)    | 8.2(8.2,8.2)    | 10.2(10.2,10.2) | 12.2(12.2,12.2) |
| IQ4_XS    | 4.25 | 4.7 | 5.5 | 6.5(6.5,6.5)    | 8.6(8.6,8.6)    | 10.6(10.6,10.6) | 12.7(12.7,12.7) |
| Q3_K_L    | 4.27 | 4.7 | 5.5 | 6.5(6.5,6.5)    | 8.6(8.6,8.6)    | 10.7(10.7,10.7) | 12.7(12.7,12.7) |
| IQ4_NL    | 4.50 | 5.0 | 5.7 | 6.8(6.8,6.8)    | 8.9(8.9,8.9)    | 10.9(10.9,10.9) | 13.0(13.0,13.0) |
| Q4_0      | 4.55 | 5.0 | 5.8 | 6.8(6.8,6.8)    | 8.9(8.9,8.9)    | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_S    | 4.58 | 5.0 | 5.8 | 6.9(6.9,6.9)    | 8.9(8.9,8.9)    | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_M    | 4.85 | 5.3 | 6.1 | 7.1(7.1,7.1)    | 9.2(9.2,9.2)    | 11.4(11.4,11.4) | 13.5(13.5,13.5) |
| Q4_K_L    | 4.90 | 5.3 | 6.1 | 7.2(7.2,7.2)    | 9.3(9.3,9.3)    | 11.4(11.4,11.4) | 13.6(13.6,13.6) |
| Q5_0      | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8)    | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_S    | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8)    | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_M    | 5.69 | 6.1 | 6.9 | 8.0(8.0,8.0)    | 10.2(10.2,10.2) | 12.4(12.4,12.4) | 14.6(14.6,14.6) |
| Q5_K_L    | 5.75 | 6.1 | 7.0 | 8.1(8.1,8.1)    | 10.3(10.3,10.3) | 12.5(12.5,12.5) | 14.7(14.7,14.7) |
| Q6_K      | 6.59 | 7.0 | 8.0 | 9.4(9.4,9.4)    | 12.2(12.2,12.2) | 15.0(15.0,15.0) | 17.8(17.8,17.8) |
| Q8_0      | 8.50 | 8.8 | 9.9 | 11.4(11.4,11.4) | 14.4(14.4,14.4) | 17.4(17.4,17.4) | 20.3(20.3,20.3) |

and

gl --vram NousResearch/Hermes-2-Theta-Llama-3-8B --fits 20
📊 VRAM Estimation for Model: NousResearch/Hermes-2-Theta-Llama-3-8B

| QUANT|CTX | BPW  | 2K  |  8K  |       16K       |       32K       |       49K       |       64K       |
|-----------|------|-----|------|-----------------|-----------------|-----------------|-----------------|
| IQ1_S     | 1.56 | 2.4 | 3.8  | 5.7(4.7,4.2)    | 9.5(7.5,6.5)    | 13.3(10.3,8.8)  | 17.1(13.1,11.1) |
| IQ2_XXS   | 2.06 | 2.9 | 4.3  | 6.3(5.3,4.8)    | 10.1(8.1,7.1)   | 13.9(10.9,9.4)  | 17.8(13.8,11.8) |
| IQ2_XS    | 2.31 | 3.1 | 4.6  | 6.5(5.5,5.0)    | 10.4(8.4,7.4)   | 14.2(11.2,9.8)  | 18.1(14.1,12.1) |
| IQ2_S     | 2.50 | 3.3 | 4.8  | 6.7(5.7,5.2)    | 10.6(8.6,7.6)   | 14.5(11.5,10.0) | 18.4(14.4,12.4) |
| IQ2_M     | 2.70 | 3.5 | 5.0  | 6.9(5.9,5.4)    | 10.8(8.8,7.8)   | 14.7(11.7,10.2) | 18.6(14.6,12.6) |
| IQ3_XXS   | 3.06 | 3.8 | 5.3  | 7.3(6.3,5.8)    | 11.2(9.2,8.2)   | 15.2(12.2,10.7) | 19.1(15.1,13.1) |
| IQ3_XS    | 3.30 | 4.1 | 5.5  | 7.5(6.5,6.0)    | 11.5(9.5,8.5)   | 15.5(12.5,11.0) | 19.4(15.4,13.4) |
| Q2_K      | 3.35 | 4.1 | 5.6  | 7.6(6.6,6.1)    | 11.6(9.6,8.6)   | 15.5(12.5,11.0) | 19.5(15.5,13.5) |
| IQ3_S     | 3.50 | 4.3 | 5.8  | 7.7(6.7,6.2)    | 11.7(9.7,8.7)   | 15.7(12.7,11.2) | 19.7(15.7,13.7) |
| Q3_K_S    | 3.50 | 4.3 | 5.8  | 7.7(6.7,6.2)    | 11.7(9.7,8.7)   | 15.7(12.7,11.2) | 19.7(15.7,13.7) |
| IQ3_M     | 3.70 | 4.5 | 6.0  | 8.0(7.0,6.5)    | 11.9(9.9,8.9)   | 15.9(12.9,11.4) | 20.0(16.0,14.0) |
| Q3_K_M    | 3.91 | 4.7 | 6.2  | 8.2(7.2,6.7)    | 12.2(10.2,9.2)  | 16.2(13.2,11.7) | 20.2(16.2,14.2) |
| IQ4_XS    | 4.25 | 5.0 | 6.5  | 8.5(7.5,7.0)    | 12.6(10.6,9.6)  | 16.6(13.6,12.1) | 20.7(16.7,14.7) |
| Q3_K_L    | 4.27 | 5.0 | 6.5  | 8.5(7.5,7.0)    | 12.6(10.6,9.6)  | 16.6(13.7,12.2) | 20.7(16.7,14.7) |
| IQ4_NL    | 4.50 | 5.2 | 6.7  | 8.8(7.8,7.3)    | 12.9(10.9,9.9)  | 16.9(13.9,12.4) | 21.0(17.0,15.0) |
| Q4_0      | 4.55 | 5.2 | 6.8  | 8.8(7.8,7.3)    | 12.9(10.9,9.9)  | 17.0(14.0,12.5) | 21.1(17.1,15.1) |
| Q4_K_S    | 4.58 | 5.3 | 6.8  | 8.9(7.9,7.4)    | 12.9(10.9,9.9)  | 17.0(14.0,12.5) | 21.1(17.1,15.1) |
| Q4_K_M    | 4.85 | 5.5 | 7.1  | 9.1(8.1,7.6)    | 13.2(11.2,10.2) | 17.4(14.4,12.9) | 21.5(17.5,15.5) |
| Q4_K_L    | 4.90 | 5.6 | 7.1  | 9.2(8.2,7.7)    | 13.3(11.3,10.3) | 17.4(14.4,12.9) | 21.6(17.6,15.6) |
| Q5_K_S    | 5.54 | 6.2 | 7.8  | 9.8(8.8,8.3)    | 14.0(12.0,11.0) | 18.2(15.2,13.7) | 22.4(18.4,16.4) |
| Q5_0      | 5.54 | 6.2 | 7.8  | 9.8(8.8,8.3)    | 14.0(12.0,11.0) | 18.2(15.2,13.7) | 22.4(18.4,16.4) |
| Q5_K_M    | 5.69 | 6.3 | 7.9  | 10.0(9.0,8.5)   | 14.2(12.2,11.2) | 18.4(15.4,13.9) | 22.6(18.6,16.6) |
| Q5_K_L    | 5.75 | 6.4 | 8.0  | 10.1(9.1,8.6)   | 14.3(12.3,11.3) | 18.5(15.5,14.0) | 22.7(18.7,16.7) |
| Q6_K      | 6.59 | 7.2 | 9.0  | 11.4(10.4,9.9)  | 16.2(14.2,13.2) | 21.0(18.0,16.5) | 25.8(21.8,19.8) |
| Q8_0      | 8.50 | 9.1 | 10.9 | 13.4(12.4,11.9) | 18.4(16.4,15.4) | 23.4(20.4,18.9) | 28.3(24.3,22.3) |

Thank you!

This still shows up on any model that requires you to approve its terms before you can view the repository on HF. This calculator solves it by letting you supply a read-only auth token with only the "Read access to contents of all public gated repos you can access" permission enabled.

It would be nice if we could set said token in gollama's config file so those models work here too.

Thanks @bgiesing, I actually thought I was respecting $HUGGINGFACE_TOKEN - I'll look into this.

There's an environment variable? If all I've got to do is set that on my machine, that would work. I never set any variable since I didn't even know that was a thing.

Maybe that should be mentioned in the readme then?

Yeah, the standard way to set your token for client applications that need HF access is to set HUGGINGFACE_TOKEN in your environment. However, I think I still need to attach it explicitly to the API requests in the code. I'll try to get round to this in the next day or so.

I couldn't reproduce it myself (with $HUGGINGFACE_TOKEN set), but I've improved the handling of this in the latest release.