🐛 Bug: Error calculating VRAM: bad status: 401 Unauthorized
Closed this issue · 10 comments
Description
Tried using the VRAM estimator but I'm getting a 401 Unauthorized error message.
How to reproduce
gollama --vram --model gemma2:2b-instruct-q8_0 --quant q8_0 --context 8096
Output
Error calculating VRAM: bad status: 401 Unauthorized
Environment
- OS and version: macOS 14.6
- Install source: go install github.com/sammcj/gollama@HEAD
- Go version: go1.22.5 darwin/arm64
Can you contribute?
No
Thanks! Working on a fix now
A new version that supports both huggingface/id and ollama:model formats should be available shortly.
Let me know / re-open if it doesn't work for you!
go install github.com/sammcj/gollama@v1.26.0
gl --vram llama3.1:8b-instruct-q6_K --fits 14
📊 VRAM Estimation for Model: llama3.1:8b-instruct-q6_K
| QUANT|CTX | BPW | 2K | 8K | 16K | 32K | 49K | 64K |
|-----------|------|-----|-----|-----------------|-----------------|-----------------|-----------------|
| IQ1_S | 1.56 | 2.2 | 2.8 | 3.7(3.7,3.7) | 5.5(5.5,5.5) | 7.3(7.3,7.3) | 9.1(9.1,9.1) |
| IQ2_XXS | 2.06 | 2.6 | 3.3 | 4.3(4.3,4.3) | 6.1(6.1,6.1) | 7.9(7.9,7.9) | 9.8(9.8,9.8) |
| IQ2_XS | 2.31 | 2.9 | 3.6 | 4.5(4.5,4.5) | 6.4(6.4,6.4) | 8.2(8.2,8.2) | 10.1(10.1,10.1) |
| IQ2_S | 2.50 | 3.1 | 3.8 | 4.7(4.7,4.7) | 6.6(6.6,6.6) | 8.5(8.5,8.5) | 10.4(10.4,10.4) |
| IQ2_M | 2.70 | 3.2 | 4.0 | 4.9(4.9,4.9) | 6.8(6.8,6.8) | 8.7(8.7,8.7) | 10.6(10.6,10.6) |
| IQ3_XXS | 3.06 | 3.6 | 4.3 | 5.3(5.3,5.3) | 7.2(7.2,7.2) | 9.2(9.2,9.2) | 11.1(11.1,11.1) |
| IQ3_XS | 3.30 | 3.8 | 4.5 | 5.5(5.5,5.5) | 7.5(7.5,7.5) | 9.5(9.5,9.5) | 11.4(11.4,11.4) |
| Q2_K | 3.35 | 3.9 | 4.6 | 5.6(5.6,5.6) | 7.6(7.6,7.6) | 9.5(9.5,9.5) | 11.5(11.5,11.5) |
| Q3_K_S | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7) | 7.7(7.7,7.7) | 9.7(9.7,9.7) | 11.7(11.7,11.7) |
| IQ3_S | 3.50 | 4.0 | 4.8 | 5.7(5.7,5.7) | 7.7(7.7,7.7) | 9.7(9.7,9.7) | 11.7(11.7,11.7) |
| IQ3_M | 3.70 | 4.2 | 5.0 | 6.0(6.0,6.0) | 8.0(8.0,8.0) | 9.9(9.9,9.9) | 12.0(12.0,12.0) |
| Q3_K_M | 3.91 | 4.4 | 5.2 | 6.2(6.2,6.2) | 8.2(8.2,8.2) | 10.2(10.2,10.2) | 12.2(12.2,12.2) |
| IQ4_XS | 4.25 | 4.7 | 5.5 | 6.5(6.5,6.5) | 8.6(8.6,8.6) | 10.6(10.6,10.6) | 12.7(12.7,12.7) |
| Q3_K_L | 4.27 | 4.7 | 5.5 | 6.5(6.5,6.5) | 8.6(8.6,8.6) | 10.7(10.7,10.7) | 12.7(12.7,12.7) |
| IQ4_NL | 4.50 | 5.0 | 5.7 | 6.8(6.8,6.8) | 8.9(8.9,8.9) | 10.9(10.9,10.9) | 13.0(13.0,13.0) |
| Q4_0 | 4.55 | 5.0 | 5.8 | 6.8(6.8,6.8) | 8.9(8.9,8.9) | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_S | 4.58 | 5.0 | 5.8 | 6.9(6.9,6.9) | 8.9(8.9,8.9) | 11.0(11.0,11.0) | 13.1(13.1,13.1) |
| Q4_K_M | 4.85 | 5.3 | 6.1 | 7.1(7.1,7.1) | 9.2(9.2,9.2) | 11.4(11.4,11.4) | 13.5(13.5,13.5) |
| Q4_K_L | 4.90 | 5.3 | 6.1 | 7.2(7.2,7.2) | 9.3(9.3,9.3) | 11.4(11.4,11.4) | 13.6(13.6,13.6) |
| Q5_0 | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8) | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_S | 5.54 | 5.9 | 6.8 | 7.8(7.8,7.8) | 10.0(10.0,10.0) | 12.2(12.2,12.2) | 14.4(14.4,14.4) |
| Q5_K_M | 5.69 | 6.1 | 6.9 | 8.0(8.0,8.0) | 10.2(10.2,10.2) | 12.4(12.4,12.4) | 14.6(14.6,14.6) |
| Q5_K_L | 5.75 | 6.1 | 7.0 | 8.1(8.1,8.1) | 10.3(10.3,10.3) | 12.5(12.5,12.5) | 14.7(14.7,14.7) |
| Q6_K | 6.59 | 7.0 | 8.0 | 9.4(9.4,9.4) | 12.2(12.2,12.2) | 15.0(15.0,15.0) | 17.8(17.8,17.8) |
| Q8_0 | 8.50 | 8.8 | 9.9 | 11.4(11.4,11.4) | 14.4(14.4,14.4) | 17.4(17.4,17.4) | 20.3(20.3,20.3) |
and
gl --vram NousResearch/Hermes-2-Theta-Llama-3-8B --fits 20
📊 VRAM Estimation for Model: NousResearch/Hermes-2-Theta-Llama-3-8B
| QUANT|CTX | BPW | 2K | 8K | 16K | 32K | 49K | 64K |
|-----------|------|-----|------|-----------------|-----------------|-----------------|-----------------|
| IQ1_S | 1.56 | 2.4 | 3.8 | 5.7(4.7,4.2) | 9.5(7.5,6.5) | 13.3(10.3,8.8) | 17.1(13.1,11.1) |
| IQ2_XXS | 2.06 | 2.9 | 4.3 | 6.3(5.3,4.8) | 10.1(8.1,7.1) | 13.9(10.9,9.4) | 17.8(13.8,11.8) |
| IQ2_XS | 2.31 | 3.1 | 4.6 | 6.5(5.5,5.0) | 10.4(8.4,7.4) | 14.2(11.2,9.8) | 18.1(14.1,12.1) |
| IQ2_S | 2.50 | 3.3 | 4.8 | 6.7(5.7,5.2) | 10.6(8.6,7.6) | 14.5(11.5,10.0) | 18.4(14.4,12.4) |
| IQ2_M | 2.70 | 3.5 | 5.0 | 6.9(5.9,5.4) | 10.8(8.8,7.8) | 14.7(11.7,10.2) | 18.6(14.6,12.6) |
| IQ3_XXS | 3.06 | 3.8 | 5.3 | 7.3(6.3,5.8) | 11.2(9.2,8.2) | 15.2(12.2,10.7) | 19.1(15.1,13.1) |
| IQ3_XS | 3.30 | 4.1 | 5.5 | 7.5(6.5,6.0) | 11.5(9.5,8.5) | 15.5(12.5,11.0) | 19.4(15.4,13.4) |
| Q2_K | 3.35 | 4.1 | 5.6 | 7.6(6.6,6.1) | 11.6(9.6,8.6) | 15.5(12.5,11.0) | 19.5(15.5,13.5) |
| IQ3_S | 3.50 | 4.3 | 5.8 | 7.7(6.7,6.2) | 11.7(9.7,8.7) | 15.7(12.7,11.2) | 19.7(15.7,13.7) |
| Q3_K_S | 3.50 | 4.3 | 5.8 | 7.7(6.7,6.2) | 11.7(9.7,8.7) | 15.7(12.7,11.2) | 19.7(15.7,13.7) |
| IQ3_M | 3.70 | 4.5 | 6.0 | 8.0(7.0,6.5) | 11.9(9.9,8.9) | 15.9(12.9,11.4) | 20.0(16.0,14.0) |
| Q3_K_M | 3.91 | 4.7 | 6.2 | 8.2(7.2,6.7) | 12.2(10.2,9.2) | 16.2(13.2,11.7) | 20.2(16.2,14.2) |
| IQ4_XS | 4.25 | 5.0 | 6.5 | 8.5(7.5,7.0) | 12.6(10.6,9.6) | 16.6(13.6,12.1) | 20.7(16.7,14.7) |
| Q3_K_L | 4.27 | 5.0 | 6.5 | 8.5(7.5,7.0) | 12.6(10.6,9.6) | 16.6(13.7,12.2) | 20.7(16.7,14.7) |
| IQ4_NL | 4.50 | 5.2 | 6.7 | 8.8(7.8,7.3) | 12.9(10.9,9.9) | 16.9(13.9,12.4) | 21.0(17.0,15.0) |
| Q4_0 | 4.55 | 5.2 | 6.8 | 8.8(7.8,7.3) | 12.9(10.9,9.9) | 17.0(14.0,12.5) | 21.1(17.1,15.1) |
| Q4_K_S | 4.58 | 5.3 | 6.8 | 8.9(7.9,7.4) | 12.9(10.9,9.9) | 17.0(14.0,12.5) | 21.1(17.1,15.1) |
| Q4_K_M | 4.85 | 5.5 | 7.1 | 9.1(8.1,7.6) | 13.2(11.2,10.2) | 17.4(14.4,12.9) | 21.5(17.5,15.5) |
| Q4_K_L | 4.90 | 5.6 | 7.1 | 9.2(8.2,7.7) | 13.3(11.3,10.3) | 17.4(14.4,12.9) | 21.6(17.6,15.6) |
| Q5_K_S | 5.54 | 6.2 | 7.8 | 9.8(8.8,8.3) | 14.0(12.0,11.0) | 18.2(15.2,13.7) | 22.4(18.4,16.4) |
| Q5_0 | 5.54 | 6.2 | 7.8 | 9.8(8.8,8.3) | 14.0(12.0,11.0) | 18.2(15.2,13.7) | 22.4(18.4,16.4) |
| Q5_K_M | 5.69 | 6.3 | 7.9 | 10.0(9.0,8.5) | 14.2(12.2,11.2) | 18.4(15.4,13.9) | 22.6(18.6,16.6) |
| Q5_K_L | 5.75 | 6.4 | 8.0 | 10.1(9.1,8.6) | 14.3(12.3,11.3) | 18.5(15.5,14.0) | 22.7(18.7,16.7) |
| Q6_K | 6.59 | 7.2 | 9.0 | 11.4(10.4,9.9) | 16.2(14.2,13.2) | 21.0(18.0,16.5) | 25.8(21.8,19.8) |
| Q8_0 | 8.50 | 9.1 | 10.9 | 13.4(12.4,11.9) | 18.4(16.4,15.4) | 23.4(20.4,18.9) | 28.3(24.3,22.3) |
Thank you!
This still shows up for any model that requires you to accept its terms before you can view the repository on HF. This calculator has a solution: you can create a read-only auth token with only the "Read access to contents of all public gated repos you can access" permission enabled for it.
It would be nice if we could set that token in gollama's config file so those models work here too.
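For illustration, a minimal sketch of how that could work: a token from the config file takes priority, with the environment variable as a fallback. The resolveHFToken helper and the idea of a huggingface_token config key are hypothetical, not gollama's actual config schema or code.

```go
package main

import (
	"fmt"
	"os"
)

// resolveHFToken prefers a token from the config file (a hypothetical
// huggingface_token key) and falls back to the HUGGINGFACE_TOKEN
// environment variable, so either place would unlock gated repos.
func resolveHFToken(configToken string) string {
	if configToken != "" {
		return configToken
	}
	return os.Getenv("HUGGINGFACE_TOKEN")
}

func main() {
	// Empty string stands in for "no token set in the config file".
	token := resolveHFToken("")
	if token == "" {
		fmt.Println("no Hugging Face token found; gated repos will keep returning 401")
		return
	}
	fmt.Println("Hugging Face token resolved from config or environment")
}
```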
Thanks @bgiesing, I actually thought I was respecting $HUGGINGFACE_TOKEN - I'll look into this.
There's an environment variable? If all I have to do is set that on my machine, that would work - I never set it because I didn't know it was a thing.
Maybe that should be mentioned in the readme then?
Yeah, the standard way to set your token for client applications that need HF access is to set HUGGINGFACE_TOKEN in your environment. However, I think I still need to add it explicitly to the API requests in the code. I'll try to get around to this in the next day or so.
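For reference, here's a minimal sketch (not gollama's actual code) of what attaching that token to a Hub request could look like in Go, assuming the standard https://huggingface.co/<repo>/resolve/main/config.json endpoint; the fetchModelConfig helper is just illustrative of the pattern:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// fetchModelConfig is an illustrative helper that requests a model's
// config.json from the Hugging Face Hub, attaching HUGGINGFACE_TOKEN from
// the environment when set so gated repositories return 200 instead of 401.
func fetchModelConfig(modelID string) ([]byte, error) {
	url := fmt.Sprintf("https://huggingface.co/%s/resolve/main/config.json", modelID)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if token := os.Getenv("HUGGINGFACE_TOKEN"); token != "" {
		// The Hub accepts a standard Bearer token for gated/private repos.
		req.Header.Set("Authorization", "Bearer "+token)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("bad status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	cfg, err := fetchModelConfig("NousResearch/Hermes-2-Theta-Llama-3-8B")
	if err != nil {
		fmt.Fprintln(os.Stderr, "Error calculating VRAM:", err)
		os.Exit(1)
	}
	fmt.Printf("fetched %d bytes of config\n", len(cfg))
}
```

With a valid HUGGINGFACE_TOKEN exported, gated repositories that previously came back as 401 Unauthorized should then resolve normally.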
I couldn't reproduce it myself (with $HUGGINGFACE_TOKEN set), but I've improved the handling of this in the latest release.