huggingface/local-gemma

memory preset results in UnboundLocalError

aa956 commented

local-gemma version 0.2.0, installed via pipx on Linux, NVIDIA RTX 3090:

user:~$ local-gemma --model="27b" --preset="memory" "What is the capital of Germany?"
Traceback (most recent call last):
  File "/home/user/.local/bin/local-gemma", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/.local/share/pipx/venvs/local-gemma/lib/python3.11/site-packages/local_gemma/cli.py", line 178, in main
    if spare_memory / 1e9 > 5:
       ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'spare_memory' where it is not associated with a value
user:~$ local-gemma --model="9b" --preset="memory" "What is the capital of Germany?"
Traceback (most recent call last):
  File "/home/user/.local/bin/local-gemma", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/.local/share/pipx/venvs/local-gemma/lib/python3.11/site-packages/local_gemma/cli.py", line 178, in main
    if spare_memory / 1e9 > 5:
       ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'spare_memory' where it is not associated with a value
user:~$ local-gemma --model="2b" --preset="memory" "What is the capital of Germany?"

Loading model with the following characteristics:
- Model name: google/gemma-2-2b-it
- Assistant model name: None
- Device: cuda
- Default data type: torch.bfloat16
- Optimization preset: memory
- Generation arguments: {'do_sample': True, 'temperature': 0.7}
- Base prompt: None

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.88it/s]
The 'max_batch_size' argument of HybridCache is deprecated and will be removed in v4.46. Use the more precisely named 'batch_size' argument instead.
The capital of Germany is **Berlin**. 

user:~$ nvidia-smi
Wed Nov 27 17:48:18 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:01:00.0 Off |                  N/A |
|  0%   39C    P8              29W / 220W |     10MiB / 24576MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1380      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

This defect is hard to understand: from what I can see in the code, no preset other than "auto" could ever have worked, because the "auto" branch is the only place the spare_memory variable is assigned. Every other preset reaches the check at cli.py line 178 with the name unbound.
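To make the failure mode concrete, here is a minimal self-contained reduction of the control flow I am describing. The function and the placeholder value are my own illustration, not the actual local_gemma/cli.py code; only spare_memory and the `/ 1e9 > 5` check come from the traceback above.

```python
# Hypothetical reduction of the branch structure in local_gemma/cli.py.
# Only the "auto" branch binds spare_memory, so every other preset
# reaches the comparison with the variable still unbound.

def run(preset: str) -> None:
    if preset == "auto":
        spare_memory = 20e9  # illustrative value; the only assignment

    # For any preset other than "auto" the name was never assigned,
    # so this line raises UnboundLocalError.
    if spare_memory / 1e9 > 5:
        print("plenty of spare memory")


run("memory")  # UnboundLocalError, matching the tracebacks above
run("auto")    # works: the variable was bound in the "auto" branch
```

Assuming the real code follows this shape, a one-line fix would be to initialize spare_memory = 0 before the branch, or to guard the check with preset == "auto", so that the non-auto presets never touch the unbound name.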

Is this abandonware?