memory preset results in UnboundLocalError
Opened this issue · 1 comment
aa956 commented
local-gemma version 0.2.0, installed via pipx on Linux, NVIDIA RTX 3090:
user:~$ local-gemma --model="27b" --preset="memory" "What is the capital of Germany?"
Traceback (most recent call last):
File "/home/user/.local/bin/local-gemma", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/user/.local/share/pipx/venvs/local-gemma/lib/python3.11/site-packages/local_gemma/cli.py", line 178, in main
if spare_memory / 1e9 > 5:
^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'spare_memory' where it is not associated with a value
user:~$ local-gemma --model="9b" --preset="memory" "What is the capital of Germany?"
Traceback (most recent call last):
File "/home/user/.local/bin/local-gemma", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/user/.local/share/pipx/venvs/local-gemma/lib/python3.11/site-packages/local_gemma/cli.py", line 178, in main
if spare_memory / 1e9 > 5:
^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'spare_memory' where it is not associated with a value
user:~$ local-gemma --model="2b" --preset="memory" "What is the capital of Germany?"
Loading model with the following characteristics:
- Model name: google/gemma-2-2b-it
- Assistant model name: None
- Device: cuda
- Default data type: torch.bfloat16
- Optimization preset: memory
- Generation arguments: {'do_sample': True, 'temperature': 0.7}
- Base prompt: None
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.88it/s]
The 'max_batch_size' argument of HybridCache is deprecated and will be removed in v4.46. Use the more precisely named 'batch_size' argument instead.
The capital of Germany is **Berlin**.
user:~$ nvidia-smi
Wed Nov 27 17:48:18 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 Off | N/A |
| 0% 39C P8 29W / 220W | 10MiB / 24576MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1380 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
brianN0KZ commented
This defect is hard to understand. From what I can see in the code, no preset other than "auto" could ever have worked, since that is the only branch that assigns the spare_memory variable (a minimal sketch of the pattern follows below).
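For illustration only, here is a hedged sketch of that failure pattern; the names, values, and argument handling are hypothetical and differ from the real cli.py, but the control flow matches the traceback above:

import argparse

def main():
    # Hypothetical sketch: spare_memory is assigned only in the "auto"
    # branch, but read unconditionally afterwards.
    parser = argparse.ArgumentParser()
    parser.add_argument("--preset", default="auto")
    args = parser.parse_args()

    if args.preset == "auto":
        # Only this branch binds spare_memory.
        spare_memory = 24e9 - 2e9  # e.g. total VRAM minus an estimated model footprint

    # With --preset="memory" (or any other non-"auto" preset) this line reads
    # an unbound local and raises UnboundLocalError, as in the traceback above.
    if spare_memory / 1e9 > 5:
        print("more than 5 GB to spare")

if __name__ == "__main__":
    main()

Initializing spare_memory before the branch, or guarding the comparison with the same preset check, would avoid the crash; that is only a guess at a fix, not what upstream ships.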
Is this abandonware?