microsoft/vscode-ai-toolkit

PHI-2 runs out of memory on RTX 3070 Ti with 8GB memory when running fine-tuning step

eric-vanartsdalen opened this issue · 3 comments

TL;DR:
PHI-2 runs out of memory on an 8GB NVIDIA RTX 3070 Ti during the fine-tuning step from the README.

NOTE: Please advise if there are any ways to reduce the batch size or otherwise minimize the memory footprint on the GPU.
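For anyone looking for the same knobs: memory use during QLoRA training is usually governed by the trainer settings in finetuning/olive-config.json. The fragment below is hypothetical, assuming the QLoRA pass accepts Hugging Face TrainingArguments-style keys under a training_args section; the exact key names and nesting in this repo's config may differ.

"training_args": {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4
}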

ENVIRONMENT:
Windows
OS Name: Microsoft Windows 11 Home
OS Version: 10.0.22631 N/A Build 22631
OS Manufacturer: Microsoft Corporation
OS Configuration: Standalone Workstation
OS Build Type: Multiprocessor Free
System Manufacturer: Acer
System Model: Nitro AN515-46
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: AMD64 Family 25 Model 68 Stepping 1 AuthenticAMD ~3201 Mhz
Total Physical Memory: 15,557 MB
Available Physical Memory: 8,252 MB
Virtual Memory: Max Size: 30,249 MB
Virtual Memory: Available: 10,821 MB
Virtual Memory: In Use: 19,428 MB
Page File Location(s): C:\pagefile.sys

nvidia-smi
Sun Dec 24 23:11:50 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.92                 Driver Version: 545.92       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070 ...   WDDM | 00000000:01:00.0  On |                  N/A |
| N/A   43C    P8              15W / 140W |    706MiB /  8192MiB |      7%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

SETUP:
I followed the setup described in #28...
Ran first_time_setup.sh, which creates the phi-2-env environment.

STEPS:

  • start Terminal at root of project
  • conda deactivate
  • conda activate phi-2-env
  • python finetuning/invoke_olive.py

EXPECTED: Since the PHI-2 model's safetensor files total around 5GB, I expected fine-tuning to fit within 8GB of GPU memory.

ACTUAL: Fine-tuning runs out of GPU memory.

(base) ericv@NITRO:/mnt/c/mnt/c/Users/ericv/source/repos/AI/SamplePHI2$ conda deactivate
ericv@NITRO:/mnt/c/mnt/c/Users/ericv/source/repos/AI/SamplePHI2$ conda activate phi-2-env
(phi-2-env) ericv@NITRO:/mnt/c/mnt/c/Users/ericv/source/repos/AI/SamplePHI2$ python finetuning/invoke_olive.py
[2023-12-24 23:02:47,450] [DEBUG] [engine.py:125:setup_accelerators] Initial execution providers: ['CPUExecutionProvider']
[2023-12-24 23:02:47,450] [DEBUG] [engine.py:143:setup_accelerators] Initial accelerators: ['gpu']
[2023-12-24 23:02:47,450] [DEBUG] [engine.py:164:setup_accelerators] Supported execution providers for device gpu: ['CUDAExecutionProvider', 'TensorrtExecutionProvider', 'CPUExecutionProvider']
[2023-12-24 23:02:47,450] [INFO] [engine.py:181:setup_accelerators] Running workflow on accelerator specs: gpu-cpu
[2023-12-24 23:02:47,470] [DEBUG] [engine.py:482:run_no_search] Running ['qlora'] with no search ...
[2023-12-24 23:02:47,470] [INFO] [engine.py:924:_run_pass] Running pass qlora:QLoRA
[2023-12-24 23:02:48,925] [INFO] [hf_utils.py:218:load_model] Loading Huggingface model from model-cache/microsoft/phi-2
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:45<00:00, 22.93s/it]
[2023-12-24 23:03:35,592] [DEBUG] [hf_utils.py:254:load_huggingface_model_from_task] Loaded model <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> with name_or_path model-cache/microsoft/phi-2
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2023-12-24 23:03:35,804] [INFO] [qlora.py:420:smart_tokenizer_and_embedding_resize] Added 1 new tokens to tokenizer and resized model embedding layer.
[2023-12-24 23:03:35,950] [DEBUG] [qlora.py:388:get_model_tokenizer] Adding LoRA modules
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2023-12-24 23:03:47,926] [INFO] [qlora.py:258:_run_for_config] Running QLoRA fine-tuning
0%| | 0/1200 [00:00<?, ?it/s][2023-12-24 23:04:05,309] [ERROR] [engine.py:997:_run_pass] Pass run failed.
Traceback (most recent call last):
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/engine/engine.py", line 985, in _run_pass
output_model_config = host.run_pass(p, input_model_config, data_root, output_model_path, pass_search_point)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/systems/local.py", line 32, in run_pass
output_model = the_pass.run(model, data_root, output_model_path, point)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/passes/olive_pass.py", line 367, in run
output_model = self._run_for_config(model, data_root, config, output_model_path)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/passes/pytorch/qlora.py", line 259, in _run_for_config
train_result = trainer.train()
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/trainer.py", line 1591, in train
return inner_training_loop(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/trainer.py", line 1892, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/trainer.py", line 2776, in training_step
loss = self.compute_loss(model, inputs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/trainer.py", line 2801, in compute_loss
outputs = model(**inputs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/peft/peft_model.py", line 918, in forward
return self.base_model(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
return self.model.forward(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 953, in forward
hidden_states = self.transformer(input_ids, past_key_values=past_key_values, attention_mask=attention_mask)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 915, in forward
hidden_states = layer(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 770, in forward
attn_outputs = self.mixer(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 722, in forward
attn_output = self._forward_self_attn(x, attention_mask)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 621, in _forward_self_attn
return self.inner_attn(qkv, key_padding_mask=key_padding_mask)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 377, in forward
scores = scores + causal_mask.to(dtype=scores.dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 14.17 GiB is allocated by PyTorch, and 98.70 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[2023-12-24 23:04:05,444] [WARNING] [engine.py:432:run_accelerator] Failed to run Olive on gpu-cpu: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 14.17 GiB is allocated by PyTorch, and 98.70 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/engine/engine.py", line 412, in run_accelerator
return self.run_no_search(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/engine/engine.py", line 483, in run_no_search
should_prune, signal, model_ids = self._run_passes(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/engine/engine.py", line 887, in _run_passes
model_config, model_id = self._run_pass(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/engine/engine.py", line 985, in _run_pass
output_model_config = host.run_pass(p, input_model_config, data_root, output_model_path, pass_search_point)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/systems/local.py", line 32, in run_pass
output_model = the_pass.run(model, data_root, output_model_path, point)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/passes/olive_pass.py", line 367, in run
output_model = self._run_for_config(model, data_root, config, output_model_path)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/olive/passes/pytorch/qlora.py", line 259, in _run_for_config
train_result = trainer.train()
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/trainer.py", line 1591, in train
return inner_training_loop(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/trainer.py", line 1892, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/trainer.py", line 2776, in training_step
loss = self.compute_loss(model, inputs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/trainer.py", line 2801, in compute_loss
outputs = model(**inputs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/peft/peft_model.py", line 918, in forward
return self.base_model(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
return self.model.forward(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 953, in forward
hidden_states = self.transformer(input_ids, past_key_values=past_key_values, attention_mask=attention_mask)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 915, in forward
hidden_states = layer(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 770, in forward
attn_outputs = self.mixer(
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 722, in forward
attn_output = self._forward_self_attn(x, attention_mask)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 621, in _forward_self_attn
return self.inner_attn(qkv, key_padding_mask=key_padding_mask)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/ericv/.cache/huggingface/modules/transformers_modules/phi-2/modeling_phi.py", line 377, in forward
scores = scores + causal_mask.to(dtype=scores.dtype)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 14.17 GiB is allocated by PyTorch, and 98.70 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/1200 [00:17<?, ?it/s]
[2023-12-24 23:04:05,707] [INFO] [engine.py:357:run] Run history for gpu-cpu:
[2023-12-24 23:04:06,050] [INFO] [engine.py:630:dump_run_history] run history:
+----------------------------------+-------------------+-------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+==================================+===================+=============+================+===========+
| c2ac57d9dcdd152c5f5dbe6b31fb29ca | | | | |
+----------------------------------+-------------------+-------------+----------------+-----------+
[2023-12-24 23:04:06,071] [INFO] [engine.py:372:run] No packaging config provided, skip packaging artifacts
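For reference, the allocator hint mentioned in the error message can be tried by setting PYTORCH_CUDA_ALLOC_CONF before launching. It only addresses fragmentation, not total capacity, so it may not be enough on its own; the value below is just an illustration.

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python finetuning/invoke_olive.py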

I'm running phi-2 fine-tuning on a lesser GPU (RTX 3060 Ti 8GB) and it works. It just takes a long time to finish:

(phi-2-env) zcobol@TEXAS:/mnt/c/ai/phi-2$ python finetuning/invoke_olive.py 
[2023-12-25 08:59:57,186] [DEBUG] [engine.py:125:setup_accelerators] Initial execution providers: ['CPUExecutionProvider']
[2023-12-25 08:59:57,186] [DEBUG] [engine.py:143:setup_accelerators] Initial accelerators: ['gpu']
[2023-12-25 08:59:57,187] [DEBUG] [engine.py:164:setup_accelerators] Supported execution providers for device gpu: ['CUDAExecutionProvider', 'TensorrtExecutionProvider', 'CPUExecutionProvider']
[2023-12-25 08:59:57,187] [INFO] [engine.py:181:setup_accelerators] Running workflow on accelerator specs: gpu-cpu
[2023-12-25 08:59:57,213] [DEBUG] [engine.py:482:run_no_search] Running ['qlora'] with no search ...
[2023-12-25 08:59:57,213] [INFO] [engine.py:924:_run_pass] Running pass qlora:QLoRA
[2023-12-25 08:59:58,537] [INFO] [hf_utils.py:218:load_model] Loading Huggingface model from model-cache/microsoft/phi-2
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:47<00:00, 23.74s/it]
[2023-12-25 09:00:48,867] [DEBUG] [hf_utils.py:254:load_huggingface_model_from_task] Loaded model <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> with name_or_path model-cache/microsoft/phi-2
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2023-12-25 09:00:48,971] [INFO] [qlora.py:420:smart_tokenizer_and_embedding_resize] Added 1 new tokens to tokenizer and resized model embedding layer.
[2023-12-25 09:00:48,986] [DEBUG] [qlora.py:388:get_model_tokenizer] Adding LoRA modules
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2023-12-25 09:00:59,652] [INFO] [qlora.py:258:_run_for_config] Running QLoRA fine-tuning
{'loss': 1.7088, 'learning_rate': 0.0001, 'epoch': 1.21}                                                                                                                                                                                                                                
{'loss': 0.9402, 'learning_rate': 0.0001, 'epoch': 2.42}                                                                                                                                                                                                                                
  2%|█████▎                                                                                                                                                                                                                     | 29/1200 [55:06<37:05:12, 114.02s/it]{'loss': 0.781, 'learning_rate': 0.0001, 'epoch': 3.64}                                                                                                                                                                                                               
{'loss': 0.6497, 'learning_rate': 0.0001, 'epoch': 4.85}                                                                                                                                                                                                              
{'loss': 0.5381, 'learning_rate': 0.0001, 'epoch': 6.06}                                                                                                                                                                                                              
{'loss': 0.4499, 'learning_rate': 0.0001, 'epoch': 7.27}                                                                                                                                                                                                              
{'loss': 0.3674, 'learning_rate': 0.0001, 'epoch': 8.48}                                                                                                                                                                                                              
{'loss': 0.3085, 'learning_rate': 0.0001, 'epoch': 9.7}                                                                                                                                                                                                               
{'loss': 0.2549, 'learning_rate': 0.0001, 'epoch': 10.91}                                                                                                                                                                                                             
  8%|█████████████████▏                                                                                                                                                                                                       | 95/1200 [3:00:31<35:00:44, 114.07s/it

nvidia-smi output inside WSL instance:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.02              Driver Version: 536.19       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  | 00000000:02:00.0  On |                  N/A |
| 30%   58C    P2              71W / 200W |   7875MiB /  8192MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

The host system has 32GB of RAM and about 5GB is used by WSL. How much RAM does your host have, and how much is reserved for WSL?

@zcobol
Host system:
Windows 11 Home...
16G DDR5 RAM...
RTX 3070 Ti 8GB

WSL environment (the defaults in place when the error above occurred):
(phi-2-env) ericv@NITRO:/mnt/c/mnt/c/Users/ericv/source/repos/AI/SamplePHI2$ free -h --giga
              total        used        free      shared  buff/cache   available
Mem:           7.5G        1.0G        6.0G        3.0M        503M        6.3G
Swap:          2.0G          0B        2.0G

There was no .wslconfig file at the default %USERPROFILE% location...
So I created one and tried multiple configurations, increasing the memory limit (restarting the WSLService service to pick up the configuration):

# Settings apply across all Linux distros running on WSL 2
[wsl2]
# Limits VM memory; this can be set as whole numbers using GB or MB
memory=12GB
# Sets the VM to use virtual processors
processors=4
...
free -h --giga
              total        used        free      shared  buff/cache   available
Mem:            11G        447M         11G        3.0M        395M         11G
Swap:          3.1G          0B        3.1G
    ...
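As an aside, another way to make sure .wslconfig changes take effect is to shut WSL down completely from a Windows prompt before starting a new session:

wsl --shutdown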

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 14.17 GiB is allocated by PyTorch, and 98.70 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/1200 [00:12<?, ?it/s]
[2023-12-25 21:52:15,399] [INFO] [engine.py:357:run] Run history for gpu-cpu:
[2023-12-25 21:52:15,448] [INFO] [engine.py:630:dump_run_history] run history:
+----------------------------------+-------------------+-------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+==================================+===================+=============+================+===========+
| c2ac57d9dcdd152c5f5dbe6b31fb29ca | | | | |
+----------------------------------+-------------------+-------------+----------------+-----------+
[2023-12-25 21:52:15,465] [INFO] [engine.py:372:run] No packaging config provided, skip packaging artifacts

Not sure why it works on your system and not mine, or whether I need to upgrade the system memory...

OK, Happy New Year first of all to each of you.
I got back to this (having learned a bit more about QLoRA over the holidays) and finally got it to work by changing a setting in the Olive config file:

./finetuning/olive-config.json
changed:
"source_max_len": 1024,
to:
"source_max_len": 512,

Note: I found this suggestion by using copilot.microsoft.com...
It also suggested setting "gradient_checkpointing": true, but that results in another error further down the stack saying checkpointing is not supported, so I reverted it:
File "/opt/miniconda/envs/phi-2-env/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1827, in gradient_checkpointing_enable
raise ValueError(f"{self.__class__.__name__} does not support gradient checkpointing.")
ValueError: PhiForCausalLM does not support gradient checkpointing.

So with source_max_len reduced, it works. (My guess is that for Olive this setting caps the tokenized sequence length per example; since attention memory grows roughly quadratically with sequence length, halving it cuts the activation footprint substantially, with an effect similar to reducing the batch size.)
Final Output:
(phi-2-env) ericv@NITRO:~/AI/SamplePHI2$ python finetuning/invoke_olive.py
[2024-01-01 12:11:25,770] [DEBUG] [engine.py:125:setup_accelerators] Initial execution providers: ['CPUExecutionProvider']
[2024-01-01 12:11:25,770] [DEBUG] [engine.py:143:setup_accelerators] Initial accelerators: ['gpu']
[2024-01-01 12:11:25,770] [DEBUG] [engine.py:164:setup_accelerators] Supported execution providers for device gpu: ['CUDAExecutionProvider', 'TensorrtExecutionProvider', 'CPUExecutionProvider']
[2024-01-01 12:11:25,770] [INFO] [engine.py:181:setup_accelerators] Running workflow on accelerator specs: gpu-cpu
[2024-01-01 12:11:25,791] [DEBUG] [engine.py:482:run_no_search] Running ['qlora'] with no search ...
[2024-01-01 12:11:25,792] [INFO] [engine.py:924:_run_pass] Running pass qlora:QLoRA
[2024-01-01 12:11:27,149] [INFO] [hf_utils.py:218:load_model] Loading Huggingface model from model-cache/microsoft/phi-2
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:53<00:00, 56.94s/it]
[2024-01-01 12:13:21,643] [DEBUG] [hf_utils.py:254:load_huggingface_model_from_task] Loaded model <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'> with name_or_path model-cache/microsoft/phi-2
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-01-01 12:13:21,981] [INFO] [qlora.py:420:smart_tokenizer_and_embedding_resize] Added 1 new tokens to tokenizer and resized model embedding layer.
[2024-01-01 12:13:22,017] [DEBUG] [qlora.py:388:get_model_tokenizer] Adding LoRA modules
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-01-01 12:13:33,618] [INFO] [qlora.py:258:_run_for_config] Running QLoRA fine-tuning
{'loss': 1.9, 'learning_rate': 0.0001, 'epoch': 0.6}
{'loss': 1.0769, 'learning_rate': 0.0001, 'epoch': 1.19}
{'loss': 0.8712, 'learning_rate': 0.0001, 'epoch': 1.79}
...
{'loss': 0.0062, 'learning_rate': 0.0001, 'epoch': 71.04}
{'loss': 0.0061, 'learning_rate': 0.0001, 'epoch': 71.64}
{'train_runtime': 4195.7947, 'train_samples_per_second': 1.144, 'train_steps_per_second': 0.286, 'train_loss': 0.10114002790302039, 'epoch': 71.64}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1200/1200 [1:09:55<00:00, 3.50s/it]
[2024-01-01 13:23:29,587] [DEBUG] [qlora.py:260:_run_for_config] train_result: TrainOutput(global_step=1200, training_loss=0.10114002790302039, metrics={'train_runtime': 4195.7947, 'train_samples_per_second': 1.144, 'train_steps_per_second': 0.286, 'train_loss': 0.10114002790302039, 'epoch': 71.64})
[2024-01-01 13:23:29,921] [DEBUG] [resource_path.py:157:create_resource_path] Resource path cache/models/11_QLoRA-c2ac57d9dcdd152c5f5dbe6b31fb29ca-ebf2cb49fc53f7d5dc8c54563ff5bd31/output_model/adapter is inferred to be of type folder.
[2024-01-01 13:23:29,990] [DEBUG] [engine.py:904:_run_passes] Signal: None
[2024-01-01 13:23:30,006] [DEBUG] [resource_path.py:157:create_resource_path] Resource path /mnt/c/mnt/c/Users/ericv/source/repos/AI/SamplePHI2/cache/models/11_QLoRA-c2ac57d9dcdd152c5f5dbe6b31fb29ca-ebf2cb49fc53f7d5dc8c54563ff5bd31/output_model/adapter is inferred to be of type folder.
[2024-01-01 13:23:32,912] [INFO] [engine.py:357:run] Run history for gpu-cpu:
[2024-01-01 13:23:32,928] [INFO] [engine.py:630:dump_run_history] run history:
+----------------------------------------------------------------------------+----------------------------------+-------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+============================================================================+==================================+=============+================+===========+
| c2ac57d9dcdd152c5f5dbe6b31fb29ca | | | | |
+----------------------------------------------------------------------------+----------------------------------+-------------+----------------+-----------+
| 11_QLoRA-c2ac57d9dcdd152c5f5dbe6b31fb29ca-ebf2cb49fc53f7d5dc8c54563ff5bd31 | c2ac57d9dcdd152c5f5dbe6b31fb29ca | QLoRA | 4324.18 | |
+----------------------------------------------------------------------------+----------------------------------+-------------+----------------+-----------+
[2024-01-01 13:23:32,932] [INFO] [engine.py:372:run] No packaging config provided, skip packaging artifacts