Issue with saving and loading low bit BLIP-2 model
Opened this issue · 1 comment
The original BLIP2-OPT-6.7B model takes more than 30 GB of RAM to load and convert, so I want to save the compressed model and then load it directly on another PC with limited RAM. Saving succeeded, but loading failed.
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from ipex_llm import optimize_model
model_id = "Salesforce/blip2-opt-6.7b"  # "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id)
device = 'xpu'
optimized_model = optimize_model(model, device=device)
model_path = "optimized-blip2"
optimized_model.save_low_bit(model_path)
processor.save_pretrained(model_path)
$ l optimized-blip2
total 4.7G
drwxrwxr-x 2 wayne wayne 4.0K Apr 25 16:55 .
drwxrwxr-x 6 wayne wayne 4.0K Apr 26 08:40 ..
-rw-rw-r-- 1 wayne wayne 42 Apr 25 16:54 bigdl_config.json
-rw-rw-r-- 1 wayne wayne 942 Apr 25 16:53 config.json
-rw-rw-r-- 1 wayne wayne 136 Apr 25 16:53 generation_config.json
-rw-rw-r-- 1 wayne wayne 446K Apr 25 16:55 merges.txt
-rw-rw-r-- 1 wayne wayne 4.7G Apr 25 16:54 model.safetensors
-rw-rw-r-- 1 wayne wayne 432 Apr 25 16:55 preprocessor_config.json
-rw-rw-r-- 1 wayne wayne 548 Apr 25 16:55 special_tokens_map.json
-rw-rw-r-- 1 wayne wayne 708 Apr 25 16:55 tokenizer_config.json
-rw-rw-r-- 1 wayne wayne 2.1M Apr 25 16:55 tokenizer.json
-rw-rw-r-- 1 wayne wayne 780K Apr 25 16:55 vocab.json
from ipex_llm.optimize import load_low_bit
copied_model = load_low_bit(copied_model, model_path)
2024-04-26 08:39:58,752 - INFO - Converting the current model to sym_int4 format......
2024-04-26 08:39:59,115 - ERROR -
****************************Usage Error************************
Error no file named pytorch_model.bin found in directory optimized-blip2.
2024-04-26 08:39:59,116 - ERROR -
****************************Call Stack*************************
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[19], line 1
----> 1 copied_model = load_low_bit(copied_model, 'optimized-blip2')
File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/optimize.py:178, in load_low_bit(model, model_path)
175 qtype = ggml_tensor_qtype[low_bit]
176 model = ggml_convert_low_bit(model, qtype=qtype, convert_shape_only=True)
--> 178 resolved_archive_file, is_sharded = extract_local_archive_file(model_path, subfolder="")
179 if is_sharded:
180 # For now only shards transformers models
181 # can run in this branch.
182 resolved_archive_file, _ = \
183 get_local_shard_files(model_path,
184 resolved_archive_file,
185 subfolder="")
File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/transformers/utils.py:83, in extract_local_archive_file(pretrained_model_name_or_path, subfolder, variant)
81 return archive_file, is_sharded
82 else:
---> 83 invalidInputError(False,
84 f"Error no file named {_add_variant(WEIGHTS_NAME, variant)}"
85 " found in directory"
86 f" {pretrained_model_name_or_path}.")
File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/utils/common/log4Error.py:32, in invalidInputError(condition, errMsg, fixMsg)
30 if not condition:
31 outputUserMessage(errMsg, fixMsg)
---> 32 raise RuntimeError(errMsg)
RuntimeError: Error no file named pytorch_model.bin found in directory optimized-blip2.
Hi there, I tried this on my Arc A770 machine. My environment is:
transformers==4.31.0
-----------------------------------------------------------------
Name: ipex-llm
Version: 2.1.0b20240421
I first downloaded Salesforce/blip2-opt-6.7b to my machine:
arda@arda-arc05:/mnt/disk1/models$ ls /mnt/disk1/models/blip2
config.json pytorch_model-00003-of-00004.bin tokenizer_config.json
merges.txt pytorch_model-00004-of-00004.bin tokenizer.json
preprocessor_config.json pytorch_model.bin.index.json vocab.json
pytorch_model-00001-of-00004.bin README.md
pytorch_model-00002-of-00004.bin special_tokens_map.json
and then used an absolute path to load and convert the model. I did not encounter the issue you mentioned.
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from ipex_llm import optimize_model
model_id = "/mnt/disk1/models/blip2"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id)
device = 'xpu'
optimized_model = optimize_model(model, device=device)
model_path = "optimized-blip2"
optimized_model.save_low_bit(model_path)
processor.save_pretrained(model_path)
arda@arda-arc05:/mnt/disk1/models$ ls /mnt/disk1/models/optimized-blip2
bigdl_config.json preprocessor_config.json tokenizer_config.json
config.json pytorch_model.bin tokenizer.json
merges.txt special_tokens_map.json vocab.json
You might want to verify that the model you downloaded is complete. Also note that you should download the original PyTorch weights (pytorch_model*.bin): the error above shows that load_low_bit looks for pytorch_model.bin, while your saved directory contains only model.safetensors.
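A quick way to check which weight format a saved directory actually contains (the helper name here is illustrative, not part of ipex-llm):

```python
import os

def weight_files(model_dir):
    """List weight-related files in a checkpoint directory.

    This ipex-llm version's load_low_bit expects pytorch_model.bin
    (or sharded pytorch_model-*-of-*.bin files), so a directory that
    only contains model.safetensors will raise the error shown above.
    """
    names = sorted(os.listdir(model_dir))
    return [n for n in names
            if n.startswith("pytorch_model") or n.endswith(".safetensors")]

# Example (assuming the save directory from above exists locally):
# weight_files("optimized-blip2")
```

If the returned list has only a `.safetensors` entry and no `pytorch_model*.bin`, that matches the failure mode reported in this issue.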
As for loading the converted model, I use the following code:
from ipex_llm.optimize import low_memory_init, load_low_bit
from transformers import Blip2Processor, Blip2ForConditionalGeneration
model_id = "/mnt/disk1/models/optimized-blip2"
with low_memory_init():
    model = Blip2ForConditionalGeneration.from_pretrained(model_id)
model = load_low_bit(model, model_id)
print("Model loaded successfully!")
And no error occurred.
arda@arda-arc05:/mnt/disk1/models$ python blip2.py
/opt/anaconda3/envs/mingyu-llm-gpu/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''. If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-29 10:47:15,896 - INFO - intel_extension_for_pytorch auto imported
2024-04-29 10:47:16,064 - INFO - Converting the current model to sym_int4 format......
Model loaded successfully!
You can refer to the relevant APIs in python/llm/src/ipex_llm/optimize.py on the main branch of intel-analytics/ipex-llm on GitHub when writing your loading code.