intel-analytics/ipex-llm

Issue with saving and loading low bit BLIP-2 model

Opened this issue · 1 comment

The original BLIP2-OPT-6.7B model takes more than 30 GB of RAM to load and convert, so I want to save the compressed model and then load it directly on another PC with limited RAM. Saving succeeded, but loading failed.

from transformers import Blip2Processor, Blip2ForConditionalGeneration
from ipex_llm import optimize_model

model_id = "Salesforce/blip2-opt-6.7b"  # or "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id)

device = 'xpu'
optimized_model = optimize_model(model, device=device)

model_path = "optimized-blip2"
optimized_model.save_low_bit(model_path)
processor.save_pretrained(model_path)
$ ls -lah optimized-blip2
total 4.7G
drwxrwxr-x 2 wayne wayne 4.0K Apr 25 16:55 .
drwxrwxr-x 6 wayne wayne 4.0K Apr 26 08:40 ..
-rw-rw-r-- 1 wayne wayne   42 Apr 25 16:54 bigdl_config.json
-rw-rw-r-- 1 wayne wayne  942 Apr 25 16:53 config.json
-rw-rw-r-- 1 wayne wayne  136 Apr 25 16:53 generation_config.json
-rw-rw-r-- 1 wayne wayne 446K Apr 25 16:55 merges.txt
-rw-rw-r-- 1 wayne wayne 4.7G Apr 25 16:54 model.safetensors
-rw-rw-r-- 1 wayne wayne  432 Apr 25 16:55 preprocessor_config.json
-rw-rw-r-- 1 wayne wayne  548 Apr 25 16:55 special_tokens_map.json
-rw-rw-r-- 1 wayne wayne  708 Apr 25 16:55 tokenizer_config.json
-rw-rw-r-- 1 wayne wayne 2.1M Apr 25 16:55 tokenizer.json
-rw-rw-r-- 1 wayne wayne 780K Apr 25 16:55 vocab.json
from ipex_llm.optimize import load_low_bit
copied_model = load_low_bit(copied_model, model_path)

2024-04-26 08:39:58,752 - INFO - Converting the current model to sym_int4 format......
2024-04-26 08:39:59,115 - ERROR - 

****************************Usage Error************************
Error no file named pytorch_model.bin found in directory optimized-blip2.
2024-04-26 08:39:59,116 - ERROR - 

****************************Call Stack*************************
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[19], line 1
----> 1 copied_model = load_low_bit(copied_model, 'optimized-blip2')

File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/optimize.py:178, in load_low_bit(model, model_path)
    175     qtype = ggml_tensor_qtype[low_bit]
    176     model = ggml_convert_low_bit(model, qtype=qtype, convert_shape_only=True)
--> 178 resolved_archive_file, is_sharded = extract_local_archive_file(model_path, subfolder="")
    179 if is_sharded:
    180     # For now only shards transformers models
    181     # can run in this branch.
    182     resolved_archive_file, _ = \
    183         get_local_shard_files(model_path,
    184                               resolved_archive_file,
    185                               subfolder="")

File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/transformers/utils.py:83, in extract_local_archive_file(pretrained_model_name_or_path, subfolder, variant)
     81     return archive_file, is_sharded
     82 else:
---> 83     invalidInputError(False,
     84                       f"Error no file named {_add_variant(WEIGHTS_NAME, variant)}"
     85                       " found in directory"
     86                       f" {pretrained_model_name_or_path}.")

File ~/.env/ipex-llm/lib/python3.10/site-packages/ipex_llm/utils/common/log4Error.py:32, in invalidInputError(condition, errMsg, fixMsg)
     30 if not condition:
     31     outputUserMessage(errMsg, fixMsg)
---> 32     raise RuntimeError(errMsg)

RuntimeError: Error no file named pytorch_model.bin found in directory optimized-blip2.
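The traceback makes the cause visible: the loader only considers `pytorch_model.bin` (or its sharded index file), so a saved directory that contains only `model.safetensors` always fails at this point. A minimal sketch of that lookup, simplified from what the traceback shows (function name here is hypothetical, not the real ipex-llm API):

```python
import os

WEIGHTS_NAME = "pytorch_model.bin"  # the filename the loader expects

def find_weights_sketch(model_path):
    """Simplified sketch of the lookup failing in the traceback above:
    only pytorch_model.bin (or its sharded index) is checked, so a
    directory holding only model.safetensors raises."""
    archive = os.path.join(model_path, WEIGHTS_NAME)
    index = os.path.join(model_path, WEIGHTS_NAME + ".index.json")
    if os.path.isfile(archive):
        return archive, False  # single-file checkpoint
    if os.path.isfile(index):
        return index, True     # sharded checkpoint
    raise RuntimeError(
        f"Error no file named {WEIGHTS_NAME} found in directory {model_path}.")
```

Running this sketch against the `optimized-blip2` listing above (which has only `model.safetensors`) reproduces the same `RuntimeError`.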

Hi there, I tried this on my Arc A770 machine; my environment is:

transformers==4.31.0
-----------------------------------------------------------------
Name: ipex-llm
Version: 2.1.0b20240421

I first downloaded Salesforce/blip2-opt-6.7b to my machine:

arda@arda-arc05:/mnt/disk1/models$ ls /mnt/disk1/models/blip2
config.json                       pytorch_model-00003-of-00004.bin  tokenizer_config.json
merges.txt                        pytorch_model-00004-of-00004.bin  tokenizer.json
preprocessor_config.json          pytorch_model.bin.index.json      vocab.json
pytorch_model-00001-of-00004.bin  README.md
pytorch_model-00002-of-00004.bin  special_tokens_map.json

and then used an absolute path to load and convert the model. I did not encounter the issue you mentioned.

from transformers import Blip2Processor, Blip2ForConditionalGeneration
from ipex_llm import optimize_model

model_id = "/mnt/disk1/models/blip2"  
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id)

device = 'xpu'
optimized_model = optimize_model(model, device=device)

model_path = "optimized-blip2"
optimized_model.save_low_bit(model_path)
processor.save_pretrained(model_path)
arda@arda-arc05:/mnt/disk1/models$ ls /mnt/disk1/models/optimized-blip2
bigdl_config.json  preprocessor_config.json  tokenizer_config.json
config.json        pytorch_model.bin         tokenizer.json
merges.txt         special_tokens_map.json   vocab.json

You might want to verify that the model you downloaded is complete. Also note that you should download the original PyTorch weights (the pytorch_model*.bin files, as in the listing above) rather than the safetensors version, since load_low_bit looks for pytorch_model.bin.
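To check completeness up front, one option is a small helper (hypothetical, not part of ipex-llm) that confirms the directory holds either a single pytorch_model.bin or every shard listed in pytorch_model.bin.index.json:

```python
import json
import os

def has_complete_pytorch_weights(model_dir):
    """Return True if model_dir contains the original PyTorch weights:
    either a single pytorch_model.bin, or a pytorch_model.bin.index.json
    whose listed shards are all present on disk."""
    files = set(os.listdir(model_dir))
    if "pytorch_model.bin" in files:
        return True
    index_path = os.path.join(model_dir, "pytorch_model.bin.index.json")
    if not os.path.isfile(index_path):
        return False
    with open(index_path) as f:
        shards = set(json.load(f)["weight_map"].values())
    return shards.issubset(files)
```

For a sharded download like the four-shard listing above, this returns False as soon as any pytorch_model-0000x-of-00004.bin shard is missing.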
As for loading the converted model, I use the following code:

from ipex_llm.optimize import low_memory_init, load_low_bit
from transformers import Blip2Processor, Blip2ForConditionalGeneration
model_id = "/mnt/disk1/models/optimized-blip2"
with low_memory_init():
    model = Blip2ForConditionalGeneration.from_pretrained(model_id)
model = load_low_bit(model, model_id)
print("Model loaded successfully!")

And no error occurred.

arda@arda-arc05:/mnt/disk1/models$ python blip2.py
/opt/anaconda3/envs/mingyu-llm-gpu/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension. If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-29 10:47:15,896 - INFO - intel_extension_for_pytorch auto imported
2024-04-29 10:47:16,064 - INFO - Converting the current model to sym_int4 format......
Model loaded successfully!

You can refer to the relevant API in ipex-llm/python/llm/src/ipex_llm/optimize.py on the main branch of intel-analytics/ipex-llm on GitHub to write the loading code.