custom transformers issue
Closed this issue · 8 comments
Very interesting work, I learned a lot from it! But I have a question for the author. When I run the following commands to install the custom transformers:
cd dep/transformers_custom/transformers-4.31.0
pip install -e .
The custom transformers package installs successfully, but it seems to be missing many __init__.py files, so its modules cannot be found?
Thanks for your interest and kind words!
I could not really parse the problem here. Could you kindly provide more detailed logs? Thanks!
Thanks to the author for the positive reply; let me explain my doubts in detail:
I wanted to run the simple inference code in inference.ipynb, so I first copied it completely into a test.py and ran it, but it failed with the following error:
Traceback (most recent call last):
File "D:\CLIP-Guided-Decoding\text.py", line 5, in <module>
from gen.clip_guided import generate_w_clip
File "D:\CLIP-Guided-Decoding\gen\clip_guided.py", line 8, in <module>
from transformers.generation.stopping_criteria import StoppingCriteria, StoppingCriteriaList
File "D:\CLIP-Guided-Decoding\dep\transformers_custom\transformers-4.31.0\src\transformers\generation\stopping_criteria.py", line 9, in <module>
from ..utils import add_start_docstrings, logging
ImportError: cannot import name 'add_start_docstrings' from 'transformers.utils' (unknown location)
Process finished with exit code 1
I carefully examined the author's custom transformers-4.31.0 and found that, compared with the official transformers-4.31.0, it seems to be missing the "__init__.py" files, which I suspect is the cause of the above problem.
So I copied "transformers/utils/__init__.py" from the official transformers-4.31.0 into the same location in your custom transformers-4.31.0. The above error then disappeared, but the following one appeared, apparently still because the missing "__init__.py" files prevent modules from being found.
Traceback (most recent call last):
File "D:\CLIP-Guided-Decoding\text.py", line 2, in <module>
from gen.clip_guided import generate_w_clip
File "D:\CLIP-Guided-Decoding\gen\clip_guided.py", line 8, in <module>
from transformers.generation.stopping_criteria import StoppingCriteria, StoppingCriteriaList
File "D:\CLIP-Guided-Decoding\dep\transformers_custom\transformers-4.31.0\src\transformers\generation\stopping_criteria.py", line 9, in <module>
from ..utils import add_start_docstrings, logging
File "D:\CLIP-Guided-Decoding\dep\transformers_custom\transformers-4.31.0\src\transformers\utils\__init__.py", line 20, in <module>
from .. import __version__
ImportError: cannot import name '__version__' from 'transformers' (unknown location)
Process finished with exit code 1
Looking forward to the author's reply!
Thanks for the detailed log.
Yes, I just noticed this was caused by a file pattern in my .gitignore. I have fixed it in a recent commit, so it should be fine now. Please pull the latest code and try again. Thanks!
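If it helps to verify the fix locally, here is a small sketch (a hypothetical helper, not part of the repo) that walks a package tree and lists subpackages lacking an __init__.py, which is the usual cause of the "unknown location" ImportError above:

```python
import os

def missing_inits(pkg_root):
    """List subdirectories under pkg_root that contain .py files but no
    __init__.py (a common cause of imports resolving to 'unknown location')."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(pkg_root):
        # skip caches and hidden directories
        dirnames[:] = [d for d in dirnames if not d.startswith((".", "__"))]
        has_py = any(f.endswith(".py") for f in filenames)
        if has_py and "__init__.py" not in filenames:
            missing.append(dirpath)
    return missing

# e.g. missing_inits("dep/transformers_custom/transformers-4.31.0/src/transformers")
# should come back empty once the .gitignore fix is pulled.
```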
Thanks to the author for the positive answer; I will try the new code later. Before that, I have one more question.
How do I load a local tokenizer without external network access? I tried making the following change in "lib/clip_utils.py":
class CLIPModel:
    def __init__(self, model_name='ViT-SO400M-14-SigLIP-384', model_pretrain='webli', device='cuda'):
        self.model, _, self.preprocess = open_clip.create_model_and_transforms(model_name, pretrained=model_pretrain, device=device)
        self.tokenizer = open_clip.get_tokenizer(model_name="hf-hub:E:\\ViT-SO400M-14-SigLIP-384")
        self.model.to(device)
        self.model.eval()
        self.device = device
It works with the official transformers-4.37.2 (open-clip-torch 2.24.0), but raises errors with your custom transformers-4.31.0:
Traceback (most recent call last):
File "E:\condaenv\CGD\lib\site-packages\open_clip\factory.py", line 92, in get_tokenizer
config = _get_hf_config(model_name)['model_cfg']
File "E:\condaenv\CGD\lib\site-packages\open_clip\factory.py", line 78, in _get_hf_config
config_path = download_pretrained_from_hf(model_id, filename='open_clip_config.json', cache_dir=cache_dir)
File "E:\condaenv\CGD\lib\site-packages\open_clip\pretrained.py", line 522, in download_pretrained_from_hf
cached_file = hf_hub_download(model_id, filename, revision=revision, cache_dir=cache_dir)
File "E:\condaenv\CGD\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
validate_repo_id(arg_value)
File "E:\condaenv\CGD\lib\site-packages\huggingface_hub\utils\_validators.py", line 160, in validate_repo_id
raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'E:\ViT-SO400M-14-SigLIP-384'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\CLIP-Guided-Decoding\text.py", line 31, in <module>
clip_scorer = CLIPModel(model_pretrain="E:\\bigdata\ViT-SO400M-14-SigLIP-384\\open_clip_pytorch_model.bin",device=device)
File "D:\CLIP-Guided-Decoding\lib\clip_utils.py", line 14, in __init__
self.tokenizer = open_clip.get_tokenizer(model_name="hf-hub:E:\\bigdata\\ViT-SO400M-14-SigLIP-384")
File "E:\condaenv\CGD\lib\site-packages\open_clip\factory.py", line 94, in get_tokenizer
tokenizer = HFTokenizer(
File "E:\condaenv\CGD\lib\site-packages\open_clip\tokenizer.py", line 407, in __init__
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
File "D:\CLIP-Guided-Decoding\dep\transformers_custom\transformers-4.31.0\src\transformers\models\auto\tokenization_auto.py", line 699, in from_pretrained
raise ValueError(
ValueError: Tokenizer class T5Tokenizer does not exist or is not currently imported.
Process finished with exit code 1
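On the local-tokenizer question: the HFValidationError above comes from passing a Windows path where a Hub repo id is expected, since the hf-hub: prefix makes open_clip query the Hub. One possible workaround (the helper name and approach are my assumption, not project code) is to strip the prefix and hand a local directory straight to the tokenizer loader, which accepts local paths offline:

```python
import os

def resolve_tokenizer_name(name):
    """If name (optionally prefixed with 'hf-hub:') points to a local
    directory, return the bare path so the tokenizer can be loaded
    offline; otherwise return name unchanged so open_clip resolves it
    as a Hub repo id."""
    path = name[len("hf-hub:"):] if name.startswith("hf-hub:") else name
    if os.path.isdir(path):
        return path  # local checkout: bypass the Hub repo-id validation
    return name

# usage sketch (assumes the tokenizer files were downloaded beforehand):
# self.tokenizer = open_clip.tokenizer.HFTokenizer(
#     resolve_tokenizer_name("hf-hub:E:\\ViT-SO400M-14-SigLIP-384"))
```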
Sorry, this problem seems to be essentially caused by the previous one; I will try again with the new code.
Hello, I tried running your new code. The original error is solved, but there seems to be a new one. I will describe the situation in as much detail as possible below:
I notice the author uses device_map='auto', but transformers-4.31.0 does not seem to support it for CLIPVisionModel yet.
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:10<00:00, 5.05s/it]
Traceback (most recent call last):
File "/root/project/text.py", line 25, in <module>
model, vis_processor, tokenizer = load_model(model_name=model_name, model_type=model_type, device=device)
File "/root/project/lib/utils.py", line 92, in load_model
tokenizer, model, vis_processor, context_len = load_pretrained_model(model_path, model_base, model_name, device=device)
File "/root/project/LLaVA/llava/model/builder.py", line 157, in load_pretrained_model
vision_tower.load_model(device_map=device_map)
File "/root/project/LLaVA/llava/model/multimodal_encoder/clip_encoder.py", line 30, in load_model
self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name, device_map=device_map)
File "/root/project/dep/transformers_custom/transformers-4.31.0/src/transformers/modeling_utils.py", line 2804, in from_pretrained
raise ValueError(
ValueError: CLIPVisionModel does not support `device_map='auto'`. To implement support, the model class needs to implement the `_no_split_modules` attribute.
Process finished with exit code 1
Compared with newer versions of transformers, the difference is in transformers/models/clip/modeling_clip.py:
class CLIPVisionModel(CLIPPreTrainedModel):
    config_class = CLIPVisionConfig
    main_input_name = "pixel_values"
    _no_split_modules = ["CLIPEncoderLayer"]  # difference

    def __init__(self, config: CLIPVisionConfig):
        super().__init__(config)
        self.vision_model = CLIPVisionTransformer(config)
        # Initialize weights and apply final processing
        self.post_init()
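Since from_pretrained in 4.31.0 only appears to check whether the model class defines _no_split_modules, setting that attribute before loading may be enough to backport the one-line difference. A minimal sketch of that check, using stand-in classes for illustration only:

```python
# Stand-in classes illustrating the check that from_pretrained() in
# transformers 4.31.0 performs before honoring device_map='auto'.
class CLIPVisionModelOld:
    _no_split_modules = None  # 4.31.0 default: triggers the ValueError

class CLIPVisionModelPatched:
    _no_split_modules = ["CLIPEncoderLayer"]  # newer releases / backport

def supports_auto_device_map(model_cls):
    # roughly the condition behind "does not support `device_map='auto'`"
    return getattr(model_cls, "_no_split_modules", None) is not None
```

In practice, one might either set CLIPVisionModel._no_split_modules = ["CLIPEncoderLayer"] before calling from_pretrained, or simply drop device_map and move the model with .to(device); both are untested suggestions here.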
Thanks for the follow-up!
It seems this issue is due to mismatched packages, as the current LLaVA is installed against transformers-4.37.2. But never mind, the required modification to transformers is actually quite simple. I will post instructions later.
It seems LLaVA has changed some interface code (e.g., output_ids no longer contains input_ids). Adapting to the latest version of LLaVA may take me some time. In the meantime, could you use LLaVA v1.1.3 to run the code first? It is the exact model version used in our experiments.
git clone --depth 1 --branch v1.1.3 https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
After installing LLaVA, you can then install the custom transformers. Thanks!