d-ailin/CLIP-Guided-Decoding

custom transformers issue

Closed this issue · 8 comments

Very interesting work, I have learned a lot from it! However, I have a question for the authors. When I run the following commands to install the custom transformers:

cd dep/transformers_custom/transformers-4.31.0
pip install -e .

The custom transformers installs successfully, but it seems to be missing so many __init__.py files that its modules cannot be found?
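
A quick, generic way to check which transformers installation Python actually resolves after the editable install (my own diagnostic, nothing specific to this repo):

# Generic diagnostic: confirm which transformers package is picked up after
# `pip install -e .` and that its submodules import correctly.
import transformers

print(transformers.__version__)  # expected: 4.31.0 for the custom install
print(transformers.__file__)     # expected: a path under dep/transformers_custom/transformers-4.31.0/src/transformers

# This import fails if the package files are incomplete:
from transformers.generation.stopping_criteria import StoppingCriteria, StoppingCriteriaList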

Thanks for your interest and kind words!

I could not really parse the problem here. Could you kindly provide more detailed logs? Thanks!

Thanks for the prompt reply; let me explain the issue in detail.

I wanted to run the simple inference code in inference.ipynb, so I first copied it completely into a test.py and ran it, but I got the following error:

Traceback (most recent call last):
  File "D:\CLIP-Guided-Decoding\text.py", line 5, in <module>
    from gen.clip_guided import generate_w_clip
  File "D:\CLIP-Guided-Decoding\gen\clip_guided.py", line 8, in <module>
    from transformers.generation.stopping_criteria import StoppingCriteria, StoppingCriteriaList
  File "D:\CLIP-Guided-Decoding\dep\transformers_custom\transformers-4.31.0\src\transformers\generation\stopping_criteria.py", line 9, in <module>
    from ..utils import add_start_docstrings, logging
ImportError: cannot import name 'add_start_docstrings' from 'transformers.utils' (unknown location)

Process finished with exit code 1

I carefully examined the custom transformers-4.31.0 and found that, compared with the official transformers-4.31.0, it seems to have no __init__.py files, which I suspect is the cause of the above problem.

So I copied transformers/utils/__init__.py from the official transformers-4.31.0 into the same location in the custom transformers-4.31.0. The above error then disappears, but the following error appears instead, apparently still because missing __init__.py files prevent modules from being found.

Traceback (most recent call last):
  File "D:\CLIP-Guided-Decoding\text.py", line 2, in <module>
    from gen.clip_guided import generate_w_clip
  File "D:\CLIP-Guided-Decoding\gen\clip_guided.py", line 8, in <module>
    from transformers.generation.stopping_criteria import StoppingCriteria, StoppingCriteriaList
  File "D:\CLIP-Guided-Decoding\dep\transformers_custom\transformers-4.31.0\src\transformers\generation\stopping_criteria.py", line 9, in <module>
    from ..utils import add_start_docstrings, logging
  File "D:\CLIP-Guided-Decoding\dep\transformers_custom\transformers-4.31.0\src\transformers\utils\__init__.py", line 20, in <module>
    from .. import __version__
ImportError: cannot import name '__version__' from 'transformers' (unknown location)

Process finished with exit code 1

Looking forward to your reply!

Thanks for the detailed log.

Yes, I just noticed this was due to a file pattern issue in my .gitignore. I have fixed it in a recent commit and it should be fine now. Please update the code and try again. Thanks!

Thanks for the quick answer; I will try the new code later. Before I do, I have one more question.

How do I load a local tokenizer without access to an external network? I tried making the following change in lib/clip_utils.py:

class CLIPModel:
    def __init__(self, model_name='ViT-SO400M-14-SigLIP-384', model_pretrain='webli', device='cuda'):

        self.model, _, self.preprocess = open_clip.create_model_and_transforms(model_name, pretrained=model_pretrain, device=device)
        self.tokenizer = open_clip.get_tokenizer(model_name="hf-hub:E:\\ViT-SO400M-14-SigLIP-384")  # changed: local path instead of the default hub tokenizer id
        self.model.to(device)
        self.model.eval()
        self.device = device

This works with the official transformers-4.37.2 (open-clip-torch 2.24.0), but with your custom transformers-4.31.0 I get the following error:

Traceback (most recent call last):
  File "E:\condaenv\CGD\lib\site-packages\open_clip\factory.py", line 92, in get_tokenizer
    config = _get_hf_config(model_name)['model_cfg']
  File "E:\condaenv\CGD\lib\site-packages\open_clip\factory.py", line 78, in _get_hf_config
    config_path = download_pretrained_from_hf(model_id, filename='open_clip_config.json', cache_dir=cache_dir)
  File "E:\condaenv\CGD\lib\site-packages\open_clip\pretrained.py", line 522, in download_pretrained_from_hf
    cached_file = hf_hub_download(model_id, filename, revision=revision, cache_dir=cache_dir)
  File "E:\condaenv\CGD\lib\site-packages\huggingface_hub\utils\_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)
  File "E:\condaenv\CGD\lib\site-packages\huggingface_hub\utils\_validators.py", line 160, in validate_repo_id
    raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'E:\ViT-SO400M-14-SigLIP-384'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\CLIP-Guided-Decoding\text.py", line 31, in <module>
    clip_scorer = CLIPModel(model_pretrain="E:\\bigdata\ViT-SO400M-14-SigLIP-384\\open_clip_pytorch_model.bin",device=device)
  File "D:\CLIP-Guided-Decoding\lib\clip_utils.py", line 14, in __init__
    self.tokenizer = open_clip.get_tokenizer(model_name="hf-hub:E:\\bigdata\\ViT-SO400M-14-SigLIP-384")
  File "E:\condaenv\CGD\lib\site-packages\open_clip\factory.py", line 94, in get_tokenizer
    tokenizer = HFTokenizer(
  File "E:\condaenv\CGD\lib\site-packages\open_clip\tokenizer.py", line 407, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
  File "D:\CLIP-Guided-Decoding\dep\transformers_custom\transformers-4.31.0\src\transformers\models\auto\tokenization_auto.py", line 699, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class T5Tokenizer does not exist or is not currently imported.

Process finished with exit code 1
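
(A workaround I am considering for the offline-tokenizer part, based on the traceback above: open_clip's HFTokenizer simply forwards its argument to AutoTokenizer.from_pretrained, so it may be possible to pass the local directory directly instead of an "hf-hub:" repo id. A rough sketch, assuming the tokenizer files live in E:\ViT-SO400M-14-SigLIP-384; not code from this repo.)

from open_clip.tokenizer import HFTokenizer

# Rough sketch (my assumption): point HFTokenizer directly at the local
# tokenizer directory. The string is forwarded to AutoTokenizer.from_pretrained,
# which accepts local paths, so the hf-hub repo-id validation is never triggered.
tokenizer = HFTokenizer("E:/ViT-SO400M-14-SigLIP-384")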

Sorry, this problem seems to be essentially caused by the previous one; I will try again with the new code.

Hello, I tried running your new code. The original error has been resolved, but there seems to be a new one. I will describe the situation in as much detail as possible below.

I notice that the code uses device_map='auto', but CLIPVisionModel in transformers-4.31.0 does not seem to support it yet:

Loading checkpoint shards: 100%|██████████████████| 2/2 [00:10<00:00,  5.05s/it]
Traceback (most recent call last):
  File "/root/project/text.py", line 25, in <module>
    model, vis_processor, tokenizer = load_model(model_name=model_name, model_type=model_type, device=device)
  File "/root/project/lib/utils.py", line 92, in load_model
    tokenizer, model, vis_processor, context_len = load_pretrained_model(model_path, model_base, model_name, device=device)
  File "/root/project/LLaVA/llava/model/builder.py", line 157, in load_pretrained_model
    vision_tower.load_model(device_map=device_map)
  File "/root/project/LLaVA/llava/model/multimodal_encoder/clip_encoder.py", line 30, in load_model
    self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name, device_map=device_map)
  File "/root/project/dep/transformers_custom/transformers-4.31.0/src/transformers/modeling_utils.py", line 2804, in from_pretrained
    raise ValueError(
ValueError: CLIPVisionModel does not support `device_map='auto'`. To implement support, the modelclass needs to implement the `_no_split_modules` attribute.

Process finished with exit code 1

Compared with newer versions of transformers, the difference is in transformers/models/clip/modeling_clip.py:

class CLIPVisionModel(CLIPPreTrainedModel):
    config_class = CLIPVisionConfig
    main_input_name = "pixel_values"

    _no_split_modules = ["CLIPEncoderLayer"]                  # difference: added in newer transformers versions

    def __init__(self, config: CLIPVisionConfig):
        super().__init__(config)
        self.vision_model = CLIPVisionTransformer(config)
        # Initialize weights and apply final processing
        self.post_init()
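
A possible workaround until the custom transformers is updated (just my own sketch, not something from the repo): backport that attribute at runtime before the model is loaded, or equivalently add the line above directly to the custom modeling_clip.py.

from transformers import CLIPVisionModel

# Hypothetical workaround sketch: backport the missing class attribute so that
# from_pretrained(..., device_map='auto') accepts CLIPVisionModel under the
# custom transformers-4.31.0. This must run before LLaVA's load_pretrained_model().
CLIPVisionModel._no_split_modules = ["CLIPEncoderLayer"]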

Thanks for the follow-up!

It seems this issue is due to mismatched packages, as the current LLaVA is installed with transformers-4.37.2. But never mind, the required modification to transformers is actually quite simple. I will update with some instructions later.

It seems LLaVA has changed some interface code (e.g., output_ids no longer contains input_ids). Adapting to the latest version of LLaVA may take me some time. In the meantime, could you use LLaVA v1.1.3 to run the code first? It is the exact version used in our experiments.

git clone --depth 1 --branch v1.1.3 https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .

After installing LLaVA, you can then install the custom transformers. Thanks!
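
For context, the interface change mentioned above amounts to roughly the following; this is only an illustrative sketch of the adaptation needed, not code from this repo (the helper name and behavior check are my own assumptions):

import torch

def strip_prompt(output_ids: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # Illustrative helper (hypothetical): older LLaVA versions return the prompt
    # tokens at the start of output_ids, newer ones return only the newly
    # generated tokens. Handle both cases before decoding.
    prompt_len = input_ids.shape[1]
    if output_ids.shape[1] >= prompt_len and torch.equal(
            output_ids[:, :prompt_len], input_ids.to(output_ids.device)):
        return output_ids[:, prompt_len:]
    return output_ids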