[MIEB] laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K not working
Muennighoff commented
INFO:mteb.cli:Running with parameters: Namespace(model='laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K', task_types=None, categories=None, tasks=['BLINKIT2IRetrieval'], languages=None, device=None, output_folder='/data/niklas/mieb/results-mieb-final', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=4, overwrite=False, save_predictions=False, func=<function run at 0x7fd9afa416c0>)
Traceback (most recent call last):
File "/env/lib/conda/gritkto4/bin/mteb", line 8, in <module>
sys.exit(main())
File "/data/niklas/mieb/mteb/mteb/cli.py", line 346, in main
args.func(args)
File "/data/niklas/mieb/mteb/mteb/cli.py", line 115, in run
model = mteb.get_model(args.model, args.model_revision, device=device)
File "/data/niklas/mieb/mteb/mteb/models/__init__.py", line 57, in get_model
model = meta.load_model(**kwargs)
File "/data/niklas/mieb/mteb/mteb/model_meta.py", line 95, in load_model
model: Encoder | EncoderWithQueryCorpusEncode = loader(**kwargs) # type: ignore
File "/data/niklas/mieb/mteb/mteb/models/datacomp_clip.py", line 24, in __init__
self.model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(
File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1049, in from_pretrained
raise ValueError(
ValueError: Unrecognized model in laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zoedepth
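The failure reduces to the `AutoModel.from_pretrained` call in `mteb/models/datacomp_clip.py` shown in the traceback; a minimal sketch of the repro outside mteb:

```python
# Minimal repro, mirroring the call in mteb/models/datacomp_clip.py:
# AutoConfig cannot dispatch to a model class because the Hub repo's
# config.json has no `model_type` key.
from transformers import AutoModel

AutoModel.from_pretrained(
    "laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K", trust_remote_code=True
)  # raises ValueError: Unrecognized model ...
```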
isaac-chung commented
Once we implement OpenCLIP models, this can also be resolved, since this particular model's config.json is not compatible with the transformers lib (it lacks the `model_type` key). Will work on this next.
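For reference, the checkpoint does load through open_clip, which reads the repo's OpenCLIP weights directly from the Hub. A minimal sketch, assuming the open_clip_torch package is installed:

```python
# The "hf-hub:" prefix tells open_clip to fetch the checkpoint from the
# Hugging Face Hub instead of its built-in pretrained registry.
import open_clip
import torch

model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K"
)
tokenizer = open_clip.get_tokenizer(
    "hf-hub:laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K"
)

# Encode a text query; `preprocess` would be applied to PIL images the same way.
tokens = tokenizer(["a photo of a cat"])
with torch.no_grad():
    text_features = model.encode_text(tokens)
    text_features /= text_features.norm(dim=-1, keepdim=True)
```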
isaac-chung commented
Closing this as the OpenCLIP PR is merged. Feel free to reopen if the issue persists.
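For anyone verifying the fix, a sketch of re-running the failing evaluation through the Python API, assuming the merged PR registers the checkpoint under its Hub name:

```python
import mteb

# Hypothetical re-run of the failing command from the log above.
model = mteb.get_model("laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K")
tasks = mteb.get_tasks(tasks=["BLINKIT2IRetrieval"])
mteb.MTEB(tasks=tasks).run(model, output_folder="results")
```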