
[MIEB] laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K not working


INFO:mteb.cli:Running with parameters: Namespace(model='laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K', task_types=None, categories=None, tasks=['BLINKIT2IRetrieval'], languages=None, device=None, output_folder='/data/niklas/mieb/results-mieb-final', verbosity=2, co2_tracker=True, eval_splits=None, model_revision=None, batch_size=4, overwrite=False, save_predictions=False, func=<function run at 0x7fd9afa416c0>)
Traceback (most recent call last):
  File "/env/lib/conda/gritkto4/bin/mteb", line 8, in <module>
    sys.exit(main())
  File "/data/niklas/mieb/mteb/mteb/cli.py", line 346, in main
    args.func(args)
  File "/data/niklas/mieb/mteb/mteb/cli.py", line 115, in run
    model = mteb.get_model(args.model, args.model_revision, device=device)
  File "/data/niklas/mieb/mteb/mteb/models/__init__.py", line 57, in get_model
    model = meta.load_model(**kwargs)
  File "/data/niklas/mieb/mteb/mteb/model_meta.py", line 95, in load_model
    model: Encoder | EncoderWithQueryCorpusEncode = loader(**kwargs)  # type: ignore
  File "/data/niklas/mieb/mteb/mteb/models/datacomp_clip.py", line 24, in __init__
    self.model = AutoModel.from_pretrained(model_name, trust_remote_code=True).to(
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/env/lib/conda/gritkto4/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1049, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zoedepth

Once we implement OpenCLIP models, this will also be resolved, since this particular model does not ship a config.json that is compatible with the transformers library. Will work on this next.
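
For reference, this checkpoint is distributed in OpenCLIP format, which is why AutoModel.from_pretrained cannot resolve a model_type for it. A minimal sketch of loading it directly with the open_clip package (assuming open_clip_torch is installed; not the mteb wrapper itself) looks like this:

import open_clip
import torch

# Load the OpenCLIP checkpoint from the Hugging Face Hub using the hf-hub: prefix.
model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K"
)
tokenizer = open_clip.get_tokenizer("hf-hub:laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K")

# Encode a text query; CLIP embeddings are typically L2-normalized before use.
with torch.no_grad():
    text_features = model.encode_text(tokenizer(["a photo of a cat"]))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

An mteb model wrapper for this family would call into open_clip like the above rather than going through transformers' AutoModel.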

Closing this as the OpenCLIP PR is merged. Feel free to reopen if the issue persists.