Zero-shot image classification model for the Russian language
RuCLIP tiny (Russian Contrastive Language–Image Pretraining) is a neural network trained on a variety of (image, text) pairs. It uses ConvNeXt-tiny as the image encoder and DistilRuBert-tiny as the text encoder, and builds on extensive research in zero-shot transfer, computer vision, natural language processing, and multimodal learning.
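The contrastive objective behind CLIP-style pretraining can be sketched in plain Python. This is a minimal illustration of the symmetric cross-entropy (InfoNCE) loss described for the original CLIP, not RuCLIP tiny's actual training code. The embeddings are toy vectors, `clip_contrastive_loss` is a hypothetical helper, and the temperature of 0.07 is an assumed illustrative value (OpenAI's CLIP uses it to initialize its learnable temperature).

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax of one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of N matching (image, text) pairs.

    Hypothetical helper: pair i is the positive for row/column i,
    and the other N - 1 entries in that row/column act as negatives.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cos(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: each row should select its own text (the diagonal).
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: each column should select its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy 2-d embeddings: image i is meant to match text i.
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]
aligned = clip_contrastive_loss(images, texts)        # correct pairing
swapped = clip_contrastive_loss(images, texts[::-1])  # pairs deliberately mismatched
print(aligned < swapped)  # True: matching pairs give a lower loss
```

Training pushes matching pairs together and non-matching pairs apart within each batch, which is what later lets a plain list of class names act as a classifier.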
Our model achieves 46.62% top-1 and 73.18% top-5 zero-shot accuracy on CIFAR-100.
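Top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. In a CLIP-style zero-shot evaluation the scores would be image-text similarities against one text per class name (100 classes for CIFAR-100); the exact prompts behind the numbers above are not stated. A minimal sketch with made-up scores, where `top_k_accuracy` is a hypothetical helper:

```python
from typing import List, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first; keep the first k.
        top_k: List[int] = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Hypothetical image-to-class similarity scores for 3 images and 4 classes.
scores = [
    [0.10, 0.70, 0.15, 0.05],  # true class 1 is ranked 1st
    [0.40, 0.35, 0.20, 0.05],  # true class 1 is ranked 2nd
    [0.50, 0.05, 0.25, 0.20],  # true class 3 is ranked 3rd
]
labels = [1, 1, 3]
top1 = top_k_accuracy(scores, labels, k=1)  # only the first image counts: 1/3
top2 = top_k_accuracy(scores, labels, k=2)  # the first two images count: 2/3
```

Top-5 accuracy is always at least as high as top-1, which is why the two reported figures differ so much on a 100-class dataset.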
ONNX conversion and speed testing
Install the rucliptiny module and its requirements first. In Google Colab, the following cell downloads the pretrained weights and installs the package:
```
!gdown -O ru-clip-tiny.pkl https://drive.google.com/uc?id=1-3g3J90pZmHo9jbBzsEmr7ei5zm3VXOL
!pip install git+https://github.com/cene555/ru-clip-tiny.git
```
Download the CLIP diagram from the OpenAI CLIP repository to use as an example image:
```
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
```
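- Import libraries
```python
from rucliptiny.predictor import Predictor
from rucliptiny import RuCLIPtiny
import torch

torch.manual_seed(1)  # fix the random seed for reproducibility
# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```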
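- Load the model
```python
model = RuCLIPtiny()
# map_location lets GPU-saved weights load on a CPU-only machine as well
model.load_state_dict(torch.load('ru-clip-tiny.pkl', map_location=device))
model = model.to(device).eval()  # move to the device and switch to inference mode
```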
- Use the predictor to get class probabilities
```python
predictor = Predictor()
# Candidate class names in Russian: 'diagram', 'dog', 'cat'
classes = ['диаграмма', 'собака', 'кошка']
text_probs = predictor(model=model, images_path=["CLIP.png"],
                       classes=classes, get_probs=True,
                       max_len=77, device=device)
```
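The internals of `Predictor` are not shown here. Assuming it follows the standard CLIP recipe, zero-shot probabilities come from a softmax over scaled cosine similarities between the image embedding and one text embedding per class name. The sketch below uses made-up 3-d embeddings and an assumed logit scale of 100; `zero_shot_probs` is a hypothetical helper, not part of rucliptiny.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_probs(image_emb: Vector, class_embs: Dict[str, Vector],
                    logit_scale: float = 100.0) -> Dict[str, float]:
    """Softmax over scaled image-text cosine similarities, one entry per class.

    logit_scale is an assumed value; CLIP-style models learn it during training.
    """
    logits = {name: logit_scale * cosine(image_emb, emb) for name, emb in class_embs.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(v - m) for name, v in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical embeddings; real ones would come from the image and text encoders.
image_emb = [0.9, 0.1, 0.2]
class_embs = {
    "диаграмма": [1.0, 0.0, 0.1],  # "diagram"
    "собака": [0.0, 1.0, 0.0],     # "dog"
    "кошка": [0.1, 0.2, 1.0],      # "cat"
}
probs = zero_shot_probs(image_emb, class_embs)
best = max(probs, key=probs.get)  # class with the highest probability
```

Because the class list is just text, swapping in new class names requires no retraining, only new text embeddings.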
Speed on an NVIDIA Tesla K80 (Google Colab session). The total column is encode_image + encode_text:

| Model (PyTorch) | batch | encode_image | encode_text | total |
|---|---|---|---|---|
| RuCLIPtiny | 2 | 0.011 | 0.004 | 0.015 |
| RuCLIPtiny | 8 | 0.011 | 0.004 | 0.015 |
| RuCLIPtiny | 16 | 0.012 | 0.005 | 0.017 |
| RuCLIPtiny | 32 | 0.014 | 0.005 | 0.019 |
| RuCLIPtiny | 64 | 0.013 | 0.006 | 0.019 |
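How the timings above were collected is not described. One common approach is sketched below: untimed warm-up calls, then an average over several timed calls per batch size, with total computed as the sum of the two encoder timings. The encoders here are hypothetical CPU stand-ins. With real PyTorch encoders on a GPU, CUDA kernels run asynchronously, so you would call `torch.cuda.synchronize()` before reading the clock and run under `torch.no_grad()`; whether the reported numbers did so is unknown.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def time_call(fn: Callable[[], object], warmup: int = 3, repeats: int = 10) -> float:
    """Average wall-clock seconds per call, after a few untimed warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up: lazy initialisation, caches, etc.
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def benchmark(encode_image: Callable[[int], object], encode_text: Callable[[int], object],
              batch_sizes: List[int]) -> List[Dict[str, float]]:
    """One row per batch size, with total = encode_image + encode_text as in the table above."""
    rows = []
    for batch in batch_sizes:
        t_img = time_call(lambda: encode_image(batch))
        t_txt = time_call(lambda: encode_text(batch))
        rows.append({"batch": batch, "encode_image": t_img,
                     "encode_text": t_txt, "total": t_img + t_txt})
    return rows


# Hypothetical stand-in encoders: replace with calls to the real image and text encoders.
def fake_encode_image(batch: int) -> List[float]:
    return [float(i) ** 0.5 for i in range(batch * 1000)]


def fake_encode_text(batch: int) -> List[float]:
    return [float(i) for i in range(batch * 100)]


rows = benchmark(fake_encode_image, fake_encode_text, [2, 8, 16, 32, 64])
for row in rows:
    print(f"{row['batch']:>3} | {row['encode_image']:.5f} | "
          f"{row['encode_text']:.5f} | {row['total']:.5f}")
```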
We would like to express our gratitude to Sber AI for the grants that made this research possible; the work was carried out as part of the Artificial Intelligence International Junior Contest (AIIJC).