With the development of Internet technologies, new sources of information have become available, and social networks now play an important role in our lives. Users upload images of everything they like, which created a need to systematise these images. Algorithms built for this purpose are called recommendation (or recommender) systems, but they are not perfect: different tasks require different data to make recommendations, for example the descriptive text of the goods or other metrics we can obtain. These additional properties are called multimodal information, and models that use them are called multimodal. In this thesis I present a multimodal recommendation model that works with two kinds of input. The first is the image itself: we vectorise it and obtain various metrics. The second is the text description of the image: we vectorise it as well and add the necessary metrics. We then concatenate these two data sets into one and train the model on the combined data. Finally, for one random image we obtain a certain number of recommended images.
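The concatenation step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: random vectors stand in for the CLIP image embeddings and BERT text embeddings, and the dimensions (512 and 768) are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items = 100
# Stand-ins for the real embeddings: in the project these would come from
# CLIP (image branch) and BERT (text-description branch).
image_emb = rng.normal(size=(n_items, 512))
text_emb = rng.normal(size=(n_items, 768))

# Concatenate the two modalities into one feature matrix, one row per item.
features = np.concatenate([image_emb, text_emb], axis=1)
print(features.shape)  # (100, 1280)
```

The model is then trained on `features`, so each item is represented by both its visual and its textual information at once.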
Welcome to the GitHub page of my project 'Application of multimodal models for image recommendation systems'.
This repository consists of two parts. Part 1 contains only the results of the project; Part 2 contains all of its steps.
Run these notebooks one after another
As a result, you should have this file: CLIP&BERT Dataset
As a result, you should have a dataframe with images and their candidates from the three nearest clusters: Images candidates dataframe
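The "three nearest clusters" candidate generation can be sketched like this. It is a simplified stand-in: the real notebooks cluster CLIP/BERT features, while here both the item features and the cluster centres are random placeholder vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, dim, n_clusters = 200, 16, 8

features = rng.normal(size=(n_items, dim))
# Placeholder cluster centres; the notebooks would obtain these from a
# clustering step (e.g. k-means) over the item embeddings.
centers = rng.normal(size=(n_clusters, dim))

# Assign every item to its nearest cluster centre (squared Euclidean distance).
item_cluster = np.argmin(
    ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1
)

def candidates_for(query_vec, k_clusters=3):
    """Return indices of items belonging to the k nearest clusters of the query."""
    dists = ((centers - query_vec) ** 2).sum(-1)
    nearest = np.argsort(dists)[:k_clusters]
    return np.flatnonzero(np.isin(item_cluster, nearest))

cands = candidates_for(features[0])
```

Restricting candidates to a few nearby clusters keeps the candidate set small, so the downstream ranking model does not have to score every item in the catalogue.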
- Download this learning dataset: Learning pipeline
- Run this notebook: Learning model notebook
As a result, you should have a file with the CatBoost model: Catboost model
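Conceptually, this step fits a classifier on the concatenated features to score candidates. The project uses CatBoost; since that is a heavyweight dependency, the sketch below trains a plain logistic scorer with gradient descent instead, on synthetic data, purely to illustrate the train-then-score shape of the step.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, dim = 300, 32

# Synthetic stand-in for the concatenated CLIP+BERT features and labels.
X = rng.normal(size=(n_items, dim))
true_w = rng.normal(size=dim)
y = (X @ true_w + 0.1 * rng.normal(size=n_items) > 0).astype(float)

# Simple full-batch gradient descent on the logistic loss.
w = np.zeros(dim)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))     # predicted relevance probability
    w -= 0.1 * X.T @ (p - y) / n_items     # gradient step on logistic loss

scores = 1.0 / (1.0 + np.exp(-(X @ w)))
acc = ((scores > 0.5) == y.astype(bool)).mean()
```

In the actual notebooks the same role is played by a trained CatBoost model saved to a file; only the library differs, not the idea of producing a relevance score per candidate.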
- You should use the dataframe obtained at step 2: Nearest clusters
- Run this notebook: Target predicting notebook
As a result, you should have files with candidates for three different models: (1) the Catboost model, (2) the CLIP model, (3) the BERT model
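For the embedding-based models (CLIP and BERT), producing recommendations for one query image amounts to a top-k nearest-neighbour lookup. Here is a hedged sketch with random stand-in embeddings; cosine similarity is an assumption about the metric, chosen because it is the usual choice for such embeddings.

```python
import numpy as np

rng = np.random.default_rng(3)
n_items, dim = 50, 64

emb = rng.normal(size=(n_items, dim))              # stand-in item embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalise rows

def top_k(query_idx, k=5):
    """Indices of the k items most similar to `query_idx` by cosine similarity."""
    sims = emb @ emb[query_idx]          # cosine similarity on unit vectors
    order = np.argsort(-sims)            # descending similarity
    return [i for i in order if i != query_idx][:k]  # drop the query itself

recs = top_k(0, k=5)
```

Each of the three candidate files above can be thought of as the output of such a lookup (or, for CatBoost, of the trained scorer) applied over the candidate set.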
Run this notebook and obtain the results