With the development of Internet technologies, new sources of information have become available, and social networks now play an important role in our lives. Users upload images of everything they like, which created a need to systematise these images. Algorithms built for this purpose are called recommendation (or recommender) systems, but they are not perfect: different tasks require different data to make recommendations, for example the descriptive text of the goods or other metrics we can obtain. These additional properties are called multimodal information, and models that use them are called multimodal. In this thesis I present a multimodal recommendation model that works with two kinds of input. The first is the image itself: we vectorise it and obtain various metrics. The second is the text description of the image: we vectorise it as well and add the necessary metrics. We then concatenate these two data sets into one and train the model on the combined data. Finally, for one random image we obtain a certain number of recommended images.
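The concatenation step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: random vectors stand in for the CLIP image embeddings and BERT text embeddings, and the dimensions (512 and 768) are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items = 100
# Stand-ins for the real embeddings: in the project these would come from
# CLIP (image branch) and BERT (text-description branch).
image_emb = rng.normal(size=(n_items, 512))
text_emb = rng.normal(size=(n_items, 768))

# Concatenate the two modalities into one feature matrix, one row per item.
features = np.concatenate([image_emb, text_emb], axis=1)
print(features.shape)  # (100, 1280)
```

The model is then trained on `features`, so each item is represented by both its visual and its textual information at once.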
Welcome to the GitHub page of my project 'Application of multimodal models for image recommendation systems'.
This repository consists of two parts. Part 1 contains only the results of the project; Part 2 contains all of its steps.
Run these notebooks one after another
As a result, you should have this file: CLIP&BERT Dataset
As a result, you should have a dataframe with images and their candidates from the three nearest clusters: Images candidates dataframe
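The "three nearest clusters" candidate generation can be sketched like this. It is a simplified stand-in: the real notebooks cluster CLIP/BERT features, while here both the item features and the cluster centres are random placeholder vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, dim, n_clusters = 200, 16, 8

features = rng.normal(size=(n_items, dim))
# Placeholder cluster centres; the notebooks would obtain these from a
# clustering step (e.g. k-means) over the item embeddings.
centers = rng.normal(size=(n_clusters, dim))

# Assign every item to its nearest cluster centre (squared Euclidean distance).
item_cluster = np.argmin(
    ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1
)

def candidates_for(query_vec, k_clusters=3):
    """Return indices of items belonging to the k nearest clusters of the query."""
    dists = ((centers - query_vec) ** 2).sum(-1)
    nearest = np.argsort(dists)[:k_clusters]
    return np.flatnonzero(np.isin(item_cluster, nearest))

cands = candidates_for(features[0])
```

Restricting candidates to a few nearby clusters keeps the candidate set small, so the downstream ranking model does not have to score every item in the catalogue.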
- Download this learning dataset: Learning pipeline
- Run this notebook: Learning model notebook
As a result, you should have a file with the CatBoost model: Catboost model
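Conceptually, this step fits a classifier on the concatenated features to score candidates. The project uses CatBoost; since that is a heavyweight dependency, the sketch below trains a plain logistic scorer with gradient descent instead, on synthetic data, purely to illustrate the train-then-score shape of the step.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, dim = 300, 32

# Synthetic stand-in for the concatenated CLIP+BERT features and labels.
X = rng.normal(size=(n_items, dim))
true_w = rng.normal(size=dim)
y = (X @ true_w + 0.1 * rng.normal(size=n_items) > 0).astype(float)

# Simple full-batch gradient descent on the logistic loss.
w = np.zeros(dim)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))     # predicted relevance probability
    w -= 0.1 * X.T @ (p - y) / n_items     # gradient step on logistic loss

scores = 1.0 / (1.0 + np.exp(-(X @ w)))
acc = ((scores > 0.5) == y.astype(bool)).mean()
```

In the actual notebooks the same role is played by a trained CatBoost model saved to a file; only the library differs, not the idea of producing a relevance score per candidate.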
- You should use the dataframe obtained at step 2: Nearest clusters
- Run this notebook: Target predicting notebook
As a result, you should have files with candidates for three different models: (1) the Catboost model, (2) the CLIP model, (3) the BERT model
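For the embedding-based models (CLIP and BERT), producing recommendations for one query image amounts to a top-k nearest-neighbour lookup. Here is a hedged sketch with random stand-in embeddings; cosine similarity is an assumption about the metric, chosen because it is the usual choice for such embeddings.

```python
import numpy as np

rng = np.random.default_rng(3)
n_items, dim = 50, 64

emb = rng.normal(size=(n_items, dim))              # stand-in item embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalise rows

def top_k(query_idx, k=5):
    """Indices of the k items most similar to `query_idx` by cosine similarity."""
    sims = emb @ emb[query_idx]          # cosine similarity on unit vectors
    order = np.argsort(-sims)            # descending similarity
    return [i for i in order if i != query_idx][:k]  # drop the query itself

recs = top_k(0, k=5)
```

Each of the three candidate files above can be thought of as the output of such a lookup (or, for CatBoost, of the trained scorer) applied over the candidate set.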
Run this notebook and obtain the results