To complete our training as data scientists, we developed this project as a team of 4 people.
For this [contest organized by ENS](https://challengedata.ens.fr/participants/challenges/35/), we worked on the classification of e-commerce articles by developing and aggregating several models.
The data provided for each article included both text (a title and a description) and a picture.
Visit our Streamlit demo here
Features:
- Predict the category of a random article (or of an article loaded from Amazon/Rakuten, or entered manually)
- Calculate the probabilities using your own combination of all 3 models
- Explore the dataset with a dynamic EDA
- ...
- ...
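
As an illustration of the model-combination feature listed above, here is a minimal sketch of a weighted average of class probabilities (often called soft voting). It is not taken from the project code: the function name `combine_probabilities`, the variable names `proba_a`, `proba_b`, `proba_c`, and the toy values are all hypothetical, and the demo may combine its 3 models differently.

```python
import numpy as np


def combine_probabilities(probas, weights):
    """Weighted average of per-model class probabilities (hypothetical helper).

    probas  : list of arrays, each of shape (n_samples, n_classes)
    weights : one non-negative weight per model, not all zero
    """
    stacked = np.stack(probas, axis=0)  # (n_models, n_samples, n_classes)
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or w.shape[0] != stacked.shape[0]:
        raise ValueError("expected exactly one weight per model")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()  # normalize so each output row still sums to 1
    # Contract the model axis: result has shape (n_samples, n_classes)
    return np.tensordot(w, stacked, axes=1)


# Toy example: 3 models, 1 article, 3 classes (the real task has 27 classes).
proba_a = np.array([[0.7, 0.2, 0.1]])
proba_b = np.array([[0.1, 0.6, 0.3]])
proba_c = np.array([[0.4, 0.4, 0.2]])

combined = combine_probabilities([proba_a, proba_b, proba_c], weights=[2, 1, 1])
predicted_class = combined.argmax(axis=1)  # index of the most probable class
```

Normalizing the weights keeps the result a valid probability distribution, so any positive weighting chosen by the user can be compared on the same scale.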
Page Preview:
99,000 articles (85,000 in train + 14,000 in test) and 27 categories
Each article includes:
- text data (2 fields: description and title)
- one picture
15 most frequent words from the description field for category/class #1560
15 random images for category/class #10
TF-IDF for category/class #1281
frequency of regular expressions for each category
% of pixels in green for each category
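
For readers who want to reproduce a statistic like the percentage of green pixels, here is a minimal sketch. The source does not say how "green" was defined, so the rule used here (the green channel exceeds both red and blue by a fixed `margin`) is an assumption, and the function name `green_pixel_percentage` is hypothetical.

```python
import numpy as np


def green_pixel_percentage(image, margin=20):
    """Percentage of pixels counted as green in an RGB image (hypothetical helper).

    image  : array of shape (height, width, 3) or (height, width, 4), values 0-255
    margin : assumed threshold; a pixel is green when G > R + margin and G > B + margin
    """
    # Cast to a signed type so that adding the margin cannot overflow uint8
    rgb = np.asarray(image)[..., :3].astype(np.int16)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    is_green = (g > r + margin) & (g > b + margin)
    return 100.0 * is_green.mean()


# Toy 2x2 image: two clearly green pixels, one red, one gray-ish.
toy_image = np.array(
    [
        [[0, 255, 0], [255, 0, 0]],
        [[10, 200, 30], [100, 110, 100]],
    ],
    dtype=np.uint8,
)
toy_percentage = green_pixel_percentage(toy_image)
```

Averaging this value over all images of a category would give one number per category, comparable to the chart described above.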