Clothes similarity search takes a clothing description and returns ranked recommendations of similar items from a product database. This repository contains the source code for:
- Data Scraper
- Sentence Encoder
- Similarity Function
The data scraper collects product data from the H&M website.
The scraped data is then feature-engineered and preprocessed into a wordSoup (a single combined text string per product) for encoding, and the result is stored in the `cleaned_text` column.
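The exact feature engineering used by the repository is not spelled out here. As a rough sketch only, assuming hypothetical product fields `name`, `colour`, `category` and `description` and a simple cleaning rule (lowercase, drop punctuation), a wordSoup could be built by cleaning each field and concatenating the results into `cleaned_text`:

```python
import re

# Hypothetical field names; the scraper's actual columns may differ.
TEXT_FIELDS = ("name", "colour", "category", "description")


def clean(text: str) -> str:
    """Lowercase, replace runs of non-alphanumeric characters with a space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def word_soup(product: dict) -> str:
    """Join the cleaned text fields of one product into a single string."""
    parts = (clean(str(product.get(field, ""))) for field in TEXT_FIELDS)
    return " ".join(part for part in parts if part)


# Illustrative record, not real scraped data.
products = [
    {
        "name": "Slim Fit Jeans",
        "colour": "Dark blue",
        "category": "Trousers",
        "description": "5-pocket jeans in washed, stretch denim.",
    },
]

for product in products:
    product["cleaned_text"] = word_soup(product)

print(products[0]["cleaned_text"])
```

Missing fields are treated as empty strings and skipped, so products with incomplete listings still get a usable wordSoup.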
The encoder uses scikit-learn's `TfidfVectorizer` to generate a sentence embedding of size 1 x MAX_FEATURES (default = 1000) for every product description.
The similarity function uses scikit-learn's cosine similarity to compare the query description with every product description in the database and returns the top n ranked results to the user.
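The ranking step can be sketched as follows. The helper names `build_index` and `top_n_similar` are hypothetical and not taken from the repository; the sketch assumes the database is a list of `cleaned_text` strings and returns row indices with their cosine similarity scores, highest first.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_index(corpus, max_features=1000):
    """Fit a TF-IDF vectorizer on the product descriptions and encode them."""
    vectorizer = TfidfVectorizer(max_features=max_features)
    matrix = vectorizer.fit_transform(corpus)  # shape: (n_products, n_features)
    return vectorizer, matrix


def top_n_similar(query, vectorizer, matrix, n=5):
    """Return (row_index, score) pairs for the n products most similar to query."""
    query_vec = vectorizer.transform([query])  # shape: (1, n_features)
    scores = cosine_similarity(query_vec, matrix).ravel()  # one score per product
    # Sort by descending score; a stable sort keeps ties in database order.
    order = np.argsort(-scores, kind="stable")[:n]
    return [(int(i), float(scores[i])) for i in order]


# Illustrative database, not real scraped data.
corpus = [
    "slim fit jeans dark blue denim",
    "oversized hoodie black cotton",
    "relaxed fit jeans light blue",
]
vectorizer, matrix = build_index(corpus)
results = top_n_similar("blue denim jeans", vectorizer, matrix, n=2)
print(results)
```

Since `TfidfVectorizer` L2-normalizes each row by default, cosine similarity here reduces to a dot product, so `sklearn.metrics.pairwise.linear_kernel` would give the same scores.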
To use the utility functions and scripts, such as the scraper and the encoder, clone this repository to your local environment:
git clone https://github.com/catlover75926/Clothes-Similarity-Comparison.git
After creating a virtual environment in the cloned repository, install the dependencies:
pip install -r requirements-script.txt