Clothing Similarity Comparison

Clothes similarity search takes a description of a clothing item and returns ranked recommendations of similar products from the database. This repository contains the source code for:

  • Data Scraper
  • Sentence Encoder
  • Similarity Function

Data Scraper

The data scraper collects product data available on the H&M website.

The scraped data is then feature-engineered and preprocessed into a wordSoup for representational encoding. The result is stored in the column cleaned_text.
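As a rough illustration of the wordSoup step, the sketch below joins a few text fields of one scraped record and normalises them into a single string. The field names (name, description, colour), the stop-word list, the build_word_soup helper and the sample record are all hypothetical; the repository's actual preprocessing may differ.

```python
import re

# Hypothetical stop-word list; the repository's actual preprocessing may differ.
STOP_WORDS = {"a", "an", "the", "and", "with", "in", "of", "for"}


def build_word_soup(record, fields=("name", "description", "colour")):
    """Join the chosen text fields of a record and normalise them into one string."""
    raw = " ".join(str(record.get(field, "")) for field in fields)
    # Lowercase, then replace every run of non-alphanumeric characters with a space
    text = re.sub(r"[^a-z0-9]+", " ", raw.lower())
    # Drop stop words and re-join the remaining tokens
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


# Made-up record used only for illustration (not real scraped data)
record = {
    "name": "Slim Fit Jeans",
    "description": "Five-pocket jeans in stretch denim with a zip fly.",
    "colour": "Dark Blue",
}
cleaned_text = build_word_soup(record)
print(cleaned_text)
```

Collapsing several fields into one string keeps the downstream encoder simple, since it only ever has to read a single text column.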

Encoder

The encoder uses a TF-IDF vectorizer to generate a sentence embedding of size 1 x MAX_FEATURES (default = 1000) for every product description.
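A minimal sketch of this encoding step, assuming the vectorizer is scikit-learn's TfidfVectorizer with its max_features parameter set to MAX_FEATURES. The sample strings are made-up stand-ins for the cleaned_text column, and the variable names are illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

MAX_FEATURES = 1000  # default value named in the text

# Made-up stand-ins for rows of the cleaned_text column
cleaned_texts = [
    "slim fit jeans stretch denim dark blue",
    "oversized cotton hoodie grey",
    "wide leg jeans light blue denim",
]

# Cap the vocabulary at MAX_FEATURES terms (the most frequent ones are kept)
vectorizer = TfidfVectorizer(max_features=MAX_FEATURES)

# Sparse matrix with one row per product description
embeddings = vectorizer.fit_transform(cleaned_texts)
print(embeddings.shape)

# A new description is encoded with the already-fitted vocabulary
query_vec = vectorizer.transform(["blue denim jeans"])
print(query_vec.shape)
```

Note that max_features is an upper bound: on a corpus with fewer than MAX_FEATURES distinct terms, such as this toy one, each row is narrower than 1 x 1000.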

Similarity Function

The similarity function uses scikit-learn's cosine similarity to score the query description against every product description in the database, then returns the top n ranked results to the user.
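The ranking step could look roughly like the sketch below. It assumes TF-IDF vectors from scikit-learn's TfidfVectorizer and uses sklearn.metrics.pairwise.cosine_similarity; the top_n_similar helper, its parameters and the toy corpus are hypothetical and not taken from the repository.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def top_n_similar(query, corpus, n=3, max_features=1000):
    """Return (index, score) pairs for the n corpus entries most similar to query."""
    vectorizer = TfidfVectorizer(max_features=max_features)
    corpus_vecs = vectorizer.fit_transform(corpus)  # one row per product
    query_vec = vectorizer.transform([query])       # 1 x vocabulary size
    # Cosine similarity of the query against every product, flattened to 1-D
    scores = cosine_similarity(query_vec, corpus_vecs).ravel()
    # Indices sorted by descending score, truncated to the top n
    order = np.argsort(scores)[::-1][:n]
    return [(int(i), float(scores[i])) for i in order]


# Made-up stand-ins for the cleaned_text column
corpus = [
    "slim fit jeans stretch denim dark blue",
    "oversized cotton hoodie grey",
    "wide leg jeans light blue denim",
]
results = top_n_similar("blue denim jeans", corpus, n=2)
for idx, score in results:
    print(idx, round(score, 3), corpus[idx])
```

For simplicity this sketch refits the vectorizer on every call; with a fixed database it would be cheaper to fit once and reuse the stored embeddings for each query.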

Installation

To use the utility functions and scripts, such as the scraper and the encoder, clone this repository to your local environment:

git clone https://github.com/catlover75926/Clothes-Similarity-Comparison.git

After creating a virtual environment in the cloned repository, install the dependencies with:

pip install -r requirements-script.txt