- Swapnil Sinha
- Pragnya Pathak
- Xin Pan
- Avanti Bhandarkar
- Yuyang Wu
--- Data/
| +-- makeup_original.csv
| +-- cleaned_makeup.csv
| +-- withUSE.csv
| +-- ingredients.csv
| +-- ingredients.txt
| +-- colorants.csv
| +-- colorants.txt
--- Scripts/
| +-- utils.py
| +-- preprocessing.py
| +-- lda.py
| +-- models_SVM_TfIdf.ipynb
| +-- models_SVM_GPT3.ipynb
--- ECE143_Group17_Project Proposal.pdf
--- ECE143_Team17_Presentation.pdf
--- ECE143_ProductCategorization_Visualizations.ipynb
--- LDAvis.html
--- README.md
Data
Stores all datasets used for analysis.
- makeup_original.csv: dataset from the Heroku /makeup API
- cleaned_makeup.csv: dataset after preprocessing
- withUSE.csv: cleaned dataset with the USE word embeddings saved
- ingredients.csv / ingredients.txt: FDA-approved cosmetic ingredients dataset
- colorants.csv / colorants.txt: FDA-approved cosmetic colorants dataset
Scripts
Stores all Python scripts and notebooks.
- utils.py: helper functions for cleaning data and performing feature engineering operations
- preprocessing.py: all functions used to preprocess the description column of makeup_original.csv
- lda.py: Latent Dirichlet Allocation (LDA) topic modelling
- models_SVM_TfIdf.ipynb: SVM + TF-IDF model for product categorization
- models_SVM_GPT3.ipynb: SVM + GPT-3 model for product categorization
- ECE143_Group17_Project Proposal.pdf: our project proposal
- ECE143_Team17_Presentation.pdf: PDF of our presentation
- ECE143_ProductCategorization_Visualizations.ipynb: our visualization notebook; LDA modelling is excluded (see Scripts/lda.py)
- LDAvis.html: HTML visualization of the Latent Dirichlet Allocation (LDA) topic model
- README.md: this file
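As a rough, dependency-free illustration of the kind of pipeline that preprocessing.py and the SVM + TF-IDF model describe, here is a minimal sketch: clean a product description, turn it into a TF-IDF vector, and classify it with one-vs-rest linear SVMs. It is not the project's code. The stopword list, the toy descriptions and labels, and every function name below are hypothetical, and the SVM is fitted with a Pegasos-style stochastic sub-gradient method instead of scikit-learn (which the project lists as a dependency and which offers `TfidfVectorizer` and `LinearSVC` for the same steps). The IDF uses scikit-learn's default smoothed formula, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "for", "with", "of", "to", "in", "is"}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip HTML tags, keep alphabetic tokens and drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    return [tok for tok in re.findall(r"[a-z]+", text) if tok not in STOPWORDS]


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1 for tok, count in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalized TF-IDF vector; tokens unseen during fitting are ignored."""
    counts = Counter(tok for tok in doc if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def dot(w: dict[str, float], x: dict[str, float]) -> float:
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_linear_svm(X, y, lam: float = 0.01, epochs: int = 20) -> dict[str, float]:
    """Binary linear SVM (hinge loss + L2 penalty) fitted with Pegasos-style SGD; y in {-1, +1}."""
    w: dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1  # margin is checked before the update
            for tok in w:  # regularization step: shrink every weight
                w[tok] *= 1.0 - eta * lam
            if violated:  # hinge-loss sub-gradient step
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * label * v
    return w


def train_one_vs_rest(texts: list[str], labels: list[str]):
    """Fit the IDF table and one binary SVM per category."""
    docs = [preprocess(text) for text in texts]
    idf = fit_idf(docs)
    X = [tfidf(doc, idf) for doc in docs]
    models = {
        cat: train_linear_svm(X, [1 if lab == cat else -1 for lab in labels])
        for cat in sorted(set(labels))
    }
    return idf, models


def predict(text: str, idf, models) -> str:
    """Return the category whose classifier gives the highest score."""
    x = tfidf(preprocess(text), idf)
    return max(models, key=lambda cat: dot(models[cat], x))


# Hypothetical toy training data (not taken from the project's datasets).
TRAIN_TEXTS = [
    "Creamy matte lipstick for soft lips",
    "Long lasting lipstick in bold red",
    "Volumizing mascara for dramatic lashes",
    "Waterproof mascara lengthens lashes",
    "Liquid foundation with full coverage",
    "Lightweight foundation for natural coverage",
]
TRAIN_LABELS = ["lipstick", "lipstick", "mascara", "mascara", "foundation", "foundation"]

if __name__ == "__main__":
    idf, models = train_one_vs_rest(TRAIN_TEXTS, TRAIN_LABELS)
    print(predict("A red lipstick with a matte finish", idf, models))  # lipstick
```

As in the basic Pegasos formulation, no bias term is learned; with real data you would tune `lam` and `epochs` on a validation split rather than reuse the defaults above.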
Make sure you have Python 3.9 installed on your machine (the project targets Python 3.9 or lower, and the pinned TensorFlow 2.14.0 requires Python 3.9 or newer). Then follow these steps:
- Clone the repository and move into it:
  git clone https://github.com/avanti-bhandarkar/ECE143_FinalProject_ProductCategorization
  cd ECE143_FinalProject_ProductCategorization
- Install dependencies:
  Install the third-party modules listed below (the versions used in this project are given). Note that some of these libraries may require additional supporting packages to be installed.
- pandas - 1.5.3
- NumPy - 1.23.5
- Matplotlib - 3.7.1
- seaborn - 0.12.2
- NLTK - 3.8.1
- spaCy - 3.6.1
- Gensim - 4.3.2
- scikit-learn (sklearn) - 1.2.2
- pyLDAvis - 2.1.2
- WordCloud - 1.9.2
- TensorFlow - 2.14.0
- OpenAI - 0.27.2
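To confirm that an environment matches the versions listed above, a small check along these lines may help. It is a sketch, not part of the repository: the `PINNED` mapping and `find_mismatches` are hypothetical, and the mapping assumes the PyPI distribution names (for example `scikit-learn` for sklearn). It relies only on the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the modules listed above.
PINNED = {
    "pandas": "1.5.3",
    "numpy": "1.23.5",
    "matplotlib": "3.7.1",
    "seaborn": "0.12.2",
    "nltk": "3.8.1",
    "spacy": "3.6.1",
    "gensim": "4.3.2",
    "scikit-learn": "1.2.2",
    "pyLDAvis": "2.1.2",
    "wordcloud": "1.9.2",
    "tensorflow": "2.14.0",
    "openai": "0.27.2",
}


def find_mismatches(pins, get_version=version):
    """Map each package whose installed version differs from its pin to that version (None if missing)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, installed in problems.items():
        print(f"{name}: expected {PINNED[name]}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned versions are installed.")
```

The exact string comparison is deliberately strict; newer patch releases will be reported as mismatches even if they would likely work.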