This project is a text classification model that sorts e-commerce product descriptions into categories such as Household, Books, Electronics, and Clothing & Accessories.
- Python (3.x recommended)
- Pip (Python package installer)
- Clone the repository:
    git clone https://github.com/your-username/e-commerce-text-classification.git
- Navigate to the project directory:
    cd e-commerce-text-classification
- Install dependencies:
    pip install -r requirements.txt
To run the Streamlit app locally:
streamlit run stream_app.py
This will start the app, and you can access it in your browser at http://localhost:8501.
- Enter the text you want to classify in the provided text area.
- Click the "Classify" button.
- The app will display the predicted category for the entered text.
Several algorithms were trained and compared for this task: Support Vector Classifier (SVC), k-Nearest Neighbors (KNN), Random Forest, and Multinomial Naive Bayes. Among these, the Support Vector Classifier performed well in terms of both accuracy and generalization.
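A comparison of this kind could be set up roughly as follows. This is a minimal sketch, not the project's actual training script: it assumes scikit-learn is installed, and the toy `texts` and `labels`, the `compare_models` helper, and all hyperparameters are hypothetical stand-ins chosen for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data; the real project trains on the Kaggle dataset.
texts = [
    "stainless steel cooking pot with lid",
    "cotton bed sheet set for double bed",
    "wooden kitchen shelf with steel hooks",
    "hardcover novel by a bestselling author",
    "paperback guide to learning statistics",
    "illustrated story book for children",
    "wireless bluetooth headphones with mic",
    "usb charging cable for smartphone",
    "bluetooth speaker with usb charging",
    "cotton t-shirt for men round neck",
    "leather belt for men formal wear",
    "women cotton scarf with floral print",
]
labels = (
    ["Household"] * 3
    + ["Books"] * 3
    + ["Electronics"] * 3
    + ["Clothing & Accessories"] * 3
)

# Candidate classifiers named in the text; settings here are assumptions.
candidates = {
    "SVC": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Multinomial Naive Bayes": MultinomialNB(),
}


def compare_models(texts, labels, cv=3):
    """Return the mean cross-validated accuracy for each candidate."""
    scores = {}
    for name, clf in candidates.items():
        # Vectorize inside the pipeline so each fold fits TF-IDF
        # on its own training split only.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        scores[name] = cross_val_score(pipeline, texts, labels, cv=cv).mean()
    return scores


scores = compare_models(texts, labels)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Scores on a dataset this small say nothing about real performance; the sketch only shows how the four model families can be evaluated under the same vectorizer and the same folds.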
The training data for the model is sourced from the Kaggle E-commerce Text dataset. This dataset contains product descriptions from various e-commerce categories.
To convert words into numerical vectors, the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer is employed. TF-IDF is a widely used technique in natural language processing that reflects the importance of words in a document relative to a collection of documents.
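To make the weighting concrete, here is a small pure-Python sketch of the textbook TF-IDF formula (term frequency times the log of inverse document frequency). The `tf_idf` helper and the sample documents are hypothetical, and this is a simplified version: scikit-learn's `TfidfVectorizer` applies IDF smoothing and L2 normalization by default, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute textbook TF-IDF weights for whitespace-tokenized documents.

    Returns one dict per document, mapping each term to its weight.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # tf = share of the document taken up by the term;
            # idf = log(N / df), which is 0 for terms in every document.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    "cotton shirt for men",
    "wireless mouse for laptop",
    "hardcover novel for young readers",
]
weights = tf_idf(docs)
print(weights[0])
```

In this example "for" appears in every document, so its weight is zero, while a word such as "cotton" that appears in only one description receives a positive weight. That is the sense in which TF-IDF reflects a word's importance relative to the whole collection.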
The model is trained to categorize product descriptions into different classes, including Household, Books, Electronics, and Clothing & Accessories.
If you would like to contribute to the project, please follow our Contribution Guidelines.
This project is licensed under the MIT License.