/EECS4412

Data Mining

1. REQUIREMENTS
=========================================================================================
The system must meet the following requirements in order to run the program:
- Python 3.7
- Pandas 0.90 or newer
- Scikit-learn 0.21 or newer
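
The requirements above can be checked before running the scripts. The following is a minimal sketch, not part of the project: the function name report_environment is hypothetical, and it only reports the installed versions rather than comparing them against the minimums. Note that Scikit-learn is imported under the module name sklearn.

```python
import sys
from importlib import import_module


def report_environment(modules=("pandas", "sklearn")):
    """Return a dict mapping each component to its installed version string.

    A module that cannot be imported is reported as None.
    (Hypothetical helper; not part of preprocess.py or trainer.py.)
    """
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in modules:
        try:
            report[name] = getattr(import_module(name), "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    for component, version in report_environment().items():
        print(f"{component}: {version or 'NOT INSTALLED'}")
    if sys.version_info < (3, 7):
        print("Warning: this project lists Python 3.7 as a requirement.")
```

Compare the printed versions against the list above by hand; any component shown as NOT INSTALLED must be installed first.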


2. PREPROCESS.PY script
=========================================================================================
This script generates a pre-processed training dataset in CSV and ARFF formats for Weka.
It is provided only to satisfy the project requirements.

Weka is NOT used in this project.

Usage: python3 preprocess.py
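
For readers unfamiliar with ARFF, the sketch below shows how the same rows might be written in both CSV and ARFF form using only the Python standard library. It is an illustration under assumptions, not the code in preprocess.py: the function names, the two-column layout (one text attribute, one nominal class attribute), and the sample rows are all hypothetical, and class labels are assumed to be simple tokens without spaces.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_arff(rows, relation, text_field, class_field):
    """Serialize the same rows as ARFF text: header, then one line per row."""
    # Nominal attributes list every allowed label inside braces.
    labels = sorted({row[class_field] for row in rows})
    lines = [
        f"@RELATION {relation}",
        "",
        f"@ATTRIBUTE {text_field} STRING",
        f"@ATTRIBUTE {class_field} {{{','.join(labels)}}}",
        "",
        "@DATA",
    ]
    for row in rows:
        # Quote the string value and escape backslashes and single quotes.
        text = row[text_field].replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{text}',{row[class_field]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        {"text": "it's fine", "label": "pos"},
        {"text": "bad", "label": "neg"},
    ]
    print(to_csv(sample, ["text", "label"]))
    print(to_arff(sample, "demo", "text", "label"))
```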


3. TRAINER.PY application
=========================================================================================
The program was tested on 'red.eecs.yorku.ca' and executed properly without the need to
install any additional libraries.

IMPORTANT: Ensure that the 'test3.csv' and 'train3.csv' datasets are located in the same
directory as the program (trainer.py), or change the train_dataset_path and
test_dataset_path configuration variables accordingly.

Usage: python3 trainer.py
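
The dataset locations can be verified before starting a run. The following is a minimal sketch: the variable names train_dataset_path and test_dataset_path come from the note above, but the helper missing_datasets and the values assigned here are assumptions (both files in the current working directory), not a copy of the configuration in trainer.py.

```python
from pathlib import Path


def missing_datasets(paths):
    """Return the subset of paths that do not point to an existing file.

    (Hypothetical helper; not part of trainer.py.)
    """
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Assumed values: both CSV files sit in the current working directory.
train_dataset_path = "train3.csv"
test_dataset_path = "test3.csv"

if __name__ == "__main__":
    missing = missing_datasets([train_dataset_path, test_dataset_path])
    if missing:
        print("Dataset file(s) not found: " + ", ".join(missing))
    else:
        print("Both dataset files were found.")
```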

APPENDIX A: Sample run on red.eecs.yorku.ca
=========================================================================================
red 304 % ls
EECS 4412 Project.pdf  preprocess.py  readme.txt  stop_words.txt  test3.csv  train3.csv  trainer.py
red 305 % python3 trainer.py
==== TRAINING DATASET CLASSES ===
=== Training a Linear SVM Classifier ===
=== Training a Logistic Regression Classifier ===
=== Training a Random Forest Classifier ===

[ ... redacted ...]

All tasks completed.
red 306 %