NTU-SPMS-MH6151 Data Mining project
- Install requirements with the following command:
pip install -r requirements.txt
- Run modelling.py in the following format, saving the output to the ./outputs folder:
python modelling.py --model_name <model_name> --output_file <path>
For example, to run the random forest classifier and save its output, execute the following command:
python modelling.py --model_name random_forest --output_file outputs/random_forest.txt
- To add an oversampling step to the training data, add the --oversampling option to the command:
python modelling.py --model_name random_forest --output_file outputs/random_forest.txt --oversampling
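The flags above can be illustrated with a minimal, hypothetical sketch of how a script like modelling.py might parse them with `argparse` and what `--oversampling` could do to the training data. It assumes plain random oversampling (duplicating randomly chosen minority-class rows until every class matches the largest class); the project's actual code may differ, and libraries such as imbalanced-learn offer `RandomOverSampler` for the same purpose. Oversampling is applied to the training split only, so the test set keeps its original class balance.

```python
import argparse
from typing import Optional, Sequence

import numpy as np


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Hypothetical parser for the flags documented in this README."""
    parser = argparse.ArgumentParser(description="Train and evaluate one model.")
    parser.add_argument("--model_name", required=True,
                        help="model to train, e.g. random_forest")
    parser.add_argument("--output_file", required=True,
                        help="where to write results, e.g. outputs/random_forest.txt")
    # store_true: the flag takes no value; it is False unless present
    parser.add_argument("--oversampling", action="store_true",
                        help="randomly oversample minority classes in the training data")
    return parser.parse_args(argv)


def random_oversample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Duplicate randomly chosen rows of each minority class until every
    class has as many rows as the largest class (random oversampling)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = [np.arange(len(y))]  # start with every original row
    for cls, count in zip(classes, counts):
        if count < target:
            idx = np.flatnonzero(y == cls)
            # sample with replacement: minority rows are repeated, not synthesized
            keep.append(rng.choice(idx, size=target - count, replace=True))
    order = np.concatenate(keep)
    return X[order], y[order]
```

For example, with four rows of class 0 and one row of class 1, `random_oversample` returns eight rows: the original five plus three copies of the single class-1 row.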
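- To run the modelling scripts without and with oversampling, use the following on Linux/macOS:
scripts/modelling.sh && scripts/modelling_oversampling.sh
On Windows:
.\scripts\modelling.bat
.\scripts\modelling_oversampling.bat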
- To write the performance summary to outputs/performance.txt, run:
python modelling_insights.py > outputs/performance.txt
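As a cross-platform alternative to the .sh/.bat scripts, the same kind of batch run could be sketched in Python with `subprocess`. This is a hypothetical sketch: only `random_forest` is named in this README, so the model list is a placeholder, and the `_oversampling` suffix on output file names is an assumed naming scheme, not necessarily what the project's scripts use.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Hypothetical model list: replace with the names modelling.py actually accepts.
MODELS = ["random_forest"]


def build_commands(models: Sequence[str], oversampling: bool,
                   out_dir: str = "outputs") -> List[List[str]]:
    """Build one modelling.py invocation per model."""
    commands = []
    for name in models:
        suffix = "_oversampling" if oversampling else ""  # assumed naming scheme
        output = str(Path(out_dir) / f"{name}{suffix}.txt")
        cmd = [sys.executable, "modelling.py", "--model_name", name,
               "--output_file", output]
        if oversampling:
            cmd.append("--oversampling")
        commands.append(cmd)
    return commands


def run_all(models: Sequence[str] = MODELS) -> None:
    """Run every model without and then with oversampling, then summarize."""
    Path("outputs").mkdir(exist_ok=True)
    for oversampling in (False, True):
        for cmd in build_commands(models, oversampling):
            subprocess.run(cmd, check=True)  # stop on the first failing run
    # Same as: python modelling_insights.py > outputs/performance.txt
    with open(Path("outputs") / "performance.txt", "w") as fh:
        subprocess.run([sys.executable, "modelling_insights.py"],
                       stdout=fh, check=True)
```

Calling `run_all()` from the repository root would run each model twice and then write the summary; `sys.executable` ensures the same Python interpreter (and installed requirements) is used for every run.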
References:
- Random oversampling and undersampling for imbalanced classification.
- AdaBoost Algorithm: Understand, Implement and Master AdaBoost.
- AdaBoost clearly explained (Josh Starmer), YouTube video.
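The AdaBoost references above describe boosting with decision stumps. The following is a minimal NumPy sketch of discrete AdaBoost for binary labels coded as -1/+1; it is illustrative only and not taken from this project's code (scikit-learn's `AdaBoostClassifier` is the usual library implementation). Each round fits the stump with the lowest weighted error, gives it a vote `alpha = 0.5 * ln((1 - err) / err)`, and up-weights the samples it misclassified.

```python
import numpy as np


def fit_stump(X, y, w):
    """Return the (feature, threshold, polarity) stump with the lowest weighted error."""
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                # polarity 1: +1 when x >= thr; polarity -1: +1 when x <= thr
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, polarity)
    return best, best_err


def stump_predict(X, stump):
    j, thr, polarity = stump
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost; y must contain only -1 and +1."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    model = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero and log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's vote
        pred = stump_predict(X, stump)
        w = w * np.exp(-alpha * y * pred)  # misclassified samples gain weight
        w /= w.sum()
        model.append((alpha, stump))
    return model


def adaboost_predict(X, model):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(X, stump) for alpha, stump in model)
    return np.where(score >= 0, 1, -1)
```

On a one-feature toy set such as `X = np.arange(1, 7).reshape(-1, 1)` with `y = np.array([-1, -1, -1, 1, 1, 1])`, the first stump chosen is feature 0, threshold 4, polarity 1, which already classifies every sample correctly.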