Code for the Machine Learning course assignments.
-
Problem Statement:
- Find the cluster centers (k = 2 to 10) using K-Means clustering for every project.
- Compute the Davies-Bouldin (DB) index and the silhouette value.
- Find the optimal number of clusters using the DB index and the silhouette value.
- Store the results in a single Excel file with multiple rows, i.e., one row for each project.
- Software used: MATLAB
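
The clustering steps above can be sketched as follows. This is a minimal Python illustration using scikit-learn, not the MATLAB code used for the assignment. The synthetic blobs stand in for one project's data, and `evaluate_kmeans` and `best_k` are hypothetical helper names introduced only for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score


def evaluate_kmeans(X, k_values=range(2, 11), seed=0):
    """Fit K-Means for each k and record centers, DB index and silhouette."""
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        rows.append({
            "k": k,
            "centers": model.cluster_centers_,
            "db_index": davies_bouldin_score(X, model.labels_),
            "silhouette": silhouette_score(X, model.labels_),
        })
    return rows


def best_k(rows):
    """Lower DB index is better; higher mean silhouette is better."""
    by_db = min(rows, key=lambda r: r["db_index"])["k"]
    by_silhouette = max(rows, key=lambda r: r["silhouette"])["k"]
    return by_db, by_silhouette


# Assumption: synthetic data with four well-separated groups,
# used here in place of a real project's feature matrix.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)
rows = evaluate_kmeans(X)
k_db, k_sil = best_k(rows)
print("optimal k by DB index:", k_db, "| by silhouette:", k_sil)
```

The two criteria can disagree on real data, so it may be worth reporting both values of k per project. The per-project rows could then be collected into a `pandas.DataFrame` and written with `DataFrame.to_excel`, which needs an Excel engine such as openpyxl to be installed.
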

Problem Statement:
- Apply three different Naive Bayes classifiers to all data.
- Apply 5-fold cross-validation.
- Compute the F-measure and accuracy for all features and for the significant features.
- Find the best Naive Bayes classifier, and compare the original data with the significant-features data.
- Store the results in a single Excel file with multiple rows, i.e., one row for each project.
- Programming language: Python
- Libraries used: numpy, pandas, scipy, sklearn, matplotlib
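
A minimal sketch of the comparison described above, under some assumptions: the three classifiers are taken to be scikit-learn's `GaussianNB`, `MultinomialNB` and `BernoulliNB` (the assignment text does not name them), the data is a synthetic binary problem, and `compare_naive_bayes` is a hypothetical helper name. Features are min-max scaled because `MultinomialNB` requires non-negative inputs.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def compare_naive_bayes(X, y, seed=0):
    """Return mean 5-fold accuracy and F-measure for three NB variants."""
    models = {
        "GaussianNB": GaussianNB(),
        # Scaling to [0, 1] keeps inputs non-negative for MultinomialNB.
        "MultinomialNB": make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
        # BernoulliNB thresholds the scaled features at 0.5.
        "BernoulliNB": make_pipeline(MinMaxScaler(clip=True), BernoulliNB(binarize=0.5)),
    }
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "f1"))
        results[name] = {
            "accuracy": scores["test_accuracy"].mean(),
            "f_measure": scores["test_f1"].mean(),
        }
    return results


# Assumption: synthetic data in place of a real project's dataset.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)
results = compare_naive_bayes(X, y)
best = max(results, key=lambda name: results[name]["f_measure"])
for name, metrics in results.items():
    print(name, metrics)
print("best by F-measure:", best)
```

To compare all features against significant features, the same function could be called a second time on a column subset of `X`; how the significant features are chosen is left open here.
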

Problem Statement:
- Apply feature ranking techniques: Gini split, information gain, and PCA.
- Apply the same three Naive Bayes classifiers to the selected-features data.
- Compute the F-measure and accuracy.
- Find the best Naive Bayes classifier, and compare the best sets of features.
- Store the results in a single Excel file with multiple rows, i.e., one row for each project.
- Programming language: Python
- Libraries used: numpy, pandas, scipy, sklearn, matplotlib, graphviz
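
One possible way to produce the three rankings is sketched below. The choices are assumptions, not necessarily what the assignment code does: information gain is approximated with `mutual_info_classif`, the Gini ranking uses the impurity-based importances of a Gini decision tree, and the PCA ranking scores each original feature by its absolute loadings weighted by explained variance ratio. `rank_features` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def rank_features(X, y, seed=0):
    """Return {method: feature indices ordered from most to least important}."""
    # Mutual information between each feature and the class label.
    info_gain = mutual_info_classif(X, y, random_state=seed)

    # Gini importance from a single decision tree.
    tree = DecisionTreeClassifier(criterion="gini", random_state=seed).fit(X, y)
    gini = tree.feature_importances_

    # Assumed PCA heuristic: |loading| weighted by explained variance ratio.
    pca = PCA().fit(StandardScaler().fit_transform(X))
    pca_score = np.abs(pca.components_).T @ pca.explained_variance_ratio_

    scores = {"information_gain": info_gain, "gini": gini, "pca": pca_score}
    return {name: np.argsort(s)[::-1] for name, s in scores.items()}


# Assumption: synthetic data in place of a real project's dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
rankings = rank_features(X, y)

# Keep the top three features per method, ready for the Naive Bayes step.
selected = {name: X[:, order[:3]] for name, order in rankings.items()}
for name, order in rankings.items():
    print(name, "ranking:", order)
```

Note that PCA is unsupervised, so its ranking ignores the class labels; it may select different features than the two supervised methods.
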

Problem Statement:
- Apply different data sampling techniques, such as random sampling, upsampling, and downsampling, to handle the class imbalance problem.
- Apply logistic regression and a decision tree to the selected data.
- Compute the F-measure and accuracy, and find the best techniques.
- Store the results in a single Excel file with multiple rows, i.e., one row for each project.
- Also test the null hypothesis: "There is no significant improvement after applying data sampling techniques."
- Programming language: Python
- Libraries used: numpy, pandas, scipy, sklearn, matplotlib, graphviz
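
The sampling and hypothesis-testing steps above could look roughly like this. It is a sketch under stated assumptions: the data is synthetic and binary, only logistic regression is shown (a `DecisionTreeClassifier` could be swapped in), resampling is applied to the training folds only, and the null hypothesis is checked with a one-sided Wilcoxon signed-rank test on per-fold F-measures, which is one reasonable choice rather than the test prescribed by the assignment. `rebalance` and `fold_f1` are hypothetical helper names.

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample


def rebalance(X, y, mode, seed=0):
    """Upsample the minority class ("up") or downsample the majority ("down")."""
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    X_min, X_maj = X[y == minority], X[y == majority]
    if mode == "up":
        X_min = resample(X_min, replace=True, n_samples=len(X_maj), random_state=seed)
    elif mode == "down":
        X_maj = resample(X_maj, replace=False, n_samples=len(X_min), random_state=seed)
    else:
        raise ValueError("mode must be 'up' or 'down'")
    X_new = np.vstack([X_maj, X_min])
    y_new = np.concatenate([np.full(len(X_maj), majority), np.full(len(X_min), minority)])
    return X_new, y_new


def fold_f1(X, y, mode=None, n_splits=10, seed=0):
    """Per-fold F-measure; only the training part of each fold is resampled."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in cv.split(X, y):
        X_tr, y_tr = X[train], y[train]
        if mode is not None:
            X_tr, y_tr = rebalance(X_tr, y_tr, mode, seed)
        model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(f1_score(y[test], model.predict(X[test]), zero_division=0))
    return np.array(scores)


# Assumption: synthetic 90/10 imbalanced data in place of a real project.
X, y = make_classification(n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0)
baseline = fold_f1(X, y)
upsampled = fold_f1(X, y, mode="up")

# H0: upsampling gives no improvement in F-measure (one-sided alternative).
diff = upsampled - baseline
if np.allclose(diff, 0):
    p_value = 1.0  # identical scores: no evidence against H0
else:
    p_value = wilcoxon(upsampled, baseline, alternative="greater").pvalue
print("mean F1 baseline:", baseline.mean(), "upsampled:", upsampled.mean())
print("p-value:", p_value, "| reject H0 at 0.05:", p_value < 0.05)
```

Resampling inside the cross-validation loop avoids evaluating on duplicated copies of training samples, which would otherwise inflate the scores.
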