CSI 416

This page is for students taking "Pattern Recognition Lab (CSI 416)" this trimester under Dr. Dewan M. Farid (Sections SA & SB).

Learning Resources

Text Books :

  • Data Mining: Concepts and Techniques, 3E by J. Han et al.
  • Data Mining: Practical Machine Learning Tools and Techniques, 3E by Ian H. Witten et al.
  • Pattern Recognition and Machine Learning by Christopher Bishop

Learn WEKA :

  • GUI-based learning :

  • Implementation-based learning :

Learn scikit-learn :

  • Implemented in Python (scikit-learn) by Rafsanjani Muhammod (GitHub)
  • LazyProgrammer (GitHub)
  • Hands-On Machine Learning with Scikit-Learn and TensorFlow (GitHub)

Blogs :

  • An Introduction to Data Mining by Dr. Saed Sayad
  • Analytics Vidhya
  • Soft Computing and Intelligent Information Systems
  • Comparison between classifiers

Coming soon ... :)

Public datasets for Analytics

  • UCI Machine Learning Repository
  • KEEL
  • AnalyticsVidhya
  • Kaggle (mainly a competition site)
  • Public datasets for Machine Learning
  • Algorithmia
  • Springboard

Syllabus

Key Terms :

  1. Features / Attributes
  2. Feature values & Attribute values
  3. Class & Class attribute
  4. Instances / Records / Vectors / Tuples
  5. Two-class dataset & Multi-class dataset (when the number of class values is greater than 2) / Multi-label dataset (when an instance can carry more than one class label)
  6. High-dimensional dataset (when the number of features is greater than 10)
  7. Univariate, Bivariate & Multivariate datasets
  8. Balanced dataset vs Imbalanced dataset
  9. Overfitting & Underfitting of a model
  10. Supervised learning vs. Unsupervised learning
  11. Classification, Regression, Clustering
  12. Bias–variance tradeoff
  13. Noisy datasets & how to remove noise?
  14. Anomaly Detection
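Several of these terms can be read directly off a labelled dataset. Below is a minimal Python sketch, assuming the dataset is a list of feature rows plus a parallel list of class labels; the function name `describe_dataset` and the use of the largest-to-smallest class ratio as an imbalance indicator are illustrative choices, not part of the course material.

```python
from collections import Counter

def describe_dataset(rows, labels):
    """Summarize a labelled dataset: instances, features, class values, balance."""
    counts = Counter(labels)               # instances per class value
    n_classes = len(counts)
    if n_classes == 2:
        kind = "two-class"
    elif n_classes > 2:
        kind = "multi-class"
    else:
        kind = "single-class"
    return {
        "instances": len(rows),            # number of records / tuples
        "features": len(rows[0]) if rows else 0,
        "classes": n_classes,
        "type": kind,
        # largest class size / smallest class size; 1.0 means perfectly balanced
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }

rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]]
labels = ["yes", "yes", "yes", "yes", "no", "no"]
print(describe_dataset(rows, labels))
```

The larger the ratio, the more a classifier trained on the data is tempted to favour the majority class.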

Preprocessing Datasets :

  • Data cleaning
  • Remove duplicate instances
  • Handle missing values (Can you calculate : Mean, Median, Mode, Standard Deviation, etc.?)
  • Feature Scaling or Feature Normalization (Can you calculate distance using : Euclidean, Manhattan, Minkowski, etc.?)
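A minimal standard-library Python sketch of three of these steps: mean imputation of missing values (encoded here as `None`, an assumption), min-max normalization to [0, 1], and the Minkowski distance, which becomes the Manhattan distance for p = 1 and the Euclidean distance for p = 2. The helper names are hypothetical; `statistics.median` or `statistics.mode` can replace `statistics.mean` for median or mode imputation, and `statistics.stdev` gives the standard deviation.

```python
import statistics

def impute_mean(column):
    """Replace missing values (None) in one numeric column with the column mean."""
    known = [v for v in column if v is not None]
    mean = statistics.mean(known)
    return [mean if v is None else v for v in column]

def min_max_scale(column):
    """Rescale one numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def minkowski(a, b, p):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

ages = [25, None, 35, 40]
filled = impute_mean(ages)            # None becomes the mean of 25, 35, 40
print(filled)
print(min_max_scale(filled))
print(minkowski([0, 0], [3, 4], 1), minkowski([0, 0], [3, 4], 2))  # 7.0 5.0
```

Scaling matters before computing distances: a feature measured on a large range (e.g. salary) would otherwise dominate one measured on a small range (e.g. age).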

Classification :

  • Rule Classifiers
    • ZeroR Classifier
    • OneR Classifier
  • Logistic Regression
  • KNN Classifier
  • Support Vector Classifier (Kernels : Linear, Polynomial, Gaussian, Sigmoid, etc.)
  • Naive Bayes Classifier
  • Decision Tree Classifier
    • CART (Gini index)
    • ID3
    • C4.5 / C5.0 / J48
  • Ensemble Learning
    • Bagging Classifier
    • Boosting Classifier (AdaBoost, Gradient Boosting)
    • Random Forest Classifier
  • Introduction to Deep Learning (ANN, RNN, CNN, SOM, Autoencoders)
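A minimal Python sketch of the two rule classifiers and of the impurity measures behind decision-tree splits, assuming nominal (categorical) features stored as tuples. CART uses Gini impurity, while ID3 and C4.5 choose splits with entropy-based information gain (C4.5 uses the gain ratio). The function names are hypothetical and are not WEKA's API (in WEKA these learners are `ZeroR`, `OneR` and `J48`); unlike the real OneR, this sketch does not discretize numeric attributes.

```python
import math
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR: ignore all features and always predict the majority class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR: pick the single feature whose rules (value -> majority class)
    make the fewest errors on the training data."""
    best = None
    for f in range(len(rows[0])):
        by_value = defaultdict(Counter)      # feature value -> class counts
        for row, y in zip(rows, labels):
            by_value[row[f]][y] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best is None or errors < best[1]:
            best = (f, errors, rules)
    return best                              # (feature index, errors, rules)

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits, the basis of information gain."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
print(zero_r(labels))                        # yes
print(one_r(rows, labels))
print(gini(["yes", "yes", "no", "no"]), entropy(["yes", "yes", "no", "no"]))  # 0.5 1.0
```

ZeroR is useful mainly as a baseline: any real classifier should beat its accuracy.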

Regression :

  • Linear Regression (Simple & Multiple)
  • Polynomial Regression
  • Support Vector Regression (SVR)
  • Decision Tree Regression
  • Random Forest Regression
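For simple linear regression the least-squares coefficients have a closed form: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. A minimal Python sketch follows (the function names are hypothetical); multiple regression generalizes this to several features, and polynomial regression is linear regression on powers of x.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean   # the fitted line passes through (x_mean, y_mean)
    return b0, b1

def mean_squared_error(ys, preds):
    """Average squared residual, a common regression error measure."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]               # exactly y = 1 + 2x
b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)                   # 1.0 2.0
```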

Clustering :

  • KMeans
  • Hierarchical (Agglomerative, Divisive)
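A minimal standard-library Python sketch of k-means (Lloyd's algorithm): assign every point to its nearest centroid, move each centroid to the mean of its members, and repeat until no assignment changes. Using the first k points as starting centroids is an assumption made to keep the sketch deterministic; scikit-learn's `KMeans` uses k-means++ initialization by default.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on a list of numeric tuples; returns (centroids, assignment)."""
    centroids = [list(p) for p in points[:k]]   # assumed init: first k points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break                               # no point moved: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                         # keep old centroid if cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment

points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8), (8, 9)]
centroids, assignment = kmeans(points, k=2)
print(assignment)                               # [0, 0, 0, 1, 1, 1]
```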

Imbalanced Learning :

  • Majority class vs Minority class
  • Re-sampling : Over-sampling, Under-sampling
    • Over-sampling algorithms : ADASYN, SMOTE, Random Over-sampling
    • Under-sampling algorithms : Random Under-sampling
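A minimal Python sketch of the two random re-sampling methods, assuming rows and labels are parallel lists; the function names and the fixed `seed` parameter are illustrative. Random over-sampling duplicates minority-class instances, whereas SMOTE and ADASYN create new synthetic instances by interpolating between minority-class neighbours. The imbalanced-learn (`imblearn`) library provides `RandomOverSampler`, `SMOTE`, `ADASYN` and `RandomUnderSampler`.

```python
import random
from collections import Counter

def random_over_sample(rows, labels, seed=0):
    """Duplicate random instances of smaller classes up to the majority-class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

def random_under_sample(rows, labels, seed=0):
    """Randomly keep only as many instances per class as the minority class has."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        members = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(members, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels

rows = [[i] for i in range(10)]
labels = ["majority"] * 8 + ["minority"] * 2
print(Counter(random_over_sample(rows, labels)[1]))   # 8 and 8
print(Counter(random_under_sample(rows, labels)[1]))  # 2 and 2
```

Re-sample only the training data; the test data should keep its original class distribution.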

Feature Selection :

  • Filter Approach
  • Wrapper Approach
  • Embedded Approach

Feature Generation (Extraction) :

  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis (LDA)
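To show the idea behind Principal Component Analysis, here is a minimal standard-library Python sketch that finds only the first principal component: centre the data, build the sample covariance matrix, and apply power iteration to get its top eigenvector. The function names and the power-iteration approach are assumptions for illustration; scikit-learn's `sklearn.decomposition.PCA` computes all components via SVD.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (mean, unit vector of the first principal component)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration; the start vector is assumed not orthogonal to the answer.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, component):
    """Project each row onto the component (a 1-D PCA score per row)."""
    return [sum((r[j] - mean[j]) * component[j] for j in range(len(r))) for r in rows]

rows = [[1, 1], [2, 2], [3, 3], [4, 4]]
mean, pc = first_principal_component(rows)
print(pc)                      # close to [0.7071, 0.7071]: the direction y = x
print(project(rows, mean, pc))
```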

Performance Measures :

  • Understand the Confusion Matrix
  • Calculate : Accuracy, Error, Sensitivity, Specificity, Precision, Recall
  • ROC Curve & Precision-Recall Curve (AUC & AUPR)
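A minimal Python sketch that counts the confusion-matrix cells for one chosen positive class and derives the measures listed above: accuracy = (TP + TN) / total, error = (FP + FN) / total, precision = TP / (TP + FP), recall (sensitivity, TPR) = TP / (TP + FN), specificity = TN / (TN + FP). The function name is hypothetical, and returning 0.0 for an undefined measure is an assumed convention.

```python
def binary_metrics(actual, predicted, positive):
    """Confusion-matrix counts and derived measures; `positive` is the positive
    class and every other label counts as negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)

    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)            # also sensitivity / TPR
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "error": ratio(fp + fn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "fpr": ratio(fp, fp + tn),         # = 1 - specificity
        "f_score": ratio(2 * precision * recall, precision + recall),
    }

actual    = ["yes"] * 4 + ["no"] * 6
predicted = ["yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no"]
print(binary_metrics(actual, predicted, positive="yes"))
```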

Course Materials

Course Schedule

  • Week #1 :
    • Introduction to Pattern Recognition,
    • Current research trends,
    • Introduction to WEKA
  • Week #2 :
    • Hands-on practice on WEKA GUI
      • Understand the .CSV & .ARFF file formats
      • Data Visualization
      • Classifier design
      • Use different machine learning algorithms
    • Evaluation options
      • Use training set
      • Supplied test set
      • Cross-validation (10 folds)
      • Percentage split (2/3 train & 1/3 test)
    • Confusion Matrix
      • TP, FP, FN, TN
      • Performance measures : Accuracy, Error, TPR, FPR, F-Score, etc.
      • Weighted average
    • Assignment #1 : Use all (25) datasets from WEKA & submit a report on them (Week #3).
  • Week #3 :
    • Introduction to WEKA implementation in Java.
    • Assignment #2 : Two large datasets will be provided; submit a report on them (Week #4).
  • Week #4 :
    • Details on WEKA implementation in Java.
      • Actual vs Prediction
      • Evaluation options
      • Feature Reduction
  • Week #5 : Ensemble Learning
  • Week #6 : Midterm Exam (Classifiers, based on both your Lab & Theory courses)
  • Week #7 : Clustering
  • Week #8 : Data Analysis with scikit-learn (Python)-I
    • Loading the datasets (using pandas, numpy)
    • Feature scaling
    • Machine Learning Classifiers
    • Evaluation metrics
    • Assignment #3 : Datasets will be provided.
  • Week #9 : Data Analysis with scikit-learn (Python)-II
    • Problem solving
    • Draw curves with scikit-learn (e.g., ROC curve and its AUC)
    • More tricks (imblearn)
  • Week #10 : Data Analysis with scikit-learn (Python)-III
    • Introduction to Kaggle competitions
    • Problem solving
    • Assignment #4 : A dataset will be provided.
  • Week #11 : Individual presentations based on datasets.
  • Week #12 : Final Exam.
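The Week #2 evaluation options "Cross-validation (10 folds)" and "Percentage split (2/3 train & 1/3 test)" come down to how instance indices are divided. A minimal standard-library Python sketch, assuming a fixed shuffle seed; the function names are hypothetical. Note that WEKA stratifies the folds by class for nominal class attributes, which this sketch does not do (scikit-learn offers both `KFold` and `StratifiedKFold`).

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; every instance is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def percentage_split(n, train_fraction=2 / 3, seed=0):
    """Shuffle indices and cut them into a training part and a test part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]

for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))                   # 22 3, ..., 23 2
train, test = percentage_split(30)
print(len(train), len(test))                       # 20 10
```

With cross-validation the reported performance is averaged over the k test folds, so every instance contributes to the estimate exactly once.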