CS7641 - Machine Learning

Georgia Tech - OMSCS - CS7641 - Machine Learning repository.

https://github.com/ezerilli/CS7641-Machine_Learning

SETTING UP THE ENVIRONMENT 👨🏻‍💻👨🏻‍💻👨🏻‍💻

The following steps set up the working environment for CS7641 - Machine Learning in the OMSCS program.

Installing the conda environment gives you a ready-to-use setup for running the Python scripts without worrying about packages and versions. Alternatively, you can install each of the packages listed in requirements.yml on your own with pip or conda.

  1. Start by installing Conda for your operating system, following the official installation instructions.

  2. Now install the environment described in requirements.yml:

conda env create -f requirements.yml
  3. To activate the environment, run:
conda activate CS7641
  4. Once inside the environment, run any Python script with:
python my_file.py
  5. To deactivate the environment, run:
conda deactivate
  6. During the semester I may need to add new packages to the environment, so to update it run:
conda env update -f requirements.yml

ASSIGNMENT 1 - SUPERVISED LEARNING 🔥🔥🔥

This assignment aims to explore five Supervised Learning algorithms (k-Nearest Neighbors, Support Vector Machines, Decision Trees, AdaBoost and Neural Networks), performing model complexity analysis and producing learning curves while comparing their performance on two interesting datasets: the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and Handwritten Digits Image Classification (the famous MNIST).

The assignment consists of two parts (a code sketch follows the list):

  • experiment 1, producing validation curves, learning curves and test-set performance for each of the algorithms on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.

  • experiment 2, producing validation curves, learning curves and test-set performance for each of the algorithms on the Handwritten Digits Image Classification (MNIST) dataset.
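Both experiments follow the same scikit-learn workflow for model complexity analysis. The sketch below is illustrative only, not the repository's actual code: it computes a validation curve for k-Nearest Neighbors on WDBC, which ships with scikit-learn as load_breast_cancer; the k range and the 5-fold cross-validation are hypothetical choices.

# Illustrative sketch: validation curve for k-NN on WDBC (not the repo's code).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Model complexity analysis: sweep the number of neighbors k.
k_range = np.arange(1, 31)
train_scores, val_scores = validation_curve(
    make_pipeline(StandardScaler(), KNeighborsClassifier()), X, y,
    param_name='kneighborsclassifier__n_neighbors', param_range=k_range,
    cv=5, scoring='accuracy')

for k, tr, va in zip(k_range, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f'k={k:2d}  train={tr:.3f}  validation={va:.3f}')

Learning curves follow the same pattern with sklearn.model_selection.learning_curve, sweeping the training-set size instead of k.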

To run the experiments:

cd Supervised_Learning
python run_experiments.py

Figures will show up progressively. Running all the experiments and hyperparameter optimizations takes a while, but the resulting figures have already been saved into the images directory. Theory, results and experiments are discussed in the report (not provided here due to Georgia Tech's Honor Code).

ASSIGNMENT 2 - RANDOMIZED OPTIMIZATION 🔥🔥🔥

This assignment aims to explore some algorithms in Randomized Optimization, namely Randomized Hill Climbing (RHC), Simulated Annealing (SA), Genetic Algorithms (GA) and Mutual-Information-Maximizing Input Clustering (MIMIC), while comparing their performance on three interesting discrete optimization problems: the Travelling Salesman Problem, Flip Flop and 4-Peaks. Moreover, RHC, SA and GA are later compared to Gradient Descent and Backpropagation on a (nowadays) fundamental optimization problem: training complex Neural Networks.

The assignment consists of four parts (code sketches follow below):

  • experiment 1, producing complexity and performance curves for the Travelling Salesman Problem.
  • experiment 2, producing complexity and performance curves for Flip Flop.
  • experiment 3, producing complexity and performance curves for 4-Peaks.
  • experiment 4, producing complexity and performance curves for Neural Network training.
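The discrete problems can be expressed with the mlrose package (https://github.com/gkhayes/mlrose). Below is a minimal, illustrative sketch, not the repository's code, running Simulated Annealing on 4-Peaks; the bit-string length, t_pct and annealing schedule are hypothetical choices.

# Illustrative sketch: Simulated Annealing on 4-Peaks with mlrose.
import mlrose

# 4-Peaks fitness over bit strings of length 50 (t_pct chosen arbitrarily).
fitness = mlrose.FourPeaks(t_pct=0.15)
problem = mlrose.DiscreteOpt(length=50, fitness_fn=fitness,
                             maximize=True, max_val=2)

# Exponential-decay temperature schedule for SA.
best_state, best_fitness = mlrose.simulated_annealing(
    problem, schedule=mlrose.ExpDecay(),
    max_attempts=100, max_iters=10000, random_state=42)

print('best fitness:', best_fitness)

Swapping in mlrose.random_hill_climb, mlrose.genetic_alg or mlrose.mimic on the same problem object gives the comparison the experiments are after.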

To run the experiments:

cd Randomized_Optimization
python run_experiments.py

Figures will show up progressively. Running all the experiments and parameter optimizations takes a while, but the resulting figures have already been saved into the images directory. Theory, results and experiments are discussed in the report (not provided here due to Georgia Tech's Honor Code).
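For experiment 4, mlrose can also search a neural network's weights with a randomized optimizer instead of backpropagation. The following sketch assumes the gkhayes/mlrose NeuralNetwork API; the dataset, architecture and hyperparameters are hypothetical choices, not the repository's settings.

# Illustrative sketch: training NN weights with Simulated Annealing via mlrose.
import mlrose
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale features to [0, 1]: randomized weight search is sensitive to scale.
scaler = MinMaxScaler()
X_train, X_test = scaler.fit_transform(X_train), scaler.transform(X_test)

# Same network, different weight optimizer: swap 'simulated_annealing' for
# 'random_hill_climb', 'genetic_alg' or 'gradient_descent' to compare.
nn = mlrose.NeuralNetwork(hidden_nodes=[16], activation='relu',
                          algorithm='simulated_annealing',
                          max_iters=1000, learning_rate=0.1,
                          early_stopping=True, random_state=42)
nn.fit(X_train, y_train)

y_pred = nn.predict(X_test)
print('test accuracy:', np.mean(y_pred.ravel() == y_test))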

ASSIGNMENT 3 - UNSUPERVISED LEARNING 🔥🔥🔥

This assignment aims to explore some algorithms in Unsupervised Learning, namely Principal Component Analysis (PCA), Kernel PCA (KPCA), Independent Component Analysis (ICA), Random Projections (RP), k-Means and Gaussian Mixture Models (GMM), while comparing their performance on two interesting datasets: the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and Handwritten Digits Image Classification (the famous MNIST). Moreover, their contribution to Neural Networks in the supervised setting is assessed.

The assignment consists of two parts (a code sketch follows the list):

  • experiment 1, producing curves for dimensionality reduction, clustering and neural networks with unsupervised techniques on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.

  • experiment 2, producing curves for dimensionality reduction, clustering and neural networks with unsupervised techniques on the Handwritten Digits Image Classification (MNIST) dataset.
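Dimensionality reduction and clustering both come from scikit-learn. Here is an illustrative sketch, not the repository's code, chaining PCA and k-Means on WDBC; the 95% variance threshold and k = 2 are hypothetical choices.

# Illustrative sketch: PCA followed by k-Means on WDBC.
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_mutual_info_score
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Keep enough principal components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print('components kept:', pca.n_components_)

# Cluster in the reduced space and compare clusters with the true labels.
labels = KMeans(n_clusters=2, random_state=42).fit_predict(X_reduced)
print('AMI vs. true labels:', adjusted_mutual_info_score(y, labels))

Feeding X_reduced (or the cluster assignments) into a neural network is the gist of the "neural networks with unsupervised techniques" experiments.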

To run the experiments:

cd Unsupervised_Learning
python run_experiments.py

Figures will show up progressively. Running all the experiments and parameter optimizations takes a while, but the resulting figures have already been saved into the images directory. Theory, results and experiments are discussed in the report (not provided here due to Georgia Tech's Honor Code).

ASSIGNMENT 4 - MARKOV DECISION PROCESSES 🔥🔥🔥

This assignment aims to explore some algorithms in Reinforcement Learning, namely Value Iteration (VI), Policy Iteration (PI) and Q-Learning, while comparing their performance on two interesting MDPs: the Frozen Lake environment from OpenAI Gym and the Gambler's Problem from Sutton and Barto.

The assignment consists of two parts (a code sketch follows the list):

  • experiment 1, producing curves for VI, PI and Q-Learning on the Frozen Lake environment from OpenAI Gym.

  • experiment 2, producing curves for VI, PI and Q-Learning on the Gambler's Problem from Sutton and Barto.
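Both MDPs can be solved exactly by dynamic programming. The sketch below is illustrative only: it runs Value Iteration on FrozenLake, reading the transition model that gym's toy-text environments expose as env.unwrapped.P; the 'FrozenLake-v0' ID matches 2019-era gym releases, and gamma and the tolerance are hypothetical choices.

# Illustrative sketch: Value Iteration on FrozenLake.
import numpy as np
import gym

env = gym.make('FrozenLake-v0')     # 4x4 stochastic frozen lake
P = env.unwrapped.P                 # P[s][a] = [(prob, s', reward, done), ...]
n_states, n_actions = env.observation_space.n, env.action_space.n
gamma, tol = 0.99, 1e-8             # hypothetical discount and tolerance

V = np.zeros(n_states)
while True:
    delta = 0.0
    for s in range(n_states):
        # Bellman optimality backup: V(s) = max_a sum_s' p * (r + gamma * V(s')).
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][a])
             for a in range(n_actions)]
        best = max(q)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < tol:
        break

# Greedy policy extraction from the converged value function.
policy = [int(np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r, _ in P[s][a])
                         for a in range(n_actions)]))
          for s in range(n_states)]
print('V(start):', V[0])

Policy Iteration alternates the same backup with a policy evaluation sweep, and Q-Learning estimates the same action values from sampled transitions instead of the model.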

To run the experiments:

cd Markov_Decision_Processes
python run_experiments.py

Figures will show up progressively. Running all the experiments and parameter optimizations takes a while, but the resulting figures have already been saved into the images directory. Theory, results and experiments are discussed in the report (not provided here due to Georgia Tech's Honor Code).
