Machine Learning Labs
Labs for CSE 5324 - Machine Learning in Python
Lab 1 - Exploring Table Data
Exploring the Global Terrorism Database for features and relationships that could be useful to law enforcement in the middle of an investigation into a terror threat.
Lab 2 - Exploring Text Data
Looking at Stack Overflow questions and answers for properties of a question that result in shorter answer times.
Lab 3 - Exploring Image Data
Exploring CIFAR-10 images of cars and trucks for distinguishing features between the two classes of images. We perform dimensionality reduction on the data set using PCA and Kernel PCA. We also compare the DAISY and Gabor kernel feature extraction methods to determine which is better suited for our data set.
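As a rough illustration of the linear PCA step only, here is a minimal numpy sketch. It is not the lab's code: the `pca` helper, the random `images` array standing in for flattened grayscale images, and the choice of 10 components are all assumptions made for the example. Kernel PCA and the feature extraction comparison are not shown.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using an SVD (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)  # PCA requires zero-mean features
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, components, explained_ratio[:n_components]


# Stand-in data (assumption): 50 random "images" of 32x32 pixels, flattened.
rng = np.random.default_rng(0)
images = rng.random((50, 32 * 32))

projected, components, ratio = pca(images, n_components=10)
print(projected.shape)  # one 10-dimensional vector per image
print(ratio.sum())      # fraction of total variance kept by the 10 components
```

The explained variance ratio is one common way to decide how many components to keep before comparing against a non-linear method such as Kernel PCA.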
Lab 4 - Extending Logistic Regression
Implementing logistic regression in a one-vs-all fashion to perform multi-class classification on the Global Terrorism Database. Our hope is that local law enforcement would be able to use this model to more accurately tailor their training programs to the attack profile for their city. We will compare our implementation to scikit-learn's.
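The one-vs-all idea can be sketched in a few lines of numpy: train one binary logistic regression per class (that class against all others), then predict the class whose classifier gives the highest score. This is a minimal sketch under stated assumptions, not the lab's implementation. The function names, the learning rate, the iteration count, and the synthetic three-cluster data set are all hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_one_vs_all(X, y, n_classes, lr=0.1, n_iters=500):
    """Fit one binary logistic regression per class with batch gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    W = np.zeros((n_classes, Xb.shape[1]))
    for k in range(n_classes):
        target = (y == k).astype(float)  # class k versus every other class
        for _ in range(n_iters):
            # Gradient of the mean negative log-likelihood for classifier k.
            grad = Xb.T @ (sigmoid(Xb @ W[k]) - target) / len(y)
            W[k] -= lr * grad
    return W


def predict_one_vs_all(W, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    # The sigmoid is monotonic, so the largest logit is also the largest probability.
    return np.argmax(Xb @ W.T, axis=1)


# Synthetic stand-in data (assumption): three well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)

W = fit_one_vs_all(X, y, n_classes=3)
accuracy = np.mean(predict_one_vs_all(W, X) == y)
print(accuracy)
```

Each row of `W` is an independent binary classifier, which makes it straightforward to compare the learned weights against a library model fit on the same data.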
Lab 5 - Evaluation and Multi-Layer Perceptron
Implementing a multi-layer perceptron to perform classification of truck and automobile images in the CIFAR-10 data set.
Lab 6 - Wide and Deep Networks
Using Keras and TensorFlow to implement a classification network with a wide branch and a deep branch. Two deep branch architectures are proposed. The best architecture is chosen and compared to a simple multi-layer perceptron.
Lab 7 - Convolutional Neural Networks
Using Keras to develop a convolutional neural network for classification of car and truck images from the CIFAR-10 data set. Two CNNs are developed and compared to a simple multi-layer perceptron.
Lab 8 - Recurrent Neural Networks
Using Keras to develop a recurrent neural network architecture for classification of Stack Overflow questions. Based on the question title, body, code, and post time, we try to determine how long it will take for a question to get answered. Two RNNs are developed using LSTM and GRU recurrent architectures.
ICA 1
Using numpy and scikit-learn to perform linear classification on a diabetes data set.
ICA 2
Experimenting with linear and non-linear support vector machines to assess the accuracy of classifying the subject of an image in the Labeled Faces in the Wild data set.