CS4375_CP_Portfolio

This repository holds my portfolio projects for the course CS 4375 at UT Dallas.

Overview

Here is a link to the overview document, which introduces the different branches of machine learning and their algorithms.

C++ Data Exploration

The link points to a C++ program that emulates several functions found in R: sum, mean, median, range, covariance, and correlation. The data comes from the provided CSV file "Boston.csv", which is in the same folder. The file has two columns, labeled rm and medv. The sum, mean, median, and range are computed for each column, and the covariance and correlation are computed between the two.
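As an illustration of the statistics described above, here is a minimal sketch of how they could be written in C++. It is not the code in the repository: the function names (vec_sum, vec_mean, vec_median, vec_range, covariance, correlation) are hypothetical, the inputs are assumed to be columns already parsed from the CSV into vectors, and covariance is assumed to use the sample (n - 1) denominator to match R's cov().

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Sum of all elements (0.0 for an empty vector).
double vec_sum(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// Arithmetic mean; undefined for an empty vector, so we throw.
double vec_mean(const std::vector<double>& v) {
    if (v.empty()) throw std::invalid_argument("mean of empty vector");
    return vec_sum(v) / static_cast<double>(v.size());
}

// Median; the vector is taken by value so the caller's data is not reordered.
double vec_median(std::vector<double> v) {
    if (v.empty()) throw std::invalid_argument("median of empty vector");
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Range returned as a (min, max) pair, like R's range().
std::pair<double, double> vec_range(const std::vector<double>& v) {
    if (v.empty()) throw std::invalid_argument("range of empty vector");
    const auto mm = std::minmax_element(v.begin(), v.end());
    return {*mm.first, *mm.second};
}

// Sample covariance with the (n - 1) denominator (assumption: matches R's cov()).
double covariance(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2)
        throw std::invalid_argument("need two equal-length vectors with n >= 2");
    const double mx = vec_mean(x);
    const double my = vec_mean(y);
    double acc = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) acc += (x[i] - mx) * (y[i] - my);
    return acc / static_cast<double>(x.size() - 1);
}

// Pearson correlation: covariance divided by the product of standard deviations.
double correlation(const std::vector<double>& x, const std::vector<double>& y) {
    return covariance(x, y) /
           (std::sqrt(covariance(x, x)) * std::sqrt(covariance(y, y)));
}
```

With the rm and medv columns loaded into two std::vector<double> objects, each function would be called once per column, and covariance and correlation once on the pair.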

Linear Models

The link points to linear and logistic regression models fit to two data sets, with exploratory data analysis, visualizations, and a report of the findings.

ML Algorithms From Scratch

The logistic regression and naive Bayes algorithms are implemented in C++ to explore how those models work from scratch. In addition, the summary document reviews the results of those algorithms on the Titanic dataset and discusses generative versus discriminative classifiers and reproducible research in machine learning.
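To give a sense of what a from-scratch logistic regression involves, here is a minimal sketch using batch gradient ascent on the log-likelihood. It is an assumption-laden illustration, not the repository's implementation: it handles a single predictor plus an intercept, and the names LogisticModel, fit_logistic, and predict_probability, as well as the learning rate and iteration count parameters, are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Logistic (sigmoid) function mapping a real number to (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hypothetical container for the two fitted coefficients.
struct LogisticModel {
    double intercept = 0.0;
    double slope = 0.0;
};

// Fit P(y = 1 | x) = sigmoid(intercept + slope * x) by batch gradient ascent.
// x holds one numeric predictor, y holds the 0/1 class labels.
LogisticModel fit_logistic(const std::vector<double>& x,
                           const std::vector<int>& y,
                           double learning_rate,
                           int iterations) {
    if (x.empty() || x.size() != y.size())
        throw std::invalid_argument("x and y must be non-empty and equal length");
    LogisticModel m;
    const double n = static_cast<double>(x.size());
    for (int it = 0; it < iterations; ++it) {
        double grad_intercept = 0.0;
        double grad_slope = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            // Error term: observed label minus predicted probability.
            const double err = y[i] - sigmoid(m.intercept + m.slope * x[i]);
            grad_intercept += err;
            grad_slope += err * x[i];
        }
        // Step along the averaged gradient of the log-likelihood.
        m.intercept += learning_rate * grad_intercept / n;
        m.slope += learning_rate * grad_slope / n;
    }
    return m;
}

// Predicted probability of class 1 for a single observation.
double predict_probability(const LogisticModel& m, double x) {
    return sigmoid(m.intercept + m.slope * x);
}
```

A predicted probability above 0.5 would typically be classified as 1. Logistic regression is a discriminative classifier because it models P(y | x) directly, whereas naive Bayes is generative: it models P(x | y) and P(y) and applies Bayes' rule.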

Searching for Similarity

The searching for similarity summary is available at the previous link. Working in a group of four, I was responsible for the regression portion; the R notebook for that part is here.

Kernel and Ensemble

The kernel and ensemble files explore support vector machines for regression (Notebook 1) and classification (Notebook 2), as well as decision trees and the random forest, AdaBoost, and XGBoost ensemble methods (Notebook 3). In addition, the narrative document summarizes the work done for this assignment and explores the inner workings, strengths, and weaknesses of these algorithms.