This repository holds my portfolio projects for the course CS 4375 at UT Dallas.
Here is a link to the overview document introducing the different branches of machine learning and their algorithms.
The linked program reimplements in C++ several functions found in R: sum, mean, median, range, covariance, and correlation. The data comes from the provided CSV file "Boston.csv", which is in the same folder. The data file has two columns, labeled rm and medv. For each column, the sum, mean, median, and range are computed, and the covariance and correlation are calculated between the two.
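As an illustration of the statistics described above, here is a minimal C++ sketch. It is not the code from the linked program: the function names are hypothetical, CSV parsing is left out, and it assumes the two columns have already been read into vectors of doubles. The covariance divides by n - 1, which matches the sample covariance that R's cov() reports.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Sum of all elements.
double sum(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// Arithmetic mean; requires a non-empty vector.
double mean(const std::vector<double>& v) {
    assert(!v.empty());
    return sum(v) / static_cast<double>(v.size());
}

// Median; takes a copy so the caller's data is not reordered.
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 == 1 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Range as a (min, max) pair, similar to what R's range() returns.
std::pair<double, double> range(const std::vector<double>& v) {
    assert(!v.empty());
    const auto mm = std::minmax_element(v.begin(), v.end());
    return {*mm.first, *mm.second};
}

// Sample covariance (divides by n - 1).
double covariance(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double mx = mean(x);
    const double my = mean(y);
    double acc = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += (x[i] - mx) * (y[i] - my);
    }
    return acc / static_cast<double>(x.size() - 1);
}

// Pearson correlation: covariance scaled by both standard deviations.
double correlation(const std::vector<double>& x, const std::vector<double>& y) {
    return covariance(x, y) / std::sqrt(covariance(x, x) * covariance(y, y));
}
```

With the rm and medv columns loaded into two vectors, calling covariance(rm, medv) and correlation(rm, medv) would give the pairwise statistics, while the other four functions apply to each column separately.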
The link points to linear and logistic regression models built on two data sets, with exploratory data analysis, a report of the findings, and visualizations.
The logistic regression and Naive Bayes algorithms are implemented from scratch in C++ to explore how those models work. In addition, the summary document reviews how those algorithms performed on the Titanic dataset and discusses generative versus discriminative classifiers and reproducible research in machine learning.
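To give a sense of what a from-scratch logistic regression can look like, here is a minimal C++ sketch. It is an assumption-laden illustration, not the implementation in this repository: it uses a single predictor, plain batch gradient ascent on the log-likelihood, and hypothetical names (LogisticModel, fit_logistic, predict_probability).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic (sigmoid) function: maps log-odds to a probability in (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hypothetical container for an intercept and one slope coefficient.
struct LogisticModel {
    double intercept = 0.0;
    double slope = 0.0;
};

// Fit the two coefficients by batch gradient ascent on the log-likelihood.
// Labels in y are expected to be 0 or 1.
LogisticModel fit_logistic(const std::vector<double>& x,
                           const std::vector<int>& y,
                           double learning_rate, int iterations) {
    assert(x.size() == y.size() && !x.empty());
    LogisticModel m;
    const double n = static_cast<double>(x.size());
    for (int it = 0; it < iterations; ++it) {
        double grad_intercept = 0.0;
        double grad_slope = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            // Error between the label and the predicted probability.
            const double err = y[i] - sigmoid(m.intercept + m.slope * x[i]);
            grad_intercept += err;
            grad_slope += err * x[i];
        }
        // Step in the direction that increases the log-likelihood.
        m.intercept += learning_rate * grad_intercept / n;
        m.slope += learning_rate * grad_slope / n;
    }
    return m;
}

// Predicted probability of class 1 for a single observation.
double predict_probability(const LogisticModel& m, double x) {
    return sigmoid(m.intercept + m.slope * x);
}
```

A prediction is typically turned into a class label by thresholding the probability at 0.5. The learning rate and iteration count are tuning choices, and the values that work well depend on the scale of the predictor.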
The Searching for Similarity summary is available at the previous link. In a group of four, I was responsible for the regression portion; the R notebook for it is here.
The kernel and ensemble files explore support vector machines for regression (Notebook 1) and classification (Notebook 2), as well as Decision Trees, Random Forest, AdaBoost, and XGBoost ensemble methods (Notebook 3). In addition, the narrative summary document outlines the steps taken in this assignment and explores the inner workings, strengths, and weaknesses of these algorithms.