My progress through the Udacity Data Analyst Nanodegree
In this repository I'm trying to maintain a relatively presentable copy of my Udacity DAND projects. If you're feeling extra critical today, go ahead and ignore the earlier projects: I took massive aesthetic liberties and churned out hideous but functional work quickly so I could get to the juicy learnin'.
Introductory project: very straightforward basic stats on chopstick size preferences among students.
Jupyter Notebook, Python
Looking at the Stroop effect in a small sample and doing some more basic stats.
Jupyter Notebook, Python (numpy, pandas)
Taking the Titanic data set from Kaggle and exploring it with numpy and pandas. In the Kaggle competition you are supposed to look at stats as they relate to passenger survival, but for this exercise I didn't go terribly in depth on survival, preferring to look at relationships among other variables like class, sex, fare, and age, as well as survival.
Jupyter Notebook, Python (numpy, pandas)
Exported an XML document from OpenStreetMap detailing the Hampton Roads area in Virginia. The XML file was over 1 GB. Cleaned the data, saved it as JSON, uploaded it to MongoDB, and looked at a couple of different statistics on the area. This one was a fav!
Jupyter Notebook, Python (numpy, pandas, ElementTree), XML, JSON, MongoDB
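
For a rough idea of what the XML-to-JSON step can look like, here is a minimal sketch that streams an OSM file with ElementTree instead of loading all of it at once. It is not the notebook's actual code: the function name `iter_osm_elements`, the document shape (`type`, `id`, `pos`, `tags`), and the tiny inline sample are all assumptions made for illustration.

```python
import io
import json
import xml.etree.ElementTree as ET


def iter_osm_elements(source, tags=("node", "way")):
    """Yield one dict per matching OSM element, parsing the file incrementally.

    The output shape here is a hypothetical one chosen for this sketch.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in tags:
            continue
        doc = {"type": elem.tag, "id": elem.get("id")}
        # Only nodes carry coordinates; ways reference nodes instead.
        if elem.get("lat") is not None and elem.get("lon") is not None:
            doc["pos"] = [float(elem.get("lat")), float(elem.get("lon"))]
        # Collect the <tag k="..." v="..."/> children into a plain dict.
        tag_values = {t.get("k"): t.get("v") for t in elem.iter("tag")}
        if tag_values:
            doc["tags"] = tag_values
        yield doc
        # Drop the element's children and attributes once it has been handled.
        elem.clear()


# Made-up two-element sample standing in for the real export.
sample = b"""<osm>
  <node id="1" lat="36.85" lon="-76.29">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="2">
    <nd ref="1"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""

docs = list(iter_osm_elements(io.BytesIO(sample)))
# One JSON document per line, a layout that is convenient for bulk loading.
json_lines = "\n".join(json.dumps(d) for d in docs)
print(json_lines)
```

From there, the resulting dicts could be written to a file or, assuming a running MongoDB instance and the pymongo driver, passed to a collection's `insert_many`.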
Exploratory data analysis for the drivendata.org machine learning competition to predict failures of water wells in Tanzania, in support of the NPO Taarifa.
R, R Markdown, Statistical Analysis, ggplot2