By Graeme West
This repository contains a variety of data science projects and explorations that I’ve done for personal development.
You can browse the list below, or click the Binder button to launch the whole repository in Binder. Note that this will probably only work for the Python projects with Jupyter notebooks.
-
Word Frequency in Moby Dick (DataCamp)
This project uses the Natural Language Toolkit (nltk) to estimate word frequencies in the text of Herman Melville’s novel Moby Dick, after filtering out common English stop words.
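The core idea can be sketched in a few lines: tokenise, lower-case, drop stop words, then count. This is a minimal illustration, not the project's code. It uses a regex tokenizer and a tiny hand-written stop-word set (both assumptions) so it runs without downloading nltk corpora; nltk itself provides a full English list via `nltk.corpus.stopwords.words("english")`.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; a real analysis would use nltk's English list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "i", "me", "my"}


def word_frequencies(text, stop_words=STOP_WORDS, top_n=10):
    """Return the top_n most common words, lower-cased, with stop words removed."""
    # Simple regex tokenizer standing in for an nltk tokenizer
    tokens = re.findall(r"[a-z']+", text.lower())
    words = [t for t in tokens if t not in stop_words]
    return Counter(words).most_common(top_n)


sample = "Call me Ishmael. The whale, the white whale, and the sea."
print(word_frequencies(sample, top_n=3))
```

Without the stop-word filter, "the" would dominate the counts, which is why the filtering step matters for a text as long as Moby Dick.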
Technologies and techniques used:
Python, nltk, matplotlib, Jupyter notebooks.
Links:
-
A Network Analysis of Game of Thrones (DataCamp)
A graph analysis project. By calculating several network centrality measures for the major characters in the Game of Thrones books, this project seeks to identify the most important characters in each book. The measures used are PageRank (the algorithm behind Google's original search ranking), betweenness centrality, and degree centrality.
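To show what two of these measures compute, here is a from-scratch sketch on a small hypothetical character graph (the edges below are invented for illustration, not taken from the books' data). networkx provides these measures directly as `nx.degree_centrality`, `nx.betweenness_centrality` and `nx.pagerank`; betweenness is omitted here for brevity.

```python
def build_graph(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank for an undirected, unweighted graph."""
    n = len(adj)
    rank = {node: 1 / n for node in adj}
    for _ in range(iterations):
        # Every node gets the "random jump" share of rank
        new = {node: (1 - damping) / n for node in adj}
        for node, nbrs in adj.items():
            if nbrs:
                # Each node passes its rank evenly to its neighbours
                share = damping * rank[node] / len(nbrs)
                for nbr in nbrs:
                    new[nbr] += share
            else:
                # Isolated node: spread its rank evenly over all nodes
                for other in adj:
                    new[other] += damping * rank[node] / n
        rank = new
    return rank


# Hypothetical interactions for illustration only
edges = [("Jon", "Sam"), ("Jon", "Arya"), ("Jon", "Tyrion"), ("Tyrion", "Cersei")]
adj = build_graph(edges)
dc = degree_centrality(adj)
pr = pagerank(adj)
print(sorted(pr.items(), key=lambda kv: kv[1], reverse=True)[:2])
```

Degree centrality only counts direct connections, while PageRank also rewards being connected to well-connected characters, so the two rankings can differ on larger graphs.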
Technologies and techniques used:
Python, networkx, pandas, Jupyter notebooks.
Links:
-
Reducing Traffic Mortality in the USA (DataCamp)
A project involving exploratory data analysis (EDA) and cluster analysis using the KMeans algorithm. The subject matter is traffic mortality rates in the USA, supplied as CSV data.
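The clustering step follows the k-means (Lloyd's) loop that scikit-learn's `KMeans` runs: assign each point to its nearest centroid, then move each centroid to the mean of its points, until nothing changes. The sketch below is a stdlib illustration under stated assumptions: the two-feature points are hypothetical, and it uses the first k points as initial centroids, whereas scikit-learn defaults to the smarter k-means++ initialisation.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Naive deterministic init: the first k points become the starting centroids
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid (Euclidean)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical standardised features, e.g. two per-region accident indicators
points = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.2), (5.3, 4.8), (0.1, 0.6), (4.7, 5.4)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In practice the features should be standardised before clustering, since k-means uses raw Euclidean distance and a large-scale feature would otherwise dominate.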
Technologies and techniques used:
Python, scikit-learn, pandas, matplotlib, KMeans clustering, EDA
Links:
-
Bad Passwords and the NIST Guidelines (DataCamp)
A project that manipulates pandas DataFrames, adding new columns that flag particular weaknesses in each password. By creating pandas Series that hold the Boolean results of expressions on the column contents, we can select matching rows and build subsets.
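The Boolean-mask pattern described above looks roughly like this. It is a minimal sketch with hypothetical users, passwords and common-password list, not the project's data; the 8-character threshold reflects the NIST SP 800-63B minimum length for memorized secrets.

```python
import pandas as pd

# Hypothetical sample data for illustration
users = pd.DataFrame({
    "user_name": ["alice", "bob", "carol", "dave"],
    "password": ["password", "Tr0ub4dor&3", "abc", "12345678"],
})
# Hypothetical stand-in for a list of commonly used passwords
common_passwords = {"password", "12345678", "qwerty"}

# Each flag column is a Boolean Series built from a vectorised expression
users["too_short"] = users["password"].str.len() < 8
users["common_password"] = users["password"].isin(common_passwords)

# Combine flags with | and use the resulting mask to select a subset of rows
bad = users["too_short"] | users["common_password"]
bad_users = users[bad]
print(bad_users["user_name"].tolist())
```

Storing each check as its own column keeps the individual reasons visible, while the combined mask gives the subset of users who should change their password.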
Technologies and techniques used:
Python, pandas
Links:
-
Classifying Song Genres from Audio Data (DataCamp)
In this project, we use the Echo Nest dataset to predict the genre of songs from audio features such as 'tempo' and 'energy'. We compare logistic regression with a decision tree classifier, evaluating both using K-fold cross-validation.
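A comparison of this kind can be sketched with scikit-learn's `KFold` and `cross_val_score`. The data below is synthetic, generated with `make_classification` as a stand-in for the Echo Nest features, and the fold count, scaling step and random seeds are assumptions rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for audio features (tempo, energy, ...) and binary genre labels
X, y = make_classification(n_samples=300, n_features=8, n_informative=5, random_state=10)

# Shuffled 10-fold split: every sample is used for validation exactly once
kf = KFold(n_splits=10, shuffle=True, random_state=10)

models = {
    # The scaler is refit on each training fold, so no validation data leaks into it
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "decision tree": DecisionTreeClassifier(random_state=10),
}

results = {}
for name, model in models.items():
    # For classifiers, the default score is accuracy on each held-out fold
    scores = cross_val_score(model, X, y, cv=kf)
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Averaging over folds gives a steadier comparison than a single train/test split, and the spread across folds shows how sensitive each model is to which songs it was trained on.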
Technologies and techniques used:
Python, logistic regression, decision trees, K-fold cross-validation, scikit-learn