Data Science Portfolio

By Graeme West

This repository contains a variety of data science projects and explorations that I’ve done for personal development.

You can browse the list below, or click the Binder button to launch the whole repository there. Note that this will probably only work for the Python projects with Jupyter notebooks.

List of projects

Word Frequency in Moby Dick (DataCamp)

This project uses the nltk toolkit to describe approximate word frequencies in the text of Herman Melville’s novel Moby Dick, accounting for English stop words.
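As a rough illustration of the idea (not the notebook's actual code), the sketch below counts word frequencies while skipping stop words. To stay self-contained it swaps in the standard library's `collections.Counter` and a tiny hand-written stop-word set in place of nltk's tokenizer and English stop-word list. Apart from the novel's opening words, the sample sentence is made up, and `word_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only;
# the project itself relies on nltk's English stop-word list.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "it", "me"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase word tokens in `text`, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


# Sample text: the novel's opening words followed by a made-up sentence.
sample = "Call me Ishmael. The whale, the white whale, is the whale of the sea."
freqs = word_frequencies(sample)

# Print the three most frequent non-stop words with their counts.
print(freqs.most_common(3))
```

Swapping the hand-written set for a fuller stop-word list changes only the `stop_words` argument; the counting logic stays the same.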

Technologies and techniques used:

Python, nltk, matplotlib, Jupyter notebooks.

Links:
- View on GitHub
- View on Binder

A Network Analysis of Game of Thrones (DataCamp)

A graph analysis project. By calculating several centrality measures for major characters from the Game of Thrones books, this project seeks to determine the most important characters in each book. The measures used are PageRank (the algorithm originally developed at Google), betweenness centrality, and degree centrality.
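The three measures can be compared on a toy graph. This is a minimal sketch, assuming networkx (with SciPy available for `nx.pagerank`); the node names and edge weights are placeholders, not data from the books.

```python
import networkx as nx

# Hypothetical toy interaction graph: nodes stand in for characters and
# weights for the number of interactions. Not data from the project.
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 3),
    ("A", "C", 1),
    ("A", "D", 2),
    ("D", "E", 1),
])

# Fraction of the other nodes each node is directly connected to.
degree = nx.degree_centrality(G)
# Share of shortest paths passing through each node (unweighted by default).
betweenness = nx.betweenness_centrality(G)
# PageRank scores; edge weights are used through the default weight="weight".
pagerank = nx.pagerank(G)

# Report the top-ranked node under each measure.
for name, scores in [("degree", degree),
                     ("betweenness", betweenness),
                     ("pagerank", pagerank)]:
    top = max(scores, key=scores.get)
    print(f"{name}: {top} ({scores[top]:.3f})")
```

On real data the three rankings need not agree, which is why comparing them is informative.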

Technologies and techniques used:

Python, networkx, pandas, Jupyter notebooks.

Links:
- View on GitHub
- View on Binder

Reducing Traffic Mortality in the USA (DataCamp)

A project involving exploratory data analysis (EDA) and cluster analysis using the KMeans algorithm. The data, supplied as CSV, covers traffic mortality rates in the USA.
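A minimal sketch of the clustering step, assuming scikit-learn and NumPy. The data here is synthetic (two well-separated blobs) and stands in for the project's CSV features, which are not reproduced; the choice of two clusters is likewise an assumption for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: two well-separated 2-D blobs of 20 points each.
# The real project clusters features loaded from CSV instead.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(20, 2)),
])

# Fit KMeans with an assumed k=2 and assign each point to a cluster.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Number of points assigned to each cluster.
print(np.bincount(labels))
```

With real data the number of clusters is rarely obvious up front, so it is usually chosen by inspecting the data during EDA rather than fixed in advance.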

Technologies and techniques used:

Python, scikit-learn, pandas, matplotlib, KMeans clustering, EDA.

Links:
- View on GitHub
- View on Binder

Bad Passwords and the NIST Guidelines (DataCamp)

A project involving manipulation of Pandas dataframes to introduce new columns denoting particular negative properties of the passwords. By creating Pandas series containing the results of Boolean expressions on the contents, we can select rows and create subsets.
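The pattern described above can be sketched as follows. The sample rows, the column names (`user_name`, `password`, `too_short`, `same_as_user_name`) and the two rules are hypothetical, chosen only to show how Boolean Series become flag columns and row selectors.

```python
import pandas as pd

# Hypothetical sample data; not taken from the project's dataset.
users = pd.DataFrame({
    "user_name": ["alice", "bob", "carol"],
    "password": ["password", "x9$Lq2!vTz", "carol"],
})

# Boolean Series: True where the password has fewer than 8 characters.
users["too_short"] = users["password"].str.len() < 8

# Boolean Series: True where the password equals the user name.
users["same_as_user_name"] = users["password"] == users["user_name"]

# Combine the flags with | and use the result to select a subset of rows.
bad = users[users["too_short"] | users["same_as_user_name"]]
print(bad["user_name"].tolist())
```

Keeping each rule in its own column makes it easy to count how many passwords break each rule separately before combining them.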

Technologies and techniques used:

Python, pandas.

Links:
- View on GitHub
- View on Binder

Classifying Song Genres from Audio Data (DataCamp)

In this project, we use the EchoNest dataset to predict the genre of songs from certain metadata characteristics, for example 'tempo' and 'energy'. We compare and evaluate the outputs of logistic regression and a decision tree classifier using K-fold cross-validation.
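A minimal sketch of that comparison, assuming scikit-learn. The features are generated with `make_classification` as a stand-in for the EchoNest metadata, and the fold count and model settings are assumptions made for the example, not the notebook's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the song features and genre labels.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Assumed setup: 5 shuffled folds shared by both models for a fair comparison.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# One accuracy score per fold for each model.
scores = {name: cross_val_score(model, X, y, cv=cv)
          for name, model in models.items()}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy {fold_scores.mean():.3f}")
```

Reusing the same `KFold` object for both models means each one is evaluated on identical train/test splits.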

Technologies and techniques used:

Python, logistic regression, decision trees, K-fold cross-validation, scikit-learn.

Links:
- View on GitHub
- View on Binder

capncodewash/DataSciencePortfolio
