31 data science, machine learning, and data engineering projects I completed in 2020. To view a project, click its title to open the corresponding Jupyter notebook.
Profitable Google Play and Apple App Profiles
An analysis to find the most profitable categories of free apps on the Apple App Store and Google Play store.
Python
An analysis of Ask HN and Show HN posts from Hacker News to determine what type of posts receive the most attention and at what time.
Python
An analysis of used car listings from the German classifieds site eBay Kleinanzeigen.
Python
Pandas
NumPy
Visualizing Earnings Based on College Majors
An exploration of earnings of individuals after graduation based on their college majors, and a look at some statistics for each major.
Python
Pandas
Matplotlib
Visualizing the Gender Gap in College Degrees
An exploration of the gender gap in college degrees across the US.
Python
Pandas
Matplotlib
Clean and Analyze Employee Exit Surveys
The cleaning and analysis of exit survey data from employees of the Department of Education, Training, and Employment (DETE), and the Technical and Further Education Body (TAFE) of Queensland, Australia.
Python
Pandas
NumPy
Matplotlib
Analyzing NYC High School Data
A look at whether standardized tests like the SAT are unfair to certain demographics by investigating the correlations between SAT scores and demographic factors in New York City high schools.
Python
Pandas
NumPy
Matplotlib
Regex
An analysis of Star Wars survey data from fans of Star Wars movies.
Python
Pandas
NumPy
Matplotlib
Exploration of the CIA World Factbook database that contains demographic information for every country.
SQL
Use SQL to answer business questions about a fictional digital music store whose data is stored across 11 tables.
Python
SQL
Popular Data Science Questions
An analysis of data extracted from the Data Science Stack Exchange to determine the best content to write about for an education company that creates data science books, online articles, videos, or interactive text-based platforms.
Python
SQL
Pandas
NumPy
Matplotlib
Seaborn
Investigating Fandango Movie Ratings
An analysis of movie ratings to determine whether Fandango changed its biased rating system in 2016.
Python
Pandas
NumPy
Matplotlib
Finding the Best Markets to Advertise In
An analysis of survey data from new coders to determine the best markets to advertise a company's online programming courses in.
Python
Pandas
Matplotlib
Seaborn
Mobile App for Lottery Addiction
Work through probability calculations to contribute to the development of a mobile app that aims to prevent and treat lottery addiction by helping people better estimate their chances of winning.
Python
Pandas
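As a rough illustration of the kind of calculation involved (not the notebook's code), here is a minimal sketch of the odds for a single ticket. The function names and the 6-from-49 format in the usage note are assumptions made for the example.

```python
from math import comb


def one_ticket_probability(numbers_drawn: int, pool_size: int) -> float:
    """Chance that a single ticket matches every drawn number."""
    return 1 / comb(pool_size, numbers_drawn)


def match_probability(matches: int, numbers_drawn: int, pool_size: int) -> float:
    """Chance that a ticket matches exactly `matches` of the drawn numbers.

    Counts the ways to pick `matches` winning numbers and fill the rest of
    the ticket with non-winning numbers (a hypergeometric probability).
    """
    favourable = comb(numbers_drawn, matches) * comb(
        pool_size - numbers_drawn, numbers_drawn - matches
    )
    return favourable / comb(pool_size, numbers_drawn)
```

For a hypothetical 6-from-49 lottery, `one_ticket_probability(6, 49)` is 1 in 13,983,816, and `match_probability(6, 6, 49)` gives the same value.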
Building a Spam Filter with Naive Bayes
Learn about the practical side of the multinomial Naive Bayes algorithm by building a spam filter for SMS messages that classifies new messages as spam or non-spam with an accuracy greater than 95%.
Python
Pandas
Regex
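To give a feel for the algorithm, here is a minimal sketch of multinomial Naive Bayes with additive (Laplace) smoothing in plain Python. It is not the notebook's implementation: the function names, the `spam`/`ham` labels, and the toy messages in the usage note are all assumptions, and it expects at least one training message per label.

```python
import math
import re
from collections import Counter

LABELS = ("spam", "ham")  # assumed label names for this sketch


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """Count words per label from a list of (label, text) pairs."""
    word_counts = {label: Counter() for label in LABELS}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(text, model, alpha=1.0):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in LABELS:
        words_in_label = sum(word_counts[label].values())
        # Start from the log prior, then add one log likelihood per word.
        score = math.log(label_counts[label] / total_messages)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            smoothed = (word_counts[label][word] + alpha) / (
                words_in_label + alpha * len(vocabulary)
            )
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)
```

With a made-up training set such as `[("spam", "win a free prize now"), ("ham", "are we still meeting for lunch")]`, calling `classify("free prize", train(...))` picks `"spam"`. Working in log space avoids underflow when many small probabilities are multiplied.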
Perform hypothesis testing to see if there are any good potential strategies for winning Jeopardy.
Python
Pandas
NumPy
Regex
SciPy
Use the k-nearest neighbors algorithm to predict a car's market price using data containing technical attributes for various cars.
Python
Pandas
NumPy
Matplotlib
scikit-learn
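The idea behind k-nearest neighbors regression can be sketched in a few lines of plain Python. This is an illustration only, not the scikit-learn workflow used in the notebook; `knn_predict` and the feature layout are hypothetical.

```python
import math


def knn_predict(train_rows, query, k=3):
    """Predict a price as the mean price of the k closest training rows.

    train_rows: list of (features, price) pairs, where features is a tuple
    of numbers; query: a feature tuple of the same length.
    """
    # Sort training rows by Euclidean distance to the query point.
    by_distance = sorted(train_rows, key=lambda row: math.dist(row[0], query))
    nearest = by_distance[:k]
    return sum(price for _, price in nearest) / len(nearest)
```

In practice the features would be normalized first, since an attribute with a large numeric range otherwise dominates the distance.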
Explore ways to build and improve a linear regression model by working with housing data for the city of Ames, Iowa from 2006 to 2010.
Python
Pandas
NumPy
Matplotlib
scikit-learn
Regex
Work with historical data from the S&P500 Index to develop a linear regression model that predicts future S&P500 prices.
Python
Pandas
scikit-learn
Predict the number of bikes that people rent in a given hour by creating several machine learning models (linear regression, decision tree, random forest) and evaluating their performance.
Python
Pandas
NumPy
Matplotlib
scikit-learn
A look at all the necessary steps when attempting a Kaggle competition. Here we'll work with the most popular Kaggle competition for beginners and predict which passengers survived the sinking of the Titanic.
Python
Pandas
NumPy
Matplotlib
scikit-learn
Building a Handwritten Digits Classifier
Build models that can classify handwritten digits. Explore image classification, observe the limitations of traditional machine learning models for image classification, and improve some neural networks for image classification.
Python
Pandas
NumPy
Matplotlib
scikit-learn
Building Fast Queries on a CSV
Create a class with methods that answer business questions about online inventory, focusing on the time and space complexity of the algorithms, preprocessing the data to speed them up, and sorting and searching the data efficiently.
Python
Building a Database for Crime Reports
Create a database from scratch using Boston crime data, create user groups, and assign proper privileges to those groups.
Python
SQL
Psycopg2
CSV
Postgres
Practice Optimizing Dataframes and Processing in Chunks
Work with a large financial lending dataset in chunks and optimize its memory usage.
Python
Pandas
NumPy
Analyzing Startup Fundraising Deals from Crunchbase
An analysis of startup fundraising deals using a large database from Crunchbase.com.
Python
Pandas
SQL
An analysis of 54 megabytes of Wikipedia data by implementing a grep function to search textual data.
Python
Pandas
OS
Multiprocessing
MapReduce
A statistical analysis of a large quantity of historical stock market data from Yahoo Finance.
Python
Pandas
Pickle
Evaluating Numerical Expressions
Use a stack data structure to implement a function that can evaluate complex numerical expressions stored as a string.
Python
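One common way to do this, sketched here under my own assumptions rather than copied from the notebook, is to convert the infix expression to postfix with an operator stack and then evaluate the postfix form with a value stack. The sketch handles `+`, `-`, `*`, `/`, and parentheses on non-negative numbers; unary minus is not supported.

```python
import operator
import re

# Each operator maps to (precedence, function).
OPERATORS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


def tokenize(expression):
    """Split the string into numbers, operators, and parentheses."""
    return re.findall(r"\d+\.?\d*|[-+*/()]", expression)


def to_postfix(tokens):
    """Convert infix tokens to postfix order using an operator stack."""
    output, stack = [], []
    for token in tokens:
        if token in OPERATORS:
            # Pop operators of equal or higher precedence (left-associative).
            while (stack and stack[-1] in OPERATORS
                   and OPERATORS[stack[-1]][0] >= OPERATORS[token][0]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(float(token))
    while stack:
        output.append(stack.pop())
    return output


def evaluate(expression):
    """Evaluate an infix expression string and return a float."""
    values = []
    for token in to_postfix(tokenize(expression)):
        if isinstance(token, str):
            right = values.pop()
            left = values.pop()
            values.append(OPERATORS[token][1](left, right))
        else:
            values.append(token)
    return values.pop()
```

For example, `evaluate("2 + 3 * (4 - 1)")` returns `11.0`.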
Implementing a Key-Value Database
Create a fully functional save-to-disk key-value store using a B-tree data structure.
Python
Build a Hacker News pipeline from a JSON API that will filter, clean, aggregate, and summarize data.
Python
io
JSON
Thank you for checking out my work! Please don't hesitate to contact me if you're interested in collaborating on a project, want to have a virtual chat, or are in Berlin and want to grab a ☕️