This repository contains data science projects I have worked on. Keeping them all in one repository makes them easier to browse.
Here are the projects so far:
AirBnB Seattle Listings 2016 - Implemented a K-means clustering model on AirBnB property listing data to identify the clusters, and their attributes, most associated with the highest- and lowest-priced listings. Also published a blog post on Medium.com and created Tableau visualizations to present the findings.
Deep Learning (Image Classification) - Built an image classification application in PyTorch by training a deep learning model on a dataset of flower images, then used the trained model to predict the species of new flower images.
Predicting Customer Churn with PySpark - Used PySpark to engineer relevant features and build machine learning models that predict churn on a dataset from a fictional digital music service; explained the project, structured around the CRISP-DM methodology, in a Medium.com post.
Querying Data with Transact-SQL - Final Assessment - This is the final assignment for the "Querying Data with Transact-SQL" course by edX and Microsoft. The queries I wrote use statements, clauses, and functions such as TOP, OFFSET, FETCH, CASE, JOIN, UNION, LEFT, ISNUMERIC, UPPER, CREATE TABLE, PIVOT, INSERT, and UPDATE.
SQL for Data Science - Module 3 Coding Assignment - This is the final assignment for the "SQL for Data Science" course offered on Coursera by the University of California, Davis. The queries use clauses such as CASE, INNER JOIN, LIMIT, GROUP BY, and ORDER BY.
Supervised Learning - Used scikit-learn to predict whether an individual earns enough income to be a potential charity donor; implemented and evaluated algorithms such as Random Forest, SVM, and AdaBoost; and identified the features with the highest importance so that models could be trained and run on a smaller feature set, reducing training and prediction time.
Unsupervised Learning (Clustering) - Used Principal Component Analysis (PCA) for dimensionality reduction and K-means clustering to compare a business's customer data with external demographic data, identifying populations that are over- and under-represented in the customer data.
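Two of the projects above rely on K-means clustering. As a rough illustration of how the algorithm works, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not the code used in those projects. The `kmeans` function and the six sample points are hypothetical, chosen only to form two obvious groups.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Hypothetical helper: cluster numeric tuples with Lloyd's algorithm.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise the centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels


# Made-up example data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

In practice, a library implementation such as scikit-learn's `KMeans` adds smarter initialisation (k-means++) and multiple restarts. When the data has many correlated columns, PCA is often applied first so that the distances K-means relies on are computed in a smaller, less redundant space.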
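The clauses listed for the "SQL for Data Science" assignment (CASE, INNER JOIN, GROUP BY, ORDER BY, LIMIT) can be tried without a database server by using Python's built-in `sqlite3` module. The sketch below is hypothetical. The `customers` and `invoices` tables and their rows are invented for illustration and are not the course dataset.

```python
import sqlite3

# In-memory SQLite database holding two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ana', 'Brazil'), (2, 'Ben', 'USA'), (3, 'Chloe', 'France');
INSERT INTO invoices VALUES (1, 1, 12.0), (2, 1, 30.0), (3, 2, 5.0), (4, 3, 18.0), (5, 3, 9.0);
""")

# Total spend per customer, labelled with CASE, keeping only the top two spenders.
query = """
SELECT c.name,
       SUM(i.total) AS total_spent,
       CASE WHEN SUM(i.total) >= 25 THEN 'high' ELSE 'low' END AS spend_tier
FROM customers AS c
INNER JOIN invoices AS i ON i.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC
LIMIT 2;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Some keywords from the Transact-SQL course, such as TOP and PIVOT, are specific to SQL Server and are not supported by SQLite. SQLite uses LIMIT instead of TOP, for example.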