Pinned Repositories
24_hour_gyms
annual_medical_bill_predictor
Machine learning project that uses the TPOT library to find the best model for predicting a client's annual medical expenses, to inform decisions on premium pricing
Movie_Recomendations_Pyspark
This repo stores my PySpark code from practicing Spark with the pyspark library on a Hadoop cluster setup. I then use ALS (alternating least squares) to set up a simple user-based recommendation engine
Salary_predictions
Machine learning project using a comparison of different models for predicting potential employee salaries
SparkML-Regression
Uses Scala and Spark's MLlib library to predict yearly customer spending based on fake historical data
Star_Trek_TNG_TextAnalysis
An in-depth character analysis of the Star Trek: The Next Generation series using the corpus of episode transcripts.
wine-web-app
harveybsmith's Repositories
harveybsmith/wine-web-app
harveybsmith/annual_medical_bill_predictor
Machine learning project that uses the TPOT library to find the best model for predicting a client's annual medical expenses, to inform decisions on premium pricing
harveybsmith/Movie_Recomendations_Pyspark
This repo stores my PySpark code from practicing Spark with the pyspark library on a Hadoop cluster setup. I then use ALS (alternating least squares) to set up a simple user-based recommendation engine
harveybsmith/Salary_predictions
Machine learning project using a comparison of different models for predicting potential employee salaries
harveybsmith/SparkML-Regression
Uses Scala and Spark's MLlib library to predict yearly customer spending based on fake historical data
harveybsmith/Star_Trek_TNG_TextAnalysis
An in-depth character analysis of the Star Trek: The Next Generation series using the corpus of episode transcripts.
harveybsmith/Analyzing_NYC_HighSchool_data
One of the most controversial issues in the U.S. educational system is the efficacy of standardized tests, and whether they're unfair to certain groups. Given our prior knowledge of this topic, investigating the correlations between SAT scores and demographics might be an interesting angle to take. We could correlate SAT scores with factors like race, gender, income, and more. The SAT, or Scholastic Aptitude Test, is an exam that U.S. high school students take before applying to college. Colleges take the test scores into account when deciding who to admit, so it's fairly important to perform well on it. The test consists of three sections, each of which has 800 possible points. The combined score is out of 2,400 possible points (while this number has changed a few times, the data set for our project is based on 2,400 total points). Organizations often rank high schools by their average SAT scores. The scores are also considered a measure of overall school district quality. New York City makes its data on high school SAT scores available online, as well as the demographics for each high school.
harveybsmith/Apriori-Recommendation-Algortihm-for-Supermarket-transactions-
This project applies the Apriori algorithm to supermarket transaction data from a store in France to recommend product placement for higher sales revenue
harveybsmith/Bank-Churning-Neural-Net
Using a Neural Network to analyze why a percentage of customers are leaving a bank
harveybsmith/BenSmithPortfolio
My personal professional portfolio
harveybsmith/Black_Friday_Analysis_R
An exploratory analysis in R of the different variables involved in the purchasing patterns of consumers on Black Friday in three different kinds of cities
harveybsmith/BrainSize_Machine_Learning
Calculates a regression line relating head size and brain weight.
harveybsmith/CIA_statements_Word_Counter
This repo contains a dataset of CIA statements and a Python file with a function that takes a year as input and finds the most common words longer than 5 characters
harveybsmith/deploying-machine-learning-models
Example Repo for the Udemy Course "Deployment of Machine Learning Models"
harveybsmith/Diabetes_Prediction
Streamlit web app featuring a machine learning classification model that predicts a positive or negative diabetes result, based on the Pima Indians dataset
harveybsmith/Fairbnb
harveybsmith/Good-Credit-or-Bad-Credit
Classifying customers as either good or bad candidates for loans, and dealing with minority classes in classification
harveybsmith/Gun_Deaths
Exploratory data analysis of gun-related deaths in the United States from 2012 to 2014
harveybsmith/Matplotlib_Pymaceuticals
harveybsmith/Mercari_challenge_using_NLP
harveybsmith/Multiple_Linear_Regression_50_Startups
A collection of machine learning projects I have done
harveybsmith/Scala-Spark-Projects
A collection of DataFrame data-wrangling exercises and ML projects I have put together for practice
harveybsmith/Sentiment-Analysis
harveybsmith/Spam-Detection-NLP
Spam detection using a simple natural language processing and machine learning pipeline
harveybsmith/Testing-Libraries
harveybsmith/twitter-stream-sentiment-lambda
A simple AWS Lambda function that takes in a stream of tweets and passes them to AWS Comprehend
harveybsmith/Wind-Energy-Predictor
harveybsmith/Wine_Quality_Prediction
harveybsmith/World_Series_prediction_2019
A comparison of Machine Learning Models for predicting the World Series champion based on regular season stats
harveybsmith/Yelp_Reviews
Natural Language Processing of Yelp Reviews to predict ratings