Pinned Repositories
Amazon_Vine_Analysis
Bellybutton_Bacteria_Diversity
The purpose of this challenge was to explore the importance of data visualization to data analysts in communicating findings and conclusions. Although the topic is a bit strange, the outcome is a clear demonstration of the power these tools hold. JavaScript is used to create attractive, accessible, and interactive data visualizations, and the Plotly.js library provides the interactivity needed to increase the user's comprehension of the data, helping them draw the same conclusions as the data analyst. To demonstrate this, a website was created that offers interactive analysis of a dataset on the biodiversity of bacterial strains found in the human belly button. The hypothesis is that there might be a bacterial strain in the belly button that can synthesize the proteins necessary to make lab-grown "improbable beef" taste like real beef.
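For illustration, here is a minimal sketch of the kind of chart the dashboard renders, written with the Python plotly package rather than Plotly.js for consistency with the other sketches on this page; the OTU ids and sample values are placeholders, not data from the project.

```python
# Sketch of a top-bacteria horizontal bar chart in the style of the dashboard.
# Written with the Python plotly package; the repo itself uses Plotly.js.
# OTU ids and sample values below are made up for illustration.
import plotly.graph_objects as go

otu_ids = ["OTU 1167", "OTU 2859", "OTU 482", "OTU 2264", "OTU 41"]
sample_values = [163, 126, 113, 78, 71]

fig = go.Figure(go.Bar(
    x=sample_values[::-1],   # reversed so the largest bar sits on top
    y=otu_ids[::-1],
    orientation="h",         # horizontal bars
))
fig.update_layout(title="Top Bacteria Cultures Found")
fig.show()
```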
Bikesharing
The purpose of this analysis was to provide a group of shareholders an analysis of New York City bike-sharing data so that they can make a thorough proposal to investors for a bike-sharing program in Des Moines, Iowa. Tableau was used to create several visualizations from a very large bike-share dataset that includes trip location, trip duration, rider type, and gender, and a Tableau story was created that brings the visualizations together.
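The visualization work itself happens in Tableau, but a small pandas step is a plausible way to prepare the raw CSV first; the column name, values, and output file below are assumptions, not confirmed from the repo.

```python
# Hypothetical pre-Tableau step: convert the integer tripduration column
# (seconds) to a datetime so Tableau can treat it as a duration.
# The stand-in rows are made up; the real dataset is far larger.
import pandas as pd

df = pd.DataFrame({"tripduration": [423, 1708, 360]})  # stand-in rows
df["tripduration"] = pd.to_datetime(df["tripduration"], unit="s")
df.to_csv("trip_data_datetime.csv", index=False)
print(df.dtypes)
```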
Credit_Risk_Analysis
The purpose of this analysis was to compare several machine learning models to determine which is best at predicting credit risk. The dataset came from LendingClub, a peer-to-peer lending services company. Credit risk is an inherently unbalanced classification problem: good loans greatly outnumber risky loans. For that reason, six different machine learning techniques are used to train and evaluate models with unbalanced classes. The techniques fall into four categories and are imported from the imbalanced-learn library: for oversampling, the RandomOverSampler and SMOTE algorithms; for undersampling, the ClusterCentroids algorithm; for combined over- and undersampling, the SMOTEENN algorithm; and for ensemble learning, the BalancedRandomForestClassifier and EasyEnsembleClassifier algorithms, two fairly new methods that reduce bias. The scikit-learn library is used to train and test the models, and each technique is compared on its balanced accuracy, precision, and recall scores to determine whether it is suitable for predicting credit risk.
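A minimal sketch of the resample-train-score loop over the four sampling techniques, plus the two ensemble learners; synthetic data stands in for the LendingClub CSV, and the logistic regression base model is an assumption for illustration.

```python
# Compare sampling techniques on an unbalanced binary problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from imblearn.metrics import classification_report_imbalanced
from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.under_sampling import ClusterCentroids
from imblearn.combine import SMOTEENN
from imblearn.ensemble import BalancedRandomForestClassifier, EasyEnsembleClassifier

# Unbalanced toy dataset: roughly 95% "good loan", 5% "risky loan".
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

samplers = {
    "RandomOverSampler": RandomOverSampler(random_state=1),
    "SMOTE": SMOTE(random_state=1),
    "ClusterCentroids": ClusterCentroids(random_state=1),
    "SMOTEENN": SMOTEENN(random_state=1),
}
for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_train, y_train)  # rebalance classes
    y_pred = LogisticRegression(random_state=1).fit(X_res, y_res).predict(X_test)
    print(name, balanced_accuracy_score(y_test, y_pred))
    print(classification_report_imbalanced(y_test, y_pred))

# The two ensemble learners handle the imbalance internally.
for model in (BalancedRandomForestClassifier(random_state=1),
              EasyEnsembleClassifier(random_state=1)):
    y_pred = model.fit(X_train, y_train).predict(X_test)
    print(type(model).__name__, balanced_accuracy_score(y_test, y_pred))
```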
Cryptocurrencies
Mapping_Earthquakes
MechaCar_Statistical_Analysis
The purpose of this project was to analyze the production problems blocking a fictitious auto manufacturing company's manufacturing team. R is used to perform multiple linear regression analysis, collect summary statistics, run t-tests, and design a statistical study to compare vehicle performance.
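The repo itself is written in R (lm() and t.test()); for consistency with the other sketches, here is a rough Python analogue of the same workflow on synthetic data, with the column meanings assumed.

```python
# Multiple linear regression plus a one-sample t-test, mirroring the R
# workflow. All data below is synthetic and the variable names are assumed.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
n = 50
vehicle_weight = rng.normal(5000, 500, n)
ground_clearance = rng.normal(15, 2, n)
mpg = 0.002 * vehicle_weight + 1.5 * ground_clearance + rng.normal(0, 3, n)

# Multiple linear regression: mpg ~ vehicle_weight + ground_clearance
X = sm.add_constant(np.column_stack([vehicle_weight, ground_clearance]))
print(sm.OLS(mpg, X).fit().summary())

# One-sample t-test of suspension-coil PSI against an assumed target of 1500
psi = rng.normal(1500, 50, n)
print(stats.ttest_1samp(psi, popmean=1500))
```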
Neural_Network_Charity_Analysis
Alphabet Soup's business team is looking to predict where to make investments. Using machine learning and neural networks, the purpose of this project is to take the provided dataset and build a binary classifier capable of predicting whether applicants will be successful if funded by Alphabet Soup. The dataset contains 34,000 organizations that have received Alphabet Soup funding. First, for Deliverable 1, a dataframe was created and candidate variables were identified for the model's target(s) and feature(s). The data was then preprocessed to remove unnecessary columns and to determine which columns could benefit from "binning" by analyzing each column's unique values. Once the categorical variables were identified, they were encoded using one-hot encoding, placed in a new dataframe, and merged back into the original dataframe. For Deliverable 2, a deep-learning neural network was compiled, trained, and evaluated on the new dataframe. Lastly, for Deliverable 3, the model was put through several optimization techniques to try to reach a higher level of accuracy. The results of the various techniques are discussed in the repository.
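A minimal sketch of the Deliverable 2 flow: one-hot encode the categoricals, split, scale, then fit a small Keras binary classifier. The toy dataframe, layer sizes, and epoch count are illustrative, not the repo's tuned values.

```python
# One-hot encoding, scaling, and a small binary-classification network.
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({                       # stand-in for the charity dataset
    "APPLICATION_TYPE": ["T3", "T4", "T3", "T5"] * 25,
    "ASK_AMT": [5000, 10000, 7500, 5000] * 25,
    "IS_SUCCESSFUL": [1, 0, 1, 0] * 25,
})
encoded = pd.get_dummies(df, columns=["APPLICATION_TYPE"])  # one-hot encoding
y = encoded["IS_SUCCESSFUL"].values
X = encoded.drop(columns="IS_SUCCESSFUL").values.astype("float32")

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary output
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, verbose=0)
print(model.evaluate(X_test, y_test, verbose=0))      # [loss, accuracy]
```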
Pewlett-Hackard-Analysis
The purpose of this analysis was to assist Pewlett Hackard in preparing for what is being dubbed the "silver tsunami," a large incoming wave of retiring employees. To do this, multiple tables containing employee and department-title data are examined using SQL to determine the number of employees reaching retirement age and their titles. In addition, since these numbers are quite large, the same data is analyzed to identify employees who may be eligible to participate in a mentorship program to train incoming employees. The task is presented as two deliverables: 1) the number of retiring employees by title, and 2) a list of employees eligible for the mentorship program.
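The repo runs this kind of query in PostgreSQL; the sqlite3 sketch below mimics the first deliverable on a toy schema, with the table names, column names, and retirement birth-date window all assumed for illustration.

```python
# Retiring employees by title, sketched against an in-memory SQLite database.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (emp_no INTEGER PRIMARY KEY, birth_date TEXT);
    CREATE TABLE titles (emp_no INTEGER, title TEXT, to_date TEXT);
    INSERT INTO employees VALUES (1, '1953-04-20'), (2, '1961-11-02');
    INSERT INTO titles VALUES (1, 'Senior Engineer', '9999-01-01'),
                              (2, 'Staff', '9999-01-01');
""")

# Deliverable 1: count current employees in the retirement window, by title.
rows = con.execute("""
    SELECT t.title, COUNT(*) AS retiring
    FROM employees e
    JOIN titles t ON t.emp_no = e.emp_no
    WHERE e.birth_date BETWEEN '1952-01-01' AND '1955-12-31'
      AND t.to_date = '9999-01-01'
    GROUP BY t.title
    ORDER BY retiring DESC;
""").fetchall()
print(rows)   # e.g. [('Senior Engineer', 1)]
```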
Weather_Data
The purpose of this project is to develop a PlanMyTrip app that identifies travel destinations and hotels based on the user's weather preferences. From the list of potential destinations, the tester chooses four cities to create a travel itinerary; using the Google Maps Directions API, a travel route between the four cities is created, along with a marker layer map.
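A minimal sketch of the itinerary map using the jupyter-gmaps library (it renders inside a Jupyter notebook); the four coordinate pairs are placeholders and a real Google Maps API key must be supplied.

```python
# Directions route plus marker layer with jupyter-gmaps.
import gmaps

gmaps.configure(api_key="YOUR_API_KEY")   # replace with a real API key

# Four hypothetical stops, as (latitude, longitude) pairs.
city1, city2, city3, city4 = (28.8, -112.0), (28.0, -110.9), (25.6, -108.5), (23.2, -106.4)

fig = gmaps.figure()
# Driving route starting and ending at city1, via the other three cities.
fig.add_layer(gmaps.directions_layer(
    city1, city1, waypoints=[city2, city3, city4], travel_mode="DRIVING"))
# Marker layer showing all four stops.
fig.add_layer(gmaps.marker_layer([city1, city2, city3, city4]))
fig   # the notebook displays the figure
```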
JeremyKRay's Repositories
JeremyKRay/Credit_Risk_Analysis
JeremyKRay/Bellybutton_Bacteria_Diversity
JeremyKRay/Bikesharing
JeremyKRay/Cryptocurrencies
JeremyKRay/Mapping_Earthquakes
JeremyKRay/Neural_Network_Charity_Analysis
JeremyKRay/Weather_Data
JeremyKRay/Amazon_Vine_Analysis
JeremyKRay/MechaCar_Statistical_Analysis
JeremyKRay/Pewlett-Hackard-Analysis
JeremyKRay/Election_Analysis
Using Python to analyze election results
JeremyKRay/kickstarter-analysis
Performing analysis on Kickstarter data to uncover trends
JeremyKRay/Mission-to-Mars
JeremyKRay/Movies-ETL
JeremyKRay/PyBer_Analysis
JeremyKRay/School_District_Analysis
JeremyKRay/stock-analysis
An analysis of green energy stocks for Steve to better advise his parents
JeremyKRay/surfs_up
Surfing project using SQLite
JeremyKRay/UFOs