Pinned Repositories
AdClick_Fraud
Capstone project #2 for the Harvard University Professional Certificate in Data Science
AWSSageMaker_PythonXGBoostTutorial
A Python XGBoost model built with Amazon SageMaker, EC2 instances, and S3 buckets. The workflow covers preparing and partitioning the data, then training, tuning, and evaluating the model and generating predictions. The project predicts which customers will sign up for a financial product at a bank.
Bikeshare-Exploratory-Analysis
An exploratory analysis of the Kaggle bikeshare data set that applies linear regression models to predict the number of bikes rented, although these models are not optimal for this problem.
Boston-Housing---Random-Forest-XGBoost
Leveraging random forest and XGBoost regression algorithms with cross-validation and grid search to tune the best-performing model on the Boston Housing dataset. Analyzed and visualized the most statistically significant features for both models. Achieved an RMSE of $2K.
Customer-Churn-w-Logistic-Regression
Using Spark, Python (PySpark), SQL, and Databricks, performed logistic regression on customer data to predict which customers are at higher risk of churning, then applied the model to an unseen "new customers" data set.
Disney-Movies-Box-Office-Hits
Analysis of Disney's top-grossing films (adjusted for inflation) in Python, using regression to relate film genre to box-office success and bootstrap regression to determine confidence intervals for the intercept and coefficients.
International-Debt-Stats-EDA
Used SQL in Jupyter Notebooks to analyze and explore data on international debts and codes.
SMS-Spam-Prediction
Predicting whether an SMS (text message) is spam using natural language processing (NLP), a naive Bayes classifier, and cross-validation (in Python).
TheMatrixScript_NLP
A project utilizing NLP techniques and analysis including text mining, document term matrices, sentiment analysis, wordclouds and topic modeling with LDA.
TweetClassificationLSTM
This project details the creation of a multi-class Recurrent Neural Network (RNN) model using TensorFlow / Keras to predict Tweet emotions. More specifically, this notebook uses a bidirectional LSTM as a means to capture additional semantics often found in sequential (language) data. This project utilizes the Tweet Emotion Recognition with TensorFlow dataset provided by Kaggle.
LeondraJames's Repositories
LeondraJames/Bikeshare-Exploratory-Analysis
An exploratory analysis of the Kaggle bikeshare data set that applies linear regression models to predict the number of bikes rented, although these models are not optimal for this problem.
LeondraJames/BostonHousingPrices_NeuralNet
My first attempt at implementing a neural network using the Boston housing data set from the MASS library.
LeondraJames/CandyCrushProj
Candy Crush Level Difficulty Analysis
LeondraJames/ChipotleLocations
This is a descriptive and exploratory data analysis project from DataCamp that explores real data on every Chipotle location to identify franchising opportunities. The goal is to scout out the next Chipotle location using interactive maps (e.g., leaflet) and external data to compare proposed locations on several important factors, such as proximity to current Chipotle locations, the distribution of the state's population, and the distance from interstates and tourist attractions.
LeondraJames/DataKind-Project-7.21
Data Visualizations
LeondraJames/DS-Bootcamp-Capstone-Mondayball
Data Science & Machine Learning Data Capstone based on Moneyball dataset
LeondraJames/First-KNN-Attempt---ISLR-Caravan-Dataset
This is my first attempt at a KNN model, where I attempt to classify the purchase of caravan insurance in the Caravan data set (ISLR package).
LeondraJames/HeartDisease
"What Your Heart Is Telling You" Logit Model
LeondraJames/LoanPaymentPrediction_SVM
My first attempt at building an SVM model with a Gaussian kernel, optimizing the cost and gamma parameters using grid search.
LeondraJames/TheEconomistDataViz
Re-Imagination of The Economist: Corruption v. Development
LeondraJames/Titanic_Attempt-1
Kaggle Titanic Data Set Using Logit Model