Pinned Repositories
1995-2015-Marriage-Education-Trends-Visualization
A Tableau data story that consists of multiple visualizations analyzing the relationship between marriage and education trends in the United States between 1995 and 2015, determining how each trend was impacted by economic recessions over the years, and investigating the relationship's overall impact on each state's median household income.
assignment3
Big Data Analysis Class Project
Cyclone-Database
A MySQL relational database of all NOAA-recorded cyclones that originated in the Pacific and Atlantic Oceans in 2014, built with views and queries for predictive analysis.
Genre-Song-Duration-ANOVA-Test
An R project that investigates whether different genres of songs have significantly different durations through the use of a one-way ANOVA test and post hoc significance tests conducted over an excerpt of a dataset consisting of 1 million popular songs compiled by The Echo Nest and a lab at Columbia University.
Immigration-Trends-Visualization
A data story consisting of Excel visualizations that analyze economic, social, educational, and crime trends against growing foreign-born and immigrant populations in the US to provide insight into their impact on these factors.
Indeed-Internship-Data-Aggregator
A Python-based aggregator that collects Washington, D.C. data scientist and analyst internship listings from Indeed and saves them into a dataset, using Python libraries such as re, pandas, time, requests, and BeautifulSoup.
Predicting-The-Next-Hit-Song
A Big Data Python project that develops a random forest classification model to predict a song’s popularity based on social media sentiment, streaming data, past Billboard charting data, and lyric sentiment analysis and topic modeling. Once the model is developed, an automated Twitter bot posts its results on Twitter.
Reducing-Staff-Attrition-ML
An R project that analyzes a company's employee data with logistic regression and decision tree machine learning models to identify the primary factors, along with their patterns, that lead to high staff attrition rates, and to help develop actionable insights and strategies to significantly reduce attrition.
Social-Network-Visualization
An analysis of my Facebook social network using a Gephi network graph visualization to determine the number and types of communities I am a part of online.
Terrorist-Attack-Network-Visualization
An analysis of nine terrorist attack networks across various countries between 2000 and 2005, using a Gephi network graph visualization to discover patterns, similarities, connections, and potential relationships among their actors.
pkhiyara's Repositories
pkhiyara/Terrorist-Attack-Network-Visualization
An analysis of nine terrorist attack networks across various countries between 2000 and 2005, using a Gephi network graph visualization to discover patterns, similarities, connections, and potential relationships among their actors.
pkhiyara/Genre-Song-Duration-ANOVA-Test
An R project that investigates whether different genres of songs have significantly different durations through the use of a one-way ANOVA test and post hoc significance tests conducted over an excerpt of a dataset consisting of 1 million popular songs compiled by The Echo Nest and a lab at Columbia University.
pkhiyara/Cyclone-Database
A MySQL relational database of all NOAA-recorded cyclones that originated in the Pacific and Atlantic Oceans in 2014, built with views and queries for predictive analysis.
pkhiyara/Immigration-Trends-Visualization
A data story consisting of Excel visualizations that analyze economic, social, educational, and crime trends against growing foreign-born and immigrant populations in the US to provide insight into their impact on these factors.
pkhiyara/Indeed-Internship-Data-Aggregator
A Python-based aggregator that collects Washington, D.C. data scientist and analyst internship listings from Indeed and saves them into a dataset, using Python libraries such as re, pandas, time, requests, and BeautifulSoup.
pkhiyara/Social-Network-Visualization
An analysis of my Facebook social network using a Gephi network graph visualization to determine the number and types of communities I am a part of online.
pkhiyara/1995-2015-Marriage-Education-Trends-Visualization
A Tableau data story that consists of multiple visualizations analyzing the relationship between marriage and education trends in the United States between 1995 and 2015, determining how each trend was impacted by economic recessions over the years, and investigating the relationship's overall impact on each state's median household income.
pkhiyara/assignment3
Big Data Analysis Class Project
pkhiyara/Predicting-The-Next-Hit-Song
A Big Data Python project that develops a random forest classification model to predict a song’s popularity based on social media sentiment, streaming data, past Billboard charting data, and lyric sentiment analysis and topic modeling. Once the model is developed, an automated Twitter bot posts its results on Twitter.
pkhiyara/Reducing-Staff-Attrition-ML
An R project that analyzes a company's employee data with logistic regression and decision tree machine learning models to identify the primary factors, along with their patterns, that lead to high staff attrition rates, and to help develop actionable insights and strategies to significantly reduce attrition.
pkhiyara/Baltimore-Crime-Trends-Visualization
A Tableau data story consisting of a series of visualizations analyzing Baltimore City crime trends with the goal of providing valuable and actionable insights to Baltimore PD to promote safer neighborhoods.
pkhiyara/DC-311-Service-Requests-Household-Income-Relationship
A Python data manipulation and analysis project that examines the relationship between the number of 311 service requests placed and the average household income of Washington, D.C. residents across the city's eight wards, using pandas DataFrames and Matplotlib visualizations to look for a potential correlation.
pkhiyara/Dunbar-Number-t-test
An R project that tests the validity of Robin Dunbar's claim that humans can keep track of at most 150 people at a time (Dunbar's Number). The hypothesis is investigated with multiple t-tests conducted over a Facebook usage dataset.
pkhiyara/Film-Revenue-Prediction
The Capstone (B.Tech Major) Project
pkhiyara/flights-cancellation
This repository contains assets for the Flights Cancellation Forecast use case.
pkhiyara/Nightclub-Dataset-Analysis
A Python-based nightclub data aggregator that scrapes information about nightclubs from various websites for multidimensional analysis.
pkhiyara/Pew-Sex-Edu-Income-Factorial-ANOVA-Test
An R project that investigates and visualizes the effect of sex and education on an individual's income level through the use of a full-factorial two-way ANOVA test and relevant post hoc significance tests conducted over a 2014 Pew Research Center dataset consisting of higher education attainment, gender, and income data.
pkhiyara/R-vs-Pandas-Stack-Exchange-API
A Python data manipulation and analysis project that examines and visualizes the popularity of the widely used data science tools R and pandas across three Stack Exchange communities (Stack Overflow, Cross Validated, Data Science), using the Stack Exchange API and Python libraries such as pandas, json, requests, and Matplotlib.