RashmiAyas95
I am a data-oriented person with a keen interest in machine learning and deep learning algorithms. I have educational credentials and expertise in Python, R, and Tableau.
Boston
Pinned Repositories
UNICEF_SOWC_Data_Visualization
The data set describes the State of the World’s Children 2019. It contains 16 sheets covering Demographics, Child Mortality, Maternal & Newborn Health, Child Health, HIV/AIDS Epidemiology & Intervention, Nutrition A & B, Early Childhood Development, Education, Child Protection, Social Protection & Equity, WASH, and Adolescents. Along with these parameters, it also contains Economic Indicators and Women’s Empowerment. Some of the attributes used are: Demographics, Child Mortality, Child Health, Nutrition A, and Education.
Customer-Order-Sales-Analysis
Diabetes_Prediction_Analysis
Human_Resources_Data_Visualization
HR data can be hard to come by, and HR professionals often lag behind in data analysis and data visualization. This data set was therefore created from scratch around a fictitious company called Dental Magic, to support competency-related analysis and to answer performance-related questions alongside salary distributions. The data set was obtained from Kaggle for the purpose of this assignment. Some initial analysis was conducted in Excel to understand how different columns in the data set can answer business-related questions.
King_County_House_Price_Prediction_Analysis
King County is a county in the U.S. state of Washington. Its population was 2,149,970 in a 2016 census estimate. King is the most populous county in Washington and the 13th-most populous in the United States. The county seat is Seattle, which is the state’s largest city. King County is one of three Washington counties included in the Seattle-Tacoma-Bellevue metropolitan statistical area. About two-thirds of King County’s population lives in the city’s suburbs. As of 2011, King County was the 86th highest-income county in the United States. This document addresses the factors affecting the sale prices of houses sold in King County between May 2014 and May 2015. For this project, I am using a dataset from Kaggle, ‘kc_house_data.csv’ (https://www.kaggle.com/harlfoxem/housesalesprediction). The dataset has a good mix of categorical independent variables and a continuous dependent variable (price), which makes it useful for evaluating simple regression models. The goal is to predict the sale price of houses in King County. I performed data cleaning, data modelling, stepwise variable selection (forward and backward), a skewness check, a correlation analysis to see which variables have a positive or negative impact on the price prediction, and k-fold cross-validation on gradient boosting. Several models were compared for accuracy: 1. Linear Regression 2. Decision Tree 3. Random Forest 4. Gradient Boosting
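The skewness check, correlation analysis, and k-fold cross-validation mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: it uses only the Python standard library, generates synthetic data instead of reading ‘kc_house_data.csv’, and swaps in a one-feature linear regression where the project used gradient boosting. The column name sqft_living and all helper function names are assumptions made for the example.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: 0 for symmetric data, > 0 for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def k_fold_rmse(xs, ys, k=5, seed=0):
    """Return one held-out RMSE per fold for the one-feature linear model."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint index groups
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        sq_err = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out]
        scores.append(math.sqrt(statistics.fmean(sq_err)))
    return scores


# Synthetic stand-in for the real data (assumption: price grows with living
# area and has multiplicative noise, which gives a right-skewed price).
rng = random.Random(42)
sqft_living = [rng.uniform(500, 4000) for _ in range(200)]
price = [150 * s * math.exp(rng.gauss(0, 0.3)) for s in sqft_living]
log_price = [math.log(p) for p in price]

print("skewness of price:    ", round(skewness(price), 3))
print("skewness of log price:", round(skewness(log_price), 3))
print("corr(sqft_living, price):", round(pearson(sqft_living, price), 3))
print("5-fold RMSE per fold:", [round(s) for s in k_fold_rmse(sqft_living, price)])
```

Comparing the skewness of the raw and log-transformed price is one common way to decide whether to model the log of the target. The per-fold RMSE values give a rough sense of how stable a model's error is across different train/test splits, which is the same idea the project applies to gradient boosting.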
RashmiAyas95's Repositories
RashmiAyas95/Customer-Order-Sales-Analysis
RashmiAyas95/Diabetes_Prediction_Analysis
RashmiAyas95/UNICEF_SOWC_Data_Visualization
RashmiAyas95/Human_Resources_Data_Visualization
RashmiAyas95/King_County_House_Price_Prediction_Analysis