- The data of baseball players here
- The project has the following subsections:
-
- Data quality analysis
-
- Data Cleaning
-
- Exploratory Data Analysis
-
- Building a model for player salary (linear regression)
-
- Predicting whether a player will hit or not (logistic regression)
-
- References
- R
- ggplot2 for data visualization
- dplyr for data manipulation
- stringr for string manipulation
- naniar for visualizing missing values
- validate for data quality checking
- gridExtra for plotting in grid
- tidyr for reshaping data to plot multiple histograms in a single plot
- purrr
- heteroscedasticity
- multicollinearity
-
https://cooldata.wordpress.com/2010/03/04/why-transform-the-dependent-variable/
-
https://statisticsbyjim.com/regression/heteroscedasticity-regression/
-
https://stackoverflow.com/questions/40572124/plot-lm-error-operator-is-invalid-for-atomic-vectors
-
Senaviratna, N.A.M.R. and Cooray, T.M.J.A., 2019. Diagnosing Multicollinearity of Logistic Regression Model. Asian Journal of Probability and Statistics, pp.1-9.