- The data of baseball players here
- The project has the following subsections:
- Data quality analysis
- Data Cleaning
- Exploratory Data Analysis
- Building a model for player salaries (linear regression)
- Predicting whether a player will hit or not (logistic regression)
- References
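The two modelling steps can be sketched in base R as follows. This is a minimal illustration on synthetic data, not the project's actual code: the data frame `players` and the columns `at_bats`, `home_runs`, `years`, `salary`, and `hit` are hypothetical stand-ins for whatever the real dataset contains.

```r
# Synthetic stand-in for the baseball data; every column name is hypothetical.
set.seed(42)
n <- 200
players <- data.frame(
  at_bats   = sample(200:600, n, replace = TRUE),
  home_runs = rpois(n, lambda = 15),
  years     = sample(1:20, n, replace = TRUE)
)

# Simulated salary: linear in the predictors plus noise.
players$salary <- 200 + 0.5 * players$at_bats + 10 * players$home_runs +
  25 * players$years + rnorm(n, sd = 50)

# Simulated binary outcome (1 = hit, 0 = no hit).
players$hit <- rbinom(n, size = 1, prob = plogis(-2 + 0.005 * players$at_bats))

# Linear regression for salary.
salary_model <- lm(salary ~ at_bats + home_runs + years, data = players)

# Logistic regression for the hit / no-hit outcome.
hit_model <- glm(hit ~ at_bats + years, data = players, family = binomial)

# Predicted probabilities, then class labels at a 0.5 cutoff (an arbitrary choice).
hit_prob <- predict(hit_model, type = "response")
hit_pred <- ifelse(hit_prob > 0.5, 1, 0)
```

Calling `summary(salary_model)` and `summary(hit_model)` prints the coefficient tables. The 0.5 cutoff is only a default and would normally be tuned to the cost of each kind of error.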
- R
- ggplot2 for data visualization
- dplyr for data manipulation
- stringr for string manipulation
- naniar for visualizing missing values in a graph
- validate for data quality checking
- gridExtra for plotting in grid
- tidyr for reshaping data so that multiple histograms can be drawn in a single plot
- purrr
- heteroskedasticity
- multicollinearity
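Both diagnostics can be computed by hand in base R, which makes it clear what they measure. The sketch below uses made-up variables (`x1`, `x2`, `x3`, `y`) and a hypothetical helper `vif_manual`; it is an illustration under those assumptions, not the project's own code. The variance inflation factor for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the others. The heteroskedasticity check is the studentized Breusch-Pagan statistic, n · R² from regressing the squared residuals on the predictors.

```r
# Synthetic data: x2 is nearly collinear with x1, and the error
# spread grows with x3, so both problems are present by construction.
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- runif(n, min = 1, max = 5)
y  <- 1 + 2 * x1 + 0.5 * x3 + rnorm(n, sd = x3)
dat <- data.frame(y, x1, x2, x3)

fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Hypothetical helper: VIF of each predictor, 1 / (1 - R^2) from
# regressing it on the remaining predictors.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_manual(dat, c("x1", "x2", "x3"))

# Studentized Breusch-Pagan test: regress squared residuals on the
# predictors; n * R^2 is compared with a chi-squared distribution
# whose degrees of freedom equal the number of predictors.
dat$res_sq <- resid(fit)^2
aux     <- lm(res_sq ~ x1 + x2 + x3, data = dat)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 3, lower.tail = FALSE)
```

A common rule of thumb treats a VIF above 5 or 10 as a sign of problematic multicollinearity, and a small `bp_p` as evidence of heteroskedasticity. In practice, `car::vif()` and `lmtest::bptest()` provide packaged versions of these checks; neither package appears in the list above, so using them would be an addition to the project.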
Senaviratna, N.A.M.R. and Cooray, T.M.J.A., 2019. Diagnosing Multicollinearity of Logistic Regression Model. Asian Journal of Probability and Statistics, pp.1-9.