eaziReda
Almost every data analysis project involves the process of doing some exploratory data analysis(EDA) and data preprocessing. Usually they serve as a very crucial and inevitable step in a data analysis workflow. There are some very common tasks in EDA, which can include:
- checking for missing values
- detecting outliers
- plotting correlation plots between features
- plotting histograms/bar plots for each individual features
Typically these steps are followed by some preprocessing like imputation and dealing with outliers. All of those steps together may require lots of coding effort and can be repeated for several projects. To solve this issue, we designed the R package eaziReda that wraps all of those lines of code into four convenient functions that will allow you to quickly and easily carry out EDA along with some simple preprocessing using just a few lines of code!
Installation
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("https://github.com/UBC-MDS/eaziReda.git")
Documentation & Usage
Documentation and usage examples for eaziReda
can be found
here.
Features
missing_impute
: This function will take in a dataframe and generate a table listing the number of missing values and the percentage of missing values for each column. It also gives the user an option of doing some simple imputations on the entire dataframe in place. The imputation methods can also be customized by the user.outliers_detect
: This function will take in a vector and will return a boolean vector with outliers marked given by certain method that the users can customize. It also gives the user an option to remove all the outliers in place.corr_plot
: This function will take in a dataframe and a list of feature names to generate a correlation plot for the given list of features.histograms
: This function will take in a dataframe and a list of feature names to generates histograms for numeric features and bar plots for categorical featuresremove_outliers
: This function will remove the outliers from the given vector based on a second vector that has the outliers’ indices marked
Similar Work
While there aren’t a ton of packages in R that do only EDA, quite a few of them include it as a secondary functionality. Here are a few packages that we found that do something similar:
- SmartEDA
- SmartEDA includes multiple custom functions to perform initial exploratory analysis on any input data describing the structure and the relationships present in the data.
- dlookr
- A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values and outliers and unique and negative values to help you understand the distribution and quality of your data.
- DataExplorer
- Automated data exploration process for analytic tasks and predictive modeling, so that users could focus on understanding data and extracting insights.
Code of Conduct
Please note that the eaziReda project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Contributors
We welcome and recognize all contributions.
Core contributor | Github.com username |
---|---|
Vignesh Lakshmi Rajakumar | @vigneshrajakumar |
Dustin Andrews | @dbandrews |
Arash Shamseddini | @arashshams |
Yuyan Guo | @yuyanguo |