Prepare: An R Package for Streamlined Data Cleaning
The Prepare package is designed to address the challenges of cleaning and preparing data from public datasets. It reduces what previously required dozens of lines of code to a single command. With an initial focus on data from the World Bank and the United Nations, the package simplifies transforming raw data into an analysis-ready format. This includes reshaping datasets, handling missing values, and standardizing naming conventions.
- 🌐 Specialized Functions: Includes functions like WB_Clean and UN_Clean, tailored for datasets from the World Bank and United Nations, respectively.
- 🔄 Flexibility in Data Transformation: Offers options to reshape data into wide or long formats, catering to diverse analytical needs.
- 👍 Ease of Use: Designed with simplicity in mind, it's suitable for both beginners and experienced R users.
- 📈 Expandability: While currently focused on World Bank and United Nations data, the package is structured to easily incorporate cleaning functions for additional data sources in the future.
🌍 Ideal for researchers, data analysts, and anyone working with data from international organizations, Prepare serves as a go-to solution for making the initial steps of data analysis quicker and more efficient.
You can install the released version of Prepare from GitHub with:
# install.packages("devtools")
devtools::install_github("Chasen-Jeffries/Prepare")
Here's a quick example of how to use the Prepare package:
# Load Package
library(Prepare)
# Load a sample dataset (replace this with an actual dataset)
data(world_bank_example)
# Clean the World Bank data
cleaned_data <- Prepare(world_bank_example, source = 'wb')
The Prepare function is the main entry point for cleaning and preparing datasets. It requires two arguments and accepts several optional ones:
- Required Arguments:
  - df: The data frame to be cleaned.
  - source: The source of the data. Currently, it supports 'wb' (World Bank) and 'un' (United Nations).
- make_wide: A logical argument. If set to TRUE, it transforms the dataset from long format to wide format.
- drop_na: A logical argument. When TRUE, it drops all rows with NA values.
- var_name: An optional argument to specify a new name for the value column in a long format dataset.
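To make the optional arguments concrete, the following base R sketch mimics what drop_na, make_wide, and var_name describe, using a toy data frame. It does not call Prepare and is not the package's implementation. The column names (country, year, value), the replacement name gdp, and the choice of one column per year for the wide layout are assumptions made for illustration.

```r
# Toy long-format data: one row per country-year, with a single value column.
# All names and numbers here are hypothetical.
long_df <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  value   = c(1.5, NA, 3.0, 4.2),
  stringsAsFactors = FALSE
)

# What drop_na = TRUE describes: keep only rows without any NA.
no_na_df <- long_df[complete.cases(long_df), ]

# What var_name describes: give the value column a new name.
renamed_df <- long_df
names(renamed_df)[names(renamed_df) == "value"] <- "gdp"

# What make_wide = TRUE describes (assumed layout): one row per country,
# one column per year.
wide_df <- reshape(
  long_df,
  idvar     = "country",
  timevar   = "year",
  direction = "wide"
)
```

Here wide_df has one row for each country and a separate value column for each year, while no_na_df simply omits the country-year row whose value is missing.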
The WB_Clean function is tailored for cleaning datasets obtained from the World Bank.
- Required Argument:
  - dataset: The World Bank dataset to be cleaned.
- make_wide: A logical argument. If set to TRUE, it transforms the dataset from long format to wide format.
- drop_na: A logical argument. When TRUE, it drops all rows with NA values.
- var_name: An optional argument to specify a new name for the value column in a long format dataset.
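For context on why reshaping matters with World Bank data, here is a rough base R sketch of turning a table with one column per year into long format. It assumes a raw layout with a country column followed by year columns. That layout, the column names, and the values are hypothetical, and the code illustrates the general technique rather than what the package does internally.

```r
# Hypothetical raw table: one row per country, one column per year.
raw_wb <- data.frame(
  country = c("A", "B"),
  `2000`  = c(1.0, 2.0),
  `2001`  = c(1.5, 2.5),
  check.names = FALSE,        # keep "2000" and "2001" as column names
  stringsAsFactors = FALSE
)

# Every column except the identifier is treated as a year column.
year_cols <- setdiff(names(raw_wb), "country")

# Stack the year columns into a single value column, with a year column
# recording which original column each value came from.
long_wb <- reshape(
  raw_wb,
  direction = "long",
  varying   = year_cols,
  v.names   = "value",
  timevar   = "year",
  times     = as.integer(year_cols),
  idvar     = "country"
)
rownames(long_wb) <- NULL     # drop the composite row names reshape() adds
```

The result has one row per country-year pair, which is the long layout that the make_wide option would convert back to wide.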
The UN_Clean function is designed for cleaning datasets from the United Nations.
- Required Argument:
  - dataset: The United Nations dataset to be cleaned.
- make_wide: A logical argument. If set to TRUE, it transforms the dataset from long format to wide format.
- drop_na: A logical argument. When TRUE, it drops all rows with NA values.
- var_name: An optional argument to specify a new name for the value column in a long format dataset.
The std_country function standardizes country names within a data frame to their three-letter ISO codes, improving consistency and comparability across datasets.
- Required Arguments:
  - df: The data frame containing the country names or codes to be standardized.
  - column_name: The name of the column that contains the country names or codes.
# Standardize country names in a dataset
standardized_df <- std_country(dataset, "country_column")
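As a rough illustration of the idea behind country standardization, the base R sketch below maps names to three-letter codes with a small lookup vector. It does not call std_country and makes no claim about how the package performs the matching. The lookup table, the column names, and the iso3 output column are assumptions for this example.

```r
# Hypothetical lookup from country names to three-letter ISO codes.
iso_lookup <- c(
  "United States" = "USA",
  "Germany"       = "DEU",
  "Japan"         = "JPN"
)

# Toy data with one name that is not in the lookup.
df <- data.frame(
  country_column = c("Germany", "Japan", "Atlantis"),
  value          = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Index the lookup by name; names without a match become NA.
df$iso3 <- unname(iso_lookup[df$country_column])
```

Leaving unmatched names as NA makes them easy to find with is.na(df$iso3) so they can be reviewed by hand.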
- 🐞 Bugs: File an issue with a reproducible example on GitHub.
- 💬 Discussions: Contact the package maintainer.
Contributions to the Prepare package are welcome from anyone and are best sent as a pull request on GitHub.
The Prepare package is licensed under the MIT License.