A collection of functions using data.table to efficiently clean large tables using a simplified syntax

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

DTrsiv

DTrsiv is an R package containing a collection of R data.table functions for cleaning your data quickly and easily.
Everyone who wants to contribute is welcome!

Author: PAGEAUD Y. (1)
Contributors: Everyone who wants to contribute is welcome!
(1) DKFZ - Division of Applied Bioinformatics, Germany.

Prerequisites

Install the devtools and data.table packages:

install.packages(pkgs = c("devtools", "data.table"))

Install

devtools::install_github("YoannPa/DTrsiv")

Content

The dt_fun.R script contains functions related to R data.table formatting:

  • dt.sub() performs pattern matching and substitution column-wise on a data.table object. It first identifies the columns containing at least one occurrence of the pattern, then applies the substitution only to those columns, which shortens execution time on data.tables with many columns. It supports columns of type list.
  • dt.ls2c() converts data.table columns of type list to columns of type vector.
  • dt.rm.dup() removes duplicated columns based on their content (not on their names).
  • dt.rm.allNA() removes columns containing exclusively NAs from a data.table.
  • dt.int64tochar() converts columns of 'double.integer64' type into 'character' type.
  • dt.combine() combines values of partially duplicated columns from a data.table into new columns.
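
To make the ideas behind dt.sub(), dt.rm.allNA() and dt.rm.dup() more concrete, here is a minimal sketch written with plain data.table calls only. It does not call DTrsiv and is not the package's implementation: the toy table, the pattern and all variable names are assumptions chosen for illustration, and list columns are not handled.

```r
library(data.table)

# Toy table (illustrative only): 'b' is entirely NA, 'c' duplicates 'a'.
dt <- data.table(
  a = c("x_1", "y_2", "z_3"),
  b = c(NA_character_, NA_character_, NA_character_),
  c = c("x_1", "y_2", "z_3"),
  d = c("no", "match", "here")
)

# 1. Substitution restricted to the columns where the pattern occurs.
pattern <- "_"
hit_cols <- names(dt)[vapply(dt, function(col) any(grepl(pattern, col)), logical(1))]
dt[, (hit_cols) := lapply(.SD, function(col) gsub(pattern, "-", col)), .SDcols = hit_cols]

# 2. Drop columns that contain only NAs.
na_cols <- names(dt)[vapply(dt, function(col) all(is.na(col)), logical(1))]
if (length(na_cols) > 0) dt[, (na_cols) := NULL]

# 3. Drop columns whose content duplicates an earlier column,
#    regardless of the column names.
dup_cols <- names(dt)[duplicated(as.list(dt))]
if (length(dup_cols) > 0) dt[, (dup_cols) := NULL]
```

Scanning the columns first and updating only the matching ones by reference with `:=` avoids rewriting columns that cannot change, which is the reason the description above gives for the shorter execution time on wide tables.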

The dt_chk.R script contains functions related to checking the content of an R data.table:

  • allNA.col() checks whether any columns contain exclusively NAs and, if so, returns their names with a warning.
  • best.merged.dt() looks for the best merging operation(s) between two data.tables by trying a set of columns from the second one.
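
The two checks above can be sketched with plain data.table and base R. This is not the code of allNA.col() or best.merged.dt(): the helper `all_na_columns()`, the toy tables, and the scoring rule are hypothetical. In particular, the sketch assumes that the "best" merge is the one whose candidate column matches the most keys of the first table, which the package may define differently.

```r
library(data.table)

# Hypothetical helper: return the names of columns made only of NAs,
# and raise a warning when at least one is found.
all_na_columns <- function(dt) {
  cols <- names(dt)[vapply(dt, function(col) all(is.na(col)), logical(1))]
  if (length(cols) > 0) {
    warning("Columns containing only NAs: ", paste(cols, collapse = ", "))
  }
  cols
}

dt1 <- data.table(id = c("s1", "s2", "s3"), value = c(1.5, 2.5, 3.5))
dt2 <- data.table(
  sample = c("s1", "s2", "s3"),
  alias  = c("s1", "k2", "k3"),
  empty  = c(NA, NA, NA)
)

# Emits a warning naming the all-NA column of dt2.
na_cols <- all_na_columns(dt2)

# Assumed scoring rule: count how many keys of dt1 each candidate
# column of dt2 can match, then merge on the highest-scoring column.
candidates <- setdiff(names(dt2), na_cols)
n_matched <- vapply(candidates, function(col) sum(dt1$id %in% dt2[[col]]), integer(1))
best_col <- names(n_matched)[which.max(n_matched)]
merged <- merge(dt1, dt2, by.x = "id", by.y = best_col)
```

Excluding the all-NA columns before scoring keeps the comparison limited to columns that can actually match something.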

Problems? / I need help!

For any questions not related to bugs or development, you can write to me at y.pageaud@dkfz.de.

Technical questions / Development / Feature request

If you encounter issues, or if a feature you expect is not among the functions available in DTrsiv, please go to the DTrsiv GitHub repository, click on the Issues tab, and create an issue.

References

  1. Introduction to data.table: https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
  2. Official R data.table GitHub repository: https://github.com/Rdatatable/data.table
  3. By-Group Processing, the R data.table and the Power of Open Source (22.02.2011) - Steve Miller