Suggestion

Question

Closed this issue 5 years ago · 2 comments

It would also be great to have column-wise outlier removal/imputation.

6 σ

library(data.table)
# Keep only rows whose dv lies within 6 standard deviations of the mean
trainData[, `:=`(mean_dv = mean(dv), sd_dv = sd(dv))]
trainData <- trainData[dv >= mean_dv - 6 * sd_dv & dv <= mean_dv + 6 * sd_dv]
trainData[, c("mean_dv", "sd_dv") := NULL]
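
For comparison with the percentile version, here is a minimal sketch of the same 6 σ filter wrapped in a reusable function. The name `removeSdOutliersFunc` and the `nSigmas` parameter are hypothetical, chosen for illustration only; they are not the names used in the package.

```r
library(data.table)

# Hypothetical helper (not part of any package): drop rows where `colName`
# lies more than `nSigmas` standard deviations from the column mean.
removeSdOutliersFunc <- function(trainData, colName, nSigmas = 6) {
  vec   <- trainData[[colName]]
  mu    <- mean(vec, na.rm = TRUE)
  sigma <- sd(vec, na.rm = TRUE)
  # Rows with NA in the column compare as NA and are dropped by which()
  keep  <- vec >= mu - nSigmas * sigma & vec <= mu + nSigmas * sigma
  trainData[which(keep)]
}

# Toy data: 100 standard normal values plus one extreme value
set.seed(1)
toy      <- data.table(dv = c(rnorm(100), 1000))
filtered <- removeSdOutliersFunc(toy, "dv", nSigmas = 6)
```

Unlike the snippet above, this sketch does not add temporary columns to the table, so nothing has to be cleaned up afterwards.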

percentile

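# Keep only rows whose value in colName lies between the two quantiles in outlierVec
removeOneOutliersFunc <- function(trainData, colName, outlierVec = c(0.0001, 0.9999)) {
  vec       <- trainData[[colName]]
  # Lower and upper cut-offs; NAs are ignored when computing the quantiles
  values    <- as.numeric(quantile(vec, outlierVec, na.rm = TRUE))
  # Assumes trainData is a data.table, so a logical i filters rows
  trainData <- trainData[vec >= values[1] & vec <= values[2]]
  return(trainData)
}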
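
Since the request mentions imputation as well as removal, one possible variant is to clip extreme values to the quantile bounds instead of dropping the rows. The sketch below is only an illustration under that assumption; `capOneOutliersFunc` is a hypothetical name, and it assumes a numeric (double) column.

```r
library(data.table)

# Hypothetical helper (not part of any package): clip values outside the
# chosen quantiles to the quantile bounds instead of removing the rows.
capOneOutliersFunc <- function(trainData, colName, outlierVec = c(0.0001, 0.9999)) {
  trainData <- copy(trainData)  # avoid modifying the caller's table by reference
  vec       <- trainData[[colName]]
  values    <- as.numeric(quantile(vec, outlierVec, na.rm = TRUE))
  # pmin/pmax propagate NA, so missing values are left untouched
  set(trainData, j = colName, value = pmin(pmax(vec, values[1]), values[2]))
  trainData
}

# Toy data: 99 regular values plus one extreme value (stored as doubles)
toy    <- data.table(dv = c(1:99, 1000))
capped <- capOneOutliersFunc(toy, "dv", outlierVec = c(0.05, 0.95))
```

Clipping keeps the row count unchanged, which can matter when other columns of the same rows are still useful.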

Answer 1 · 2018-01-17T08:26:12.000Z

Great idea.

I was thinking of building a set of statistical functions to complete this package.
These would absolutely fit into that category.

I guess I will create a project for this part.

Any ideas for statistical functions to filter/preprocess data are welcome.

Answer 2 · 2019-07-19T12:52:04.000Z

Hi,

The functions remove_sd_outlier, remove_percentile_outlier, and remove_rare_categorical have been implemented and will be available in the next CRAN release.

Feel free to test them, check whether they are helpful, and suggest any improvements.

I am closing this issue.

Thanks.

Emmanuel-Lin