error in get_features_duplicated.r
Closed this issue · 2 comments
dyln commented
Hi there,
> train_categorical <- fread(file.path(data_dir, "train_categorical.csv"), data.table=FALSE,
+ na.strings="", showProgress=TRUE,colClasses=colClasses, drop="Id")
Read 1183747 rows and 2140 (of 2141) columns from 2.494 GB file in 00:05:39
Error in .subset2(x, j) : subscript out of bounds
I'm getting this error here. How can I avoid it?
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.5.0 dtplyr_0.0.2 plyr_1.8.4 digest_0.6.12 magrittr_1.5
[6] data.table_1.10.4
loaded via a namespace (and not attached):
[1] compiler_3.4.0 R6_2.2.1 assertthat_0.2.0 DBI_0.6-1 tools_3.4.0
[6] tibble_1.3.1 Rcpp_0.12.10 rlang_0.1.1
Thanks :)
toshi-k commented
Thank you for your report. I have located the cause of the trouble.
With newer versions of data.table::fread, the colClasses option and the drop option cannot be used simultaneously.
Please modify the code as in this commit:
b663b73#diff-1723f610f4354d461b349411aff360e2
Don't use the drop option; instead, remove the columns after reading the data.
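A minimal sketch of that workaround, which may differ from the exact change in the linked commit. The temporary file, its column names, and the colClasses vector below are hypothetical stand-ins for train_categorical.csv; only the pattern (read all columns, then drop Id) is the point.

```r
library(data.table)

# Tiny stand-in for train_categorical.csv (hypothetical data, for illustration only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Id,L0_S1_F25,L0_S1_F27",
             "1,T1,",
             "2,,T9",
             "3,T1,T9"), csv_path)

# Hypothetical colClasses: one entry per column in the file, including Id.
colClasses <- c("integer", "character", "character")

# Read every column (no drop argument) so colClasses lines up with the file.
train_categorical <- fread(csv_path, data.table = FALSE,
                           na.strings = "", colClasses = colClasses)

# Remove the Id column after reading instead of passing drop = "Id".
train_categorical$Id <- NULL

str(train_categorical)
```

Because data.table=FALSE returns a plain data.frame, assigning NULL to the column is enough; with a data.table you could instead write train_categorical[, Id := NULL].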
dyln commented
Thanks so much. I'm going to use this code in my term project. I can send it to you if you want to check it out.
cheers :)