toshi-k/kaggle-bosch-production-line-performance

Error in get_features_duplicated.r

Closed this issue · 2 comments

dyln commented

Hi there,

> train_categorical <- fread(file.path(data_dir, "train_categorical.csv"), data.table=FALSE,
+ 							na.strings="", showProgress=TRUE,colClasses=colClasses, drop="Id")
Read 1183747 rows and 2140 (of 2141) columns from 2.494 GB file in 00:05:39
Error in .subset2(x, j) : subscript out of bounds

I'm getting this error here. How can I avoid it?

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.5.0       dtplyr_0.0.2      plyr_1.8.4        digest_0.6.12     magrittr_1.5     
[6] data.table_1.10.4

loaded via a namespace (and not attached):
[1] compiler_3.4.0   R6_2.2.1         assertthat_0.2.0 DBI_0.6-1        tools_3.4.0     
[6] tibble_1.3.1     Rcpp_0.12.10     rlang_0.1.1

Thanks :)

Thank you for your report. I located the cause of the trouble.
With newer versions of data.table::fread, the colClasses and drop options cannot be used together.

Please modify the code as in this commit:
b663b73#diff-1723f610f4354d461b349411aff360e2
Don't use the drop option; remove the columns after reading the data instead.
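A minimal sketch of this workaround, assuming the data.table package is installed. The tiny CSV and its column names are hypothetical stand-ins for train_categorical.csv, and the colClasses vector below is an assumption, since the script's actual colClasses is not shown in this thread. The file is read with colClasses but without drop, and the Id column is removed afterwards:

```r
library(data.table)

# Hypothetical stand-in for train_categorical.csv (column names are illustrative only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Id,L0_S1_F25,L0_S1_F27",
             "4,,T1",
             "6,T9,",
             "7,T9,T1"), csv_path)

# Assumed column classes: Id as integer, categorical features as character.
colClasses <- c("integer", "character", "character")

# Read with colClasses but WITHOUT drop = "Id".
train_categorical <- fread(csv_path, data.table = FALSE, header = TRUE,
                           na.strings = "", showProgress = FALSE,
                           colClasses = colClasses)

# Remove the Id column after reading instead.
train_categorical$Id <- NULL

str(train_categorical)
```

If you keep data.table = TRUE, the in-place equivalent of the last step is train_categorical[, Id := NULL].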

dyln commented

Thanks so much. I'm going to use this code in my term project. I can send it to you if you want to check it out.

Cheers :)