normalize fails

Question

cells2numbers opened this issue 5 years ago · 1 comment

Normalization fails in a simple case where cytominer is used to normalize a complete data set (no groups). The attached CSV contains the features used in the example code below.

Tested with dplyr > 0.8 (0.8.3, see sessionInfo), so this issue could be related to #131 (not checked yet).

Example file: population_ge_test.csv.tar.gz

Example:

library(readr)
library(dplyr)
library(magrittr)

df <- read_csv('population_ge_test.csv') %>% 
  filter(complete.cases(.))

# workaround for error: Error in parse(text = x) : <text>:1:7: unexpected input 1: 221227_ ^
colnames(df) <- c("Metadata_broad_sample_simple",1:977)

df %<>% mutate(strata_col = 1) 

feature_columns <- setdiff(colnames(df),"Metadata_broad_sample_simple")  %>% print
 
ge_normalized <- cytominer::normalize(
  population = df,
  variables = feature_columns, 
  sample = df,
  strata = c("strata_col"), 
  operation = "standardize"
)
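
The workaround comment in the example refers to a parse error on a column name that starts with digits. As a minimal sketch of why such a name trips `parse()`, the snippet below uses base R only and a hypothetical column name `221227_at` (the real name is truncated in the error message, so this one is an assumption):

```r
# Hypothetical column name; the actual name in the CSV is not shown in full.
col <- "221227_at"

# Parsing the bare name fails because it is not a syntactic R name.
# On error, the handler returns the error message as a character string.
bare <- tryCatch(parse(text = col), error = function(e) conditionMessage(e))

# Quoting the name in backticks yields a valid symbol.
quoted <- parse(text = paste0("`", col, "`"))[[1]]

# make.names() is another option: it rewrites the name into a syntactic one.
safe <- make.names(col)

is.character(bare)  # TRUE means parse() raised an error
is.name(quoted)     # TRUE means the backticked name parsed as a symbol
safe
```

Renaming the columns, as the example does, sidesteps the parse error but discards the original feature names; backticks or `make.names()` may be gentler alternatives.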

Error:

Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) : non-numeric argument to binary operator
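
The call shown in the error message has the same form as the one base R's `sweep()` makes internally, and `sweep()` fails this way when it receives a character matrix. The following sketch reproduces the same message on toy data with base R only. It is an illustration of one way this error can arise, not a statement about where cytominer triggers it; the data frame `toy` and its columns are made up.

```r
# Toy data frame with one character column and two numeric columns.
toy <- data.frame(
  id = c("a", "b", "c"),
  x = c(1, 2, 3),
  y = c(4, 5, 6),
  stringsAsFactors = FALSE
)

# as.matrix() on mixed column types coerces everything to character.
m_all <- as.matrix(toy)
is.character(m_all)

# sweep() subtracts STATS from each column; on a character matrix this fails.
msg <- tryCatch(
  sweep(m_all, 2, c(0, 0, 0)),
  error = function(e) conditionMessage(e)
)
msg

# Restricting the operation to numeric columns avoids the error.
num_cols <- names(toy)[vapply(toy, is.numeric, logical(1))]
m_num <- sweep(as.matrix(toy[num_cols]), 2, colMeans(toy[num_cols]))
```

A quick check such as `vapply(df[feature_columns], is.numeric, logical(1))` before normalizing can help rule out non-numeric feature columns as the cause.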

sessionInfo:

> sessionInfo() 
R version 3.6.1 (2019-07-05) 
Platform: x86_64-pc-linux-gnu (64-bit) 
Running under: Ubuntu 18.04.3 LTS  

Matrix products: default 
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so  

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: 
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] magrittr_1.5 dplyr_0.8.3  readr_1.3.1   

loaded via a namespace (and not attached):  
[1] Rcpp_1.0.2           rstudioapi_0.10      knitr_1.24           hms_0.5.0            tidyselect_0.2.5     lattice_0.20-38      R6_2.4.0             rlang_0.4.0           
[9] foreach_1.4.7        tools_3.6.1          grid_3.6.1           xfun_0.8             lambda.r_1.2.3       futile.logger_1.4.3  iterators_1.0.12     assertthat_0.2.1     
[17] tibble_2.1.3         crayon_1.3.4         Matrix_1.2-17        formatR_1.7          purrr_0.3.2          futile.options_1.0.1 codetools_0.2-16     vctrs_0.2.0          
[25] zeallot_0.1.0        glue_1.3.1           compiler_3.6.1       cytominer_0.1.0.9000 pillar_1.4.2         backports_1.1.4      pkgconfig_2.0.2

Answer 1 · 2020-03-20T14:03:56.000Z

Fixed in #135. The same example now runs without error:

library(readr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)

df <- read_csv('~/Downloads/population_ge_test.csv') %>% 
  filter(complete.cases(.))
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   Metadata_broad_sample_simple = col_character()
#> )
#> See spec(...) for full column specifications.

# workaround for error: Error in parse(text = x) : <text>:1:7: unexpected input 1: 221227_ ^
colnames(df) <- c("Metadata_broad_sample_simple",1:977)

df %<>% mutate(strata_col = 1) 

feature_columns <- setdiff(colnames(df),"Metadata_broad_sample_simple")

ge_normalized <- cytominer::normalize(
  population = df,
  variables = feature_columns, 
  sample = df,
  strata = c("strata_col"), 
  operation = "standardize"
)

ge_normalized %>% select(1:3) %>% slice(1:5) %>% knitr::kable()

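| Metadata_broad_sample_simple |          1 |          2 |
|:-----------------------------|-----------:|-----------:|
| BRD-A01528713                | -1.2871379 |  1.0886862 |
| BRD-A02809788                | -0.0555724 | -0.7771730 |
| BRD-A03182941                |  0.5495872 | -0.9621648 |
| BRD-A04691170                | -0.4334830 | -0.3546694 |
| BRD-A08759443                | -0.1129926 | -0.5845238 |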

Created on 2020-03-20 by the reprex package (v0.3.0)
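
To sanity-check standardized output like the table above, each feature column should have mean 0 and standard deviation 1 within a stratum. The sketch below runs that check on made-up data using base R's `scale()` as a stand-in; it does not call cytominer, and the column names are hypothetical. It also includes a constant column, because in the example `feature_columns` is computed after `strata_col` is added and therefore contains `strata_col`. In plain base R such a column standardizes to `NaN`; whether cytominer handles it differently is not shown here.

```r
set.seed(1)

# Made-up data: two varying features and one constant column.
toy <- data.frame(a = rnorm(20, mean = 5, sd = 2), b = runif(20), const = 1)

# Standardize every column: subtract the column mean, divide by the column sd.
z <- as.data.frame(scale(toy))

# Varying columns should have mean 0 and sd 1 up to floating-point error.
colMeans(z[c("a", "b")])
sapply(z[c("a", "b")], sd)

# A constant column has sd 0, so standardizing it gives 0 / 0 = NaN.
all(is.nan(z$const))
```

Excluding the strata column from the feature list, for example with `setdiff(colnames(df), c("Metadata_broad_sample_simple", "strata_col"))`, would avoid standardizing a constant.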