normalize fails
cells2numbers opened this issue · 1 comments
cells2numbers commented
Normalization fails in a simple case where cytominer is used to normalize a complete data set (no groups). Attached is a CSV file containing the features used in the example code below.
Tested with dplyr > 0.8, so this issue could be related to #131 (not checked yet).
Example file: population_ge_test.csv.tar.gz
Example:
library(readr)
library(dplyr)
library(magrittr)
df <- read_csv('population_ge_test.csv') %>%
  filter(complete.cases(.))
# workaround for error: Error in parse(text = x) : <text>:1:7: unexpected input 1: 221227_ ^
colnames(df) <- c("Metadata_broad_sample_simple", 1:977)
df %<>% mutate(strata_col = 1)
feature_columns <- setdiff(colnames(df), "Metadata_broad_sample_simple") %>% print
ge_normalized <- cytominer::normalize(
  population = df,
  variables = feature_columns,
  sample = df,
  strata = c("strata_col"),
  operation = "standardize"
)
Error:
Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) : non-numeric argument to binary operator
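The call shown in the error message matches the last line of base R's `sweep()`, which applies `FUN` (subtraction by default) between a matrix and a vector of statistics. A minimal sketch with hypothetical toy matrices (not the attached CSV) shows how the same message appears when the data passed to `sweep()` is not numeric. Which cytominer code path reaches `sweep()`, and which input is non-numeric in the real data, is not established here and is only an assumption.

```r
# Toy data, made up for illustration; not the attached CSV.
m_num <- matrix(1:6, nrow = 3)

# Numeric input: subtract each column's mean (the default FUN is "-").
centered <- sweep(m_num, 2, colMeans(m_num))
print(centered)

# Character input: the subtraction inside sweep() fails.
m_chr <- matrix(as.character(1:6), nrow = 3)
msg <- tryCatch(
  sweep(m_chr, 2, c(2, 5)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is what happens here, checking `sapply(df[feature_columns], is.numeric)` before calling `cytominer::normalize()` might help narrow down the offending column.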
sessionInfo:
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_1.5 dplyr_0.8.3 readr_1.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 rstudioapi_0.10 knitr_1.24 hms_0.5.0 tidyselect_0.2.5 lattice_0.20-38 R6_2.4.0 rlang_0.4.0
[9] foreach_1.4.7 tools_3.6.1 grid_3.6.1 xfun_0.8 lambda.r_1.2.3 futile.logger_1.4.3 iterators_1.0.12 assertthat_0.2.1
[17] tibble_2.1.3 crayon_1.3.4 Matrix_1.2-17 formatR_1.7 purrr_0.3.2 futile.options_1.0.1 codetools_0.2-16 vctrs_0.2.0
[25] zeallot_0.1.0 glue_1.3.1 compiler_3.6.1 cytominer_0.1.0.9000 pillar_1.4.2 backports_1.1.4 pkgconfig_2.0.2
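On the renaming workaround in the example above: the `parse(text = x)` error suggests that a column name starting with digits is being parsed as R code somewhere along the way. A minimal sketch, assuming a made-up name `221227_at` (the real column name is truncated in the error message), shows why such a name fails to parse and two ways to make it usable:

```r
# Hypothetical column name; the original is cut off in the error message.
bad_name <- "221227_at"

# Parsing the bare name fails because it starts with a number.
parse_result <- tryCatch(parse(text = bad_name), error = function(e) e)
print(inherits(parse_result, "error"))

# Option 1: wrap the name in backticks so it parses as a single symbol.
quoted <- parse(text = paste0("`", bad_name, "`"))
print(as.character(quoted[[1]]))

# Option 2: convert it to a syntactic name with base R's make.names().
syntactic <- make.names(bad_name)
print(syntactic)
```

Something like `colnames(df) <- make.names(colnames(df), unique = TRUE)` could be an alternative to numbering the columns that keeps the original names recognizable. Whether it avoids the error inside cytominer is untested.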
shntnu commented
Fixed in #135
library(readr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
df <- read_csv('~/Downloads/population_ge_test.csv') %>%
  filter(complete.cases(.))
#> Parsed with column specification:
#> cols(
#> .default = col_double(),
#> Metadata_broad_sample_simple = col_character()
#> )
#> See spec(...) for full column specifications.
# workaround for error: Error in parse(text = x) : <text>:1:7: unexpected input 1: 221227_ ^
colnames(df) <- c("Metadata_broad_sample_simple", 1:977)
df %<>% mutate(strata_col = 1)
feature_columns <- setdiff(colnames(df),"Metadata_broad_sample_simple")
ge_normalized <- cytominer::normalize(
  population = df,
  variables = feature_columns,
  sample = df,
  strata = c("strata_col"),
  operation = "standardize"
)
ge_normalized %>% select(1:3) %>% slice(1:5) %>% knitr::kable()
| Metadata_broad_sample_simple | 1 | 2 |
|---|---|---|
| BRD-A01528713 | -1.2871379 | 1.0886862 |
| BRD-A02809788 | -0.0555724 | -0.7771730 |
| BRD-A03182941 | 0.5495872 | -0.9621648 |
| BRD-A04691170 | -0.4334830 | -0.3546694 |
| BRD-A08759443 | -0.1129926 | -0.5845238 |
Created on 2020-03-20 by the reprex package (v0.3.0)
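For anyone who wants to sanity-check normalized output like the table above, here is a minimal base R sketch. It assumes that `operation = "standardize"` amounts to a per-column z-score, (x - mean) / sd, which is not confirmed in this thread. The data frame and the `standardize` helper are made up for illustration and are not part of cytominer.

```r
# Toy data, made up for illustration; not the attached CSV.
toy <- data.frame(
  id = c("a", "b", "c", "d"),
  f1 = c(1, 2, 3, 6),
  f2 = c(10, 0, 5, 5),
  strata_col = 1
)

# As in the example, everything except the metadata column is treated
# as a feature, so strata_col ends up in the list as well.
feature_cols <- setdiff(colnames(toy), "id")

# Hypothetical helper: plain z-score of one column.
standardize <- function(x) (x - mean(x)) / sd(x)

z <- toy
z[feature_cols] <- lapply(toy[feature_cols], standardize)

print(colMeans(z[c("f1", "f2")]))   # approximately 0 for each feature
print(sapply(z[c("f1", "f2")], sd)) # 1 for each feature
print(z$strata_col)                 # NaN: a constant column has sd 0
```

Note that `feature_columns` in the example is built with `setdiff(colnames(df), "Metadata_broad_sample_simple")`, so it still contains `strata_col`. Under a plain z-score a constant column yields `NaN`. How cytominer treats that column is not shown in this thread.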