mcanouil/NACHO

example in vignette error

Closed this issue · 4 comments

If I follow the example in the vignette, I encounter this error:

# Add IDs
targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))
library(NACHO)

Attaching package: 'NACHO'

The following object is masked from 'package:BiocGenerics':

    normalize

GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute.
)
[NACHO] Importing RCC files.
Error: Column cols must be length 1 (the number of rows), not 3
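Since `summarise()` appears to locate the raw files through the `id_colname` column of the samplesheet inside `data_directory`, one quick thing to rule out is a mismatch between the samplesheet and the files that were actually extracted. The sketch below is a minimal base R check under that assumption; `check_rcc_ids()` is a hypothetical helper, not part of NACHO, and it assumes the raw files are named `*.RCC` or `*.RCC.gz`.

```r
# Hypothetical helper, not part of NACHO: compare the IDs in a samplesheet
# column with the files actually present in the data directory.
check_rcc_ids <- function(targets, data_directory, id_colname = "IDFILE") {
  stopifnot(is.data.frame(targets), id_colname %in% colnames(targets))
  ids <- as.character(targets[[id_colname]])
  files <- list.files(data_directory)
  # IDs in the samplesheet with no matching file on disk
  missing <- setdiff(ids, files)
  # Files on disk that the samplesheet never mentions
  extra <- setdiff(files, ids)
  # IDs that do not look like RCC files (assumes .RCC or .RCC.gz names)
  not_rcc <- ids[!grepl("\\.RCC(\\.gz)?$", ids, ignore.case = TRUE)]
  list(
    n_rows  = nrow(targets),
    n_files = length(files),
    missing = missing,
    extra   = extra,
    not_rcc = not_rcc,
    ok      = length(missing) + length(extra) + length(not_rcc) == 0
  )
}

# Usage with the objects from the vignette:
# res <- check_rcc_ids(targets, paste0(tempdir(), "/GSE70970/Data"))
# str(res)
```

If `ok` is `TRUE`, the samplesheet and the directory agree, and the cause is more likely elsewhere, for example a difference in installed package versions between the sessions that fail and the ones that work.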

Hi,

I can't replicate your error, and the vignette compiled successfully, as you can see on the website.

Below is a full reproducible example of the code you mentioned. As you can see, I don't get your error. Please check the session information at the end.

library(GEOquery)
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:parallel':
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter,
#>     Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, Position, rank, rbind, Reduce, rownames, sapply,
#>     setdiff, sort, table, tapply, union, unique, unsplit, which,
#>     which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Setting options('download.file.method.GEOquery'='auto')
#> Setting options('GEOquery.inmemory.gpl'=FALSE)
# Download data
gse <- getGEO("GSE70970")
#> Found 1 file(s)
#> GSE70970_series_matrix.txt.gz
#> Parsed with column specification:
#> cols(
#>   .default = col_double(),
#>   ID_REF = col_character()
#> )
#> See spec(...) for full column specifications.
#> File stored at:
#> /tmp/RtmpKA2y6S/GPL20699.soft
# Get phenotypes
targets <- pData(phenoData(gse[[1]]))
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
#>                                                                    size
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       1986560
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz     672
#>                                                                 isdir mode
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       FALSE  644
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz FALSE  644
#>                                                                               mtime
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       2019-11-15 11:25:23
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz 2019-11-15 11:25:24
#>                                                                               ctime
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       2019-11-15 11:25:23
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz 2019-11-15 11:25:24
#>                                                                               atime
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       2019-11-15 11:25:21
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz 2019-11-15 11:25:23
#>                                                                  uid gid
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       1738  50
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz 1738  50
#>                                                                    uname
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                       mcanouil
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz mcanouil
#>                                                                 grname
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_RAW.tar                        staff
#> /tmp/RtmpKA2y6S/GSE70970/GSE70970_characteristics_readme.txt.gz  staff
# Unzip data
untar(
  tarfile = paste0(tempdir(), "/GSE70970/GSE70970_RAW.tar"), 
  exdir = paste0(tempdir(), "/GSE70970/Data")
)
# Add IDs
targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))

library(NACHO)
#> 
#> Attaching package: 'NACHO'
#> The following object is masked from 'package:BiocGenerics':
#> 
#>     normalize
GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute. 
)
#> [NACHO] Importing RCC files.
#> [NACHO] Performing QC and formatting data.
#> [NACHO] Searching for the best housekeeping genes.
#> [NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
#> [NACHO] The following predicted housekeeping genes will be used for normalisation:
#>   - hsa-miR-103
#>   - hsa-let-7e
#>   - hsa-miR-1260
#>   - hsa-miR-500+hsa-miR-501-5p
#>   - hsa-miR-1274b
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Missing values have been replaced with zeros for PCA.
#> [NACHO] Normalising data using "GEO" method with housekeeping genes.
#> [NACHO] Returning a list.
#>   $ access              : character
#>   $ housekeeping_genes  : character
#>   $ housekeeping_predict: logical
#>   $ housekeeping_norm   : logical
#>   $ normalisation_method: character
#>   $ remove_outliers     : logical
#>   $ n_comp              : numeric
#>   $ data_directory      : character
#>   $ pc_sum              : data.frame
#>   $ nacho               : data.frame
#>   $ outliers_thresholds : list
#>   $ raw_counts          : data.frame
#>   $ normalised_counts   : data.frame

sessioninfo::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       Debian GNU/Linux 9 (stretch)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language en_GB.UTF-8                 
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Etc/UTC                     
#>  date     2019-11-15                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package      * version date       lib source        
#>  assertthat     0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
#>  backports      1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
#>  Biobase      * 2.44.0  2019-05-02 [1] Bioconductor  
#>  BiocGenerics * 0.30.0  2019-05-02 [1] Bioconductor  
#>  cli            1.1.0   2019-03-19 [1] CRAN (R 3.6.1)
#>  colorspace     1.4-1   2019-03-18 [1] CRAN (R 3.6.1)
#>  crayon         1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
#>  curl           4.2     2019-09-24 [1] CRAN (R 3.6.1)
#>  digest         0.6.21  2019-09-20 [1] CRAN (R 3.6.1)
#>  dplyr          0.8.3   2019-07-04 [1] CRAN (R 3.6.1)
#>  ellipsis       0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
#>  evaluate       0.14    2019-05-28 [1] CRAN (R 3.6.1)
#>  GEOquery     * 2.52.0  2019-05-02 [1] Bioconductor  
#>  ggplot2        3.2.1   2019-08-10 [1] CRAN (R 3.6.1)
#>  glue           1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
#>  gtable         0.3.0   2019-03-25 [1] CRAN (R 3.6.1)
#>  highr          0.8     2019-03-20 [1] CRAN (R 3.6.1)
#>  hms            0.5.1   2019-08-23 [1] CRAN (R 3.6.1)
#>  htmltools      0.4.0   2019-10-04 [1] CRAN (R 3.6.1)
#>  knitr          1.25    2019-09-18 [1] CRAN (R 3.6.1)
#>  lazyeval       0.2.2   2019-03-15 [1] CRAN (R 3.6.1)
#>  lifecycle      0.1.0   2019-08-01 [1] CRAN (R 3.6.1)
#>  limma          3.40.6  2019-07-26 [1] Bioconductor  
#>  magrittr       1.5     2014-11-22 [1] CRAN (R 3.6.1)
#>  munsell        0.5.0   2018-06-12 [1] CRAN (R 3.6.1)
#>  NACHO        * 0.6.1   2019-10-12 [1] CRAN (R 3.6.1)
#>  pillar         1.4.2   2019-06-29 [1] CRAN (R 3.6.1)
#>  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 3.6.1)
#>  purrr          0.3.3   2019-10-18 [1] CRAN (R 3.6.1)
#>  R6             2.4.0   2019-02-14 [1] CRAN (R 3.6.1)
#>  Rcpp           1.0.2   2019-07-25 [1] CRAN (R 3.6.1)
#>  readr          1.3.1   2018-12-21 [1] CRAN (R 3.6.1)
#>  rlang          0.4.0   2019-06-25 [1] CRAN (R 3.6.1)
#>  rmarkdown      1.16    2019-10-01 [1] CRAN (R 3.6.1)
#>  scales         1.0.0   2018-08-09 [1] CRAN (R 3.6.1)
#>  sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
#>  stringi        1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
#>  stringr        1.4.0   2019-02-10 [1] CRAN (R 3.6.1)
#>  tibble         2.1.3   2019-06-06 [1] CRAN (R 3.6.1)
#>  tidyr          1.0.0   2019-09-11 [1] CRAN (R 3.6.1)
#>  tidyselect     0.2.5   2018-10-11 [1] CRAN (R 3.6.1)
#>  vctrs          0.2.0   2019-07-05 [1] CRAN (R 3.6.1)
#>  withr          2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
#>  xfun           0.10    2019-10-01 [1] CRAN (R 3.6.1)
#>  xml2           1.2.2   2019-08-09 [1] CRAN (R 3.6.1)
#>  yaml           2.2.0   2018-07-25 [1] CRAN (R 3.6.1)
#>  zeallot        0.1.0   2018-01-28 [1] CRAN (R 3.6.1)
#> 
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library

I restarted R and tried again, and now it works. Sorry, I don't know what went wrong the first time.

Best regards,
Sebastian

library(GEOquery)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply,
parLapply, parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq,
Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit, which, which.max, which.min

Welcome to Bioconductor

Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Setting options('download.file.method.GEOquery'='auto')
Setting options('GEOquery.inmemory.gpl'=FALSE)

gse <- getGEO("GSE70970")
Found 1 file(s)
GSE70970_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE70nnn/GSE70970/matrix/GSE70970_series_matrix.txt.gz'
Content type 'application/x-gzip' length 351607 bytes (343 KB)
==================================================
downloaded 343 KB

Parsed with column specification:
cols(
.default = col_double(),
ID_REF = col_character()
)
See spec(...) for full column specifications.
File stored at:
/tmp/RtmpQb9ReH/GPL20699.soft

targets <- pData(phenoData(gse[[1]]))
getGEOSuppFiles(GEO = "GSE70970", baseDir = tempdir())
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE70nnn/GSE70970/suppl//GSE70970_RAW.tar?tool=geoquery'
Content type 'application/x-tar' length 1986560 bytes (1.9 MB)
==================================================
downloaded 1.9 MB

trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE70nnn/GSE70970/suppl//GSE70970_characteristics_readme.txt.gz?tool=geoquery'
Content type 'application/x-gzip' length 672 bytes

downloaded 672 bytes

                                                                   size isdir mode               mtime               ctime
/tmp/RtmpQb9ReH/GSE70970/GSE70970_RAW.tar                       1986560 FALSE  664 2019-11-15 11:31:34 2019-11-15 11:31:34
/tmp/RtmpQb9ReH/GSE70970/GSE70970_characteristics_readme.txt.gz     672 FALSE  664 2019-11-15 11:31:35 2019-11-15 11:31:35
                                                                              atime  uid  gid     uname    grname
/tmp/RtmpQb9ReH/GSE70970/GSE70970_RAW.tar                       2019-11-15 11:31:32 1000 1000 sebastian sebastian
/tmp/RtmpQb9ReH/GSE70970/GSE70970_characteristics_readme.txt.gz 2019-11-15 11:31:34 1000 1000 sebastian sebastian

untar(
  tarfile = paste0(tempdir(), "/GSE70970/GSE70970_RAW.tar"),
  exdir = paste0(tempdir(), "/GSE70970/Data")
)

targets$IDFILE <- list.files(paste0(tempdir(), "/GSE70970/Data"))
library(NACHO)

Attaching package: ‘NACHO’

The following object is masked from ‘package:BiocGenerics’:

normalize

library(NACHO)
GSE70970_sum <- summarise(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute.
)
[NACHO] Importing RCC files.
|========================================================================================================|100% ~0 s remaining
[NACHO] Performing QC and formatting data.
[NACHO] Searching for the best housekeeping genes.
[NACHO] Computing normalisation factors using "GEO" method for housekeeping genes prediction.
[NACHO] The following predicted housekeeping genes will be used for normalisation:
  - hsa-miR-103
  - hsa-let-7e
  - hsa-miR-1260
  - hsa-miR-500+hsa-miR-501-5p
  - hsa-miR-1274b
[NACHO] Computing normalisation factors using "GEO" method.
[NACHO] Missing values have been replaced with zeros for PCA.
[NACHO] Normalising data using "GEO" method with housekeeping genes.
[NACHO] Returning a list.
  $ access              : character
  $ housekeeping_genes  : character
  $ housekeeping_predict: logical
  $ housekeeping_norm   : logical
  $ normalisation_method: character
  $ remove_outliers     : logical
  $ n_comp              : numeric
  $ data_directory      : character
  $ pc_sum              : data.frame
  $ nacho               : data.frame
  $ outliers_thresholds : list
  $ raw_counts          : data.frame
  $ normalised_counts   : data.frame

sessioninfo::session_info()
─ Session info ──────────────────────────────────────────────────────────
 setting  value
 version  R version 3.6.1 (2019-07-05)
 os       Ubuntu 18.04.3 LTS
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Berlin
 date     2019-11-15

─ Packages ──────────────────────────────────────────────────────────────
 package      * version date       lib source
 assertthat     0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
 backports      1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
 Biobase      * 2.44.0  2019-05-02 [1] Bioconductor
 BiocGenerics * 0.30.0  2019-05-02 [1] Bioconductor
 cli            1.1.0   2019-03-19 [1] CRAN (R 3.6.1)
 colorspace     1.4-1   2019-03-18 [1] CRAN (R 3.6.1)
 crayon         1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
 curl           4.2     2019-09-24 [1] CRAN (R 3.6.1)
 dplyr          0.8.3   2019-07-04 [1] CRAN (R 3.6.1)
 ellipsis       0.3.0   2019-09-20 [1] CRAN (R 3.6.1)
 GEOquery     * 2.52.0  2019-05-02 [1] Bioconductor
 ggplot2        3.2.1   2019-08-10 [1] CRAN (R 3.6.1)
 glue           1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
 gtable         0.3.0   2019-03-25 [1] CRAN (R 3.6.1)
 hms            0.5.2   2019-10-30 [1] CRAN (R 3.6.1)
 knitr          1.26    2019-11-12 [1] CRAN (R 3.6.1)
 lazyeval       0.2.2   2019-03-15 [1] CRAN (R 3.6.1)
 lifecycle      0.1.0   2019-08-01 [1] CRAN (R 3.6.1)
 limma          3.40.6  2019-07-26 [1] Bioconductor
 magrittr       1.5     2014-11-22 [1] CRAN (R 3.6.1)
 munsell        0.5.0   2018-06-12 [1] CRAN (R 3.6.1)
 NACHO        * 0.6.1   2019-10-12 [1] CRAN (R 3.6.1)
 pillar         1.4.2   2019-06-29 [1] CRAN (R 3.6.1)
 pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 3.6.1)
 purrr          0.3.3   2019-10-18 [1] CRAN (R 3.6.1)
 R6             2.4.1   2019-11-12 [1] CRAN (R 3.6.1)
 Rcpp           1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
 readr          1.3.1   2018-12-21 [1] CRAN (R 3.6.1)
 rlang          0.4.1   2019-10-24 [1] CRAN (R 3.6.1)
 rstudioapi     0.10    2019-03-19 [1] CRAN (R 3.6.1)
 scales         1.0.0   2018-08-09 [1] CRAN (R 3.6.1)
 sessioninfo    1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
 stringi        1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
 tibble         2.1.3   2019-06-06 [1] CRAN (R 3.6.1)
 tidyr          1.0.0   2019-09-11 [1] CRAN (R 3.6.1)
 tidyselect     0.2.5   2018-10-11 [1] CRAN (R 3.6.1)
 vctrs          0.2.0   2019-07-05 [1] CRAN (R 3.6.1)
 withr          2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
 xfun           0.11    2019-11-12 [1] CRAN (R 3.6.1)
 xml2           1.2.2   2019-08-09 [1] CRAN (R 3.6.1)
 zeallot        0.1.0   2018-01-28 [1] CRAN (R 3.6.1)

[1] /home/sebastian/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library

Perfect!
Enjoy NACHO ;)

Hi Mcanouil,

I restarted R and ran the code again from scratch. Still the same error!

GSE70970_sum <- summarize(
  data_directory = paste0(tempdir(), "/GSE70970/Data"), # Where the data is
  ssheet_csv = targets, # The samplesheet
  id_colname = "IDFILE", # Name of the column that contains the identifiers
  housekeeping_genes = NULL, # Custom list of housekeeping genes
  housekeeping_predict = TRUE, # Predict the housekeeping genes based on the data?
  normalisation_method = "GEO", # Geometric mean or GLM
  n_comp = 5 # Number indicating the number of principal components to compute.
)

The error is:

[NACHO] Importing RCC files.
Error: Column cols must be length 1 (the number of rows), not 3

Any other solutions?
Thanks for the quick response.

Athul