waldronlab/BugSigDBExports

Easy way to link signature names back to the data.frame imported with importBugSigDB


I think it would be helpful to have an 'id' or 'index' column in the data.frame returned by importBugSigDB, so that signature names can be linked back to their rows more easily. A toy example is below ('length' stands in for other useful per-signature information such as scores or p-values).

library(bugsigdbr)
bsdb <- importBugSigDB()
#> Using cached version from 2022-08-17 13:43:05
my_sigs <- getSignatures(df = bsdb, tax.id.type = 'ncbi', tax.level = 'mixed')
nrow(bsdb)
#> [1] 2270
length(my_sigs)
#> [1] 2270

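# One-row data.frame per signature holding its length (number of taxa)
x <- lapply(my_sigs, function(x) data.frame(length = length(x)))
df <- do.call(rbind, x)
# Signature names end up as row names; the ID is everything before the first underscore
df$sig_name <- rownames(df)
df$id <- sub('_.*$', '', df$sig_name)
rownames(df) <- NULL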

head(df[,c('id', 'length')])
#>           id length
#> 1 bsdb:1/1/1     20
#> 2 bsdb:1/1/2      2
#> 3 bsdb:1/2/1      2
#> 4 bsdb:1/2/2      3
#> 5 bsdb:1/3/1      2
#> 6 bsdb:1/4/1     24

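# Keep only the trailing number of each label, e.g. "Experiment 1" -> "1"
id1 <- sub(".* ", "", bsdb$Experiment)
id2 <- sub(".* ", "", bsdb$Study)
id3 <- sub(".* ", "", bsdb$`Signature page name`)
# Combined here as Experiment/Study/Signature (a later comment in this thread
# suggests the order should be Study/Experiment/Signature instead)
id <- paste0('bsdb:', id1, '/', id2, '/', id3)

# Fraction of duplicated IDs; 0 means every ID is unique
mean(duplicated(id))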
#> [1] 0

bsdb$id <- id

merged_df <- merge(df, bsdb, by = 'id')

merged_df[,c('Experiment', 'Study', 'Signature page name', 'id', 'length')] |> 
    head()
#>     Experiment   Study Signature page name         id length
#> 1 Experiment 1 Study 1         Signature 1 bsdb:1/1/1     20
#> 2 Experiment 1 Study 1         Signature 2 bsdb:1/1/2      2
#> 3 Experiment 1 Study 2         Signature 1 bsdb:1/2/1      2
#> 4 Experiment 1 Study 2         Signature 2 bsdb:1/2/2      3
#> 5 Experiment 1 Study 3         Signature 1 bsdb:1/3/1      2
#> 6 Experiment 1 Study 4         Signature 1 bsdb:1/4/1     24

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R Under development (unstable) (2022-12-25 r83502)
#>  os       Pop!_OS 22.04 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-01-29
#>  pandoc   2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version date (UTC) lib source
#>  assertthat      0.2.1   2019-03-21 [2] CRAN (R 4.3.0)
#>  BiocFileCache   2.7.1   2022-12-09 [1] Bioconductor
#>  bit             4.0.5   2022-11-15 [2] CRAN (R 4.3.0)
#>  bit64           4.0.5   2020-08-30 [2] CRAN (R 4.3.0)
#>  blob            1.2.3   2022-04-10 [2] CRAN (R 4.3.0)
#>  bugsigdbr     * 1.5.2   2022-11-24 [1] Bioconductor
#>  cachem          1.0.6   2021-08-19 [2] CRAN (R 4.3.0)
#>  cli             3.6.0   2023-01-09 [1] CRAN (R 4.3.0)
#>  curl            5.0.0   2023-01-12 [2] CRAN (R 4.3.0)
#>  DBI             1.1.3   2022-06-18 [2] CRAN (R 4.3.0)
#>  dbplyr          2.3.0   2023-01-16 [2] CRAN (R 4.3.0)
#>  digest          0.6.31  2022-12-11 [2] CRAN (R 4.3.0)
#>  dplyr           1.0.10  2022-09-01 [2] CRAN (R 4.3.0)
#>  evaluate        0.20    2023-01-17 [2] CRAN (R 4.3.0)
#>  fansi           1.0.4   2023-01-22 [2] CRAN (R 4.3.0)
#>  fastmap         1.1.0   2021-01-25 [2] CRAN (R 4.3.0)
#>  filelock        1.0.2   2018-10-05 [1] CRAN (R 4.3.0)
#>  fs              1.6.0   2023-01-23 [2] CRAN (R 4.3.0)
#>  generics        0.1.3   2022-07-05 [2] CRAN (R 4.3.0)
#>  glue            1.6.2   2022-02-24 [2] CRAN (R 4.3.0)
#>  htmltools       0.5.4   2022-12-07 [2] CRAN (R 4.3.0)
#>  httr            1.4.4   2022-08-17 [2] CRAN (R 4.3.0)
#>  knitr           1.42    2023-01-25 [2] CRAN (R 4.3.0)
#>  lifecycle       1.0.3   2022-10-07 [2] CRAN (R 4.3.0)
#>  magrittr        2.0.3   2022-03-30 [2] CRAN (R 4.3.0)
#>  memoise         2.0.1   2021-11-26 [2] CRAN (R 4.3.0)
#>  pillar          1.8.1   2022-08-19 [2] CRAN (R 4.3.0)
#>  pkgconfig       2.0.3   2019-09-22 [2] CRAN (R 4.3.0)
#>  purrr           1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache         0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3     1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo            1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils         2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  R6              2.5.1   2021-08-19 [2] CRAN (R 4.3.0)
#>  Rcpp            1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
#>  reprex          2.0.2   2022-08-17 [2] CRAN (R 4.3.0)
#>  rlang           1.0.6   2022-09-24 [2] CRAN (R 4.3.0)
#>  rmarkdown       2.20    2023-01-19 [2] CRAN (R 4.3.0)
#>  RSQLite         2.2.20  2022-12-22 [1] CRAN (R 4.3.0)
#>  rstudioapi      0.14    2022-08-22 [2] CRAN (R 4.3.0)
#>  sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler          1.9.0   2023-01-15 [1] CRAN (R 4.3.0)
#>  tibble          3.1.8   2022-07-22 [2] CRAN (R 4.3.0)
#>  tidyselect      1.2.0   2022-10-10 [2] CRAN (R 4.3.0)
#>  utf8            1.2.2   2021-07-24 [2] CRAN (R 4.3.0)
#>  vctrs           0.5.2   2023-01-23 [2] CRAN (R 4.3.0)
#>  withr           2.5.0   2022-03-03 [2] CRAN (R 4.3.0)
#>  xfun            0.36    2022-12-21 [2] CRAN (R 4.3.0)
#>  yaml            2.3.7   2023-01-23 [2] CRAN (R 4.3.0)
#> 
#>  [1] /home/samuel/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /home/samuel/Apps/R-devel/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2023-01-29 with reprex v2.0.2

Actually, I think the order is Study/Experiment/Signature page name.
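For reference, here is a minimal sketch of the ID built in that corrected order, using a made-up toy data.frame instead of the real import so it runs without downloading anything. `bsdb_toy`, `make_bsdb_id()` and the signature names are hypothetical and only for illustration; they are not part of bugsigdbr.

```r
# Toy stand-in for the data.frame returned by importBugSigDB(); values are made up.
bsdb_toy <- data.frame(
    Study = c("Study 1", "Study 1", "Study 2"),
    Experiment = c("Experiment 1", "Experiment 2", "Experiment 1"),
    `Signature page name` = c("Signature 1", "Signature 1", "Signature 2"),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Hypothetical helper: build "bsdb:<study>/<experiment>/<signature>" IDs.
make_bsdb_id <- function(df) {
    num <- function(x) sub(".* ", "", x)  # e.g. "Study 12" -> "12"
    paste0(
        "bsdb:",
        num(df$Study), "/",
        num(df$Experiment), "/",
        num(df$`Signature page name`)
    )
}

bsdb_toy$id <- make_bsdb_id(bsdb_toy)

# Toy signature names, assumed to follow the "<id>_<description>" shape.
sig_names <- c("bsdb:2/1/2_toy-signature-b", "bsdb:1/2/1_toy-signature-a")
sig_ids <- sub("_.*$", "", sig_names)

# match() gives, for each signature, the row of bsdb_toy it came from.
rows <- match(sig_ids, bsdb_toy$id)
linked <- bsdb_toy[rows, c("Study", "Experiment", "Signature page name", "id")]
```

Using `match()` rather than `merge()` keeps the rows in the same order as the signature list, which may be convenient when other per-signature results (scores, p-values) are stored in that order.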

Thanks @sdgamboa. It would make sense to incorporate this directly in the export. I am transferring this issue to BugSigDBExports.

Incorporated via 428eb49.