Bioconductor/BiocFileCache

Reducing the number of package dependencies


Hi Bioconductor team,

Is it possible to rework BiocFileCache a bit to not depend on quite so many tidyverse packages?
This is looking pretty heavy at the moment:

Depends | R (>= 3.4.0), dbplyr (>= 1.0.0)
Imports | methods, stats, utils, dplyr, RSQLite, DBI, filelock, curl, httr
AcidDevTools::packageDependencies("BiocFileCache")
## [1] "dbplyr"     "methods"    "stats"      "utils"      "dplyr"
## [6] "RSQLite"    "DBI"        "filelock"   "curl"       "httr"
## [11] "blob"       "cli"        "glue"       "lifecycle"  "magrittr"
## [16] "pillar"     "purrr"      "R6"         "rlang"      "tibble"
## [21] "tidyr"      "tidyselect" "vctrs"      "withr"      "generics"
## [26] "jsonlite"   "mime"       "openssl"    "bit64"      "memoise"
## [31] "pkgconfig"  "plogr"      "cpp11"      "bit"        "cachem"
## [36] "tools"      "askpass"    "fansi"      "utf8"       "stringr"
## [41] "graphics"   "grDevices"  "sys"        "fastmap"    "stringi"
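
For anyone without AcidDevTools installed, a similar list can probably be produced with base R alone. This is a minimal sketch, assuming the package of interest is already installed locally; `recursive_deps` is a hypothetical helper name, and the result may differ slightly from the output above depending on which dependency fields are followed.

```r
# List every package reachable through Depends/Imports/LinkingTo,
# using only the `tools` and `utils` packages that ship with R.
recursive_deps <- function(pkg, db = utils::installed.packages()) {
  deps <- tools::package_dependencies(
    pkg,
    db = db,
    which = c("Depends", "Imports", "LinkingTo"),
    recursive = TRUE
  )
  sort(unique(deps[[pkg]]))
}

# Works offline for anything already installed, e.g. a base package:
recursive_deps("stats")

# For an installed BiocFileCache (not run here):
# length(recursive_deps("BiocFileCache"))
```

Passing `db = utils::available.packages()` instead should resolve against the configured repositories rather than the local library, at the cost of a network lookup.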

Happy to help work on this!

Best,
Mike

In particular, can we take out the dplyr / dbplyr dependencies?

> packageDependencies("dplyr")
 [1] "cli"        "generics"   "glue"       "lifecycle"  "magrittr"
 [6] "methods"    "pillar"     "R6"         "rlang"      "tibble"
[11] "tidyselect" "utils"      "vctrs"      "fansi"      "utf8"
[16] "pkgconfig"  "withr"      "grDevices"  "graphics"   "stats"
> packageDependencies("dbplyr")
 [1] "blob"       "cli"        "DBI"        "dplyr"      "glue"
 [6] "lifecycle"  "magrittr"   "methods"    "pillar"     "purrr"
[11] "R6"         "rlang"      "tibble"     "tidyr"      "tidyselect"
[16] "utils"      "vctrs"      "withr"      "generics"   "fansi"
[21] "utf8"       "pkgconfig"  "stringr"    "cpp11"      "graphics"
[26] "grDevices"  "stats"      "stringi"    "tools"
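
To put a rough number on this, the two lists can be compared directly. The sketch below hard-codes the vectors from the output in this thread (the variable names are illustrative) and counts how many of BiocFileCache's 45 dependencies also sit under dbplyr.

```r
# Full recursive dependency list of BiocFileCache, copied from the thread
bfc_deps <- c(
  "dbplyr", "methods", "stats", "utils", "dplyr",
  "RSQLite", "DBI", "filelock", "curl", "httr",
  "blob", "cli", "glue", "lifecycle", "magrittr",
  "pillar", "purrr", "R6", "rlang", "tibble",
  "tidyr", "tidyselect", "vctrs", "withr", "generics",
  "jsonlite", "mime", "openssl", "bit64", "memoise",
  "pkgconfig", "plogr", "cpp11", "bit", "cachem",
  "tools", "askpass", "fansi", "utf8", "stringr",
  "graphics", "grDevices", "sys", "fastmap", "stringi"
)

# Recursive dependency list of dbplyr, copied from the thread
dbplyr_deps <- c(
  "blob", "cli", "DBI", "dplyr", "glue",
  "lifecycle", "magrittr", "methods", "pillar", "purrr",
  "R6", "rlang", "tibble", "tidyr", "tidyselect",
  "utils", "vctrs", "withr", "generics", "fansi",
  "utf8", "pkgconfig", "stringr", "cpp11", "graphics",
  "grDevices", "stats", "stringi", "tools"
)

# Packages in BiocFileCache's tree that are dbplyr or sit under it
via_dbplyr <- intersect(bfc_deps, c("dbplyr", dbplyr_deps))

# Packages that dbplyr does not account for
not_via_dbplyr <- setdiff(bfc_deps, via_dbplyr)

length(via_dbplyr)      # 30
length(not_via_dbplyr)  # 15
```

The 30 is an upper bound on what dropping dbplyr and dplyr could save, not the exact figure: some of those packages would still be pulled in by the remaining imports (RSQLite itself imports DBI, for example).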

Also, httr has been superseded by httr2.

Here's current session info after only attaching BiocFileCache:

> library(BiocFileCache)
Loading required package: dbplyr
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocFileCache_2.8.0 dbplyr_2.3.3        AcidDevTools_0.6.15

loaded via a namespace (and not attached):
 [1] vctrs_0.6.3      httr_1.4.7       cli_3.6.1        rlang_1.1.1
 [5] DBI_1.1.3        generics_0.1.3   glue_1.6.2       bit_4.0.5
 [9] fansi_1.0.4      filelock_1.0.2   tibble_3.2.1     fastmap_1.1.1
[13] lifecycle_1.0.3  memoise_2.0.1    compiler_4.3.1   dplyr_1.1.3
[17] RSQLite_2.3.1    blob_1.2.4       pkgconfig_2.0.3  R6_2.5.1
[21] tidyselect_1.2.0 utf8_1.2.3       pillar_1.9.0     curl_5.0.2
[25] parallel_4.3.1   magrittr_2.0.3   tools_4.3.1      bit64_4.0.5
[29] cachem_1.0.8
>
lshep commented

The results and output are currently dplyr tbl objects, so the package uses the corresponding dplyr functions to filter/mutate/summarize them. It was a conscious decision to use more tidy-like structures in the package.
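
For context, here is a rough sketch of what a dplyr-style filter might look like if written against DBI and RSQLite directly (both are already in Imports). The table name and columns are hypothetical stand-ins, not BiocFileCache's actual schema, and this is not how the package is implemented.

```r
library(DBI)

# Hypothetical in-memory table standing in for a cache index
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "resource", data.frame(
  rid = c("BFC1", "BFC2", "BFC3"),
  rname = c("genome", "annotation", "genome_index"),
  stringsAsFactors = FALSE
))

# dplyr/dbplyr style, shown only for comparison (not run):
#   tbl(con, "resource") |> filter(rname == "genome") |> collect()

# DBI-only equivalent using a parameterised query;
# the result is a plain data.frame rather than a tbl
hits <- dbGetQuery(
  con,
  "SELECT rid, rname FROM resource WHERE rname = ?",
  params = list("genome")
)

dbDisconnect(con)
```

The trade-off is visible even in this toy case: the SQL has to be written and maintained by hand, and callers get a data.frame back instead of a tbl, which would be a user-facing change.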

The downside, though, is that any other package that imports BiocFileCache then requires all of those additional dependencies, and that really starts to add up if you import any other informatics tools.

lshep commented

I'm still not in favor of restructuring the entire package just to reduce the number of dependencies. The tidy structures are stable and allow for efficient, condensed code.