DominikRafacz/deepdep

custom CRAN repos/mirror

dseynaev opened this issue · 7 comments

The CRAN mirror/repo to use seems to be hardcoded:

contriburl = contrib.url("https://cloud.r-project.org/"))[, 1],

Couldn't the default getOption("repos") be used instead? Or alternatively, something like contrib.url(getOption("deepdep.repos", "https://cloud.r-project.org/"))?
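A minimal sketch of what I mean ("deepdep.repos" is just a suggested option name, it doesn't exist yet):

```r
# Sketch only: fall back to the user's configured repositories instead of
# a hardcoded mirror; "deepdep.repos" is a hypothetical option name.
repos <- getOption("deepdep.repos", default = getOption("repos"))
avail_packages <- utils::available.packages(
  contriburl = utils::contrib.url(repos)
)[, 1]
```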

(sorry for not answering earlier, I saw it just today)
Yes, it very much could (unless there's something I don't know of), good catch! Could you provide an example URL you'd like to use there, so that we could verify that the change works correctly?

Can I ask you one question first? What makes you use deepdep::get_available_packages() instead of available.packages() from utils?

Warning: somewhat lengthy description of current work below, not really related, but describing plans for future deepdep

We plan on restructuring deepdep quite a bit, since we made it in 2019, when our experience with creating R packages was lackluster, and get_available_packages() is one of the things I see problems with. For example, I don't like bioc = TRUE returning packages from both CRAN and Bioconductor; I'd rather have them separate so the user can combine them at will. The get_ prefix is also redundant; if anything, I could accept a dd_ prefix.
Actually, I started implementing a separate package called woodendesc, since get_available_packages(), get_description() and get_downloads() aren't related to analyzing package dependencies and thus not really within the scope of deepdep. We won't remove them any time soon, not without prior deprecation, but I think I'd rather work on new, clean implementations that a future deepdep could import than patch the existing ones.

Fair question, and thanks for the context. I'm actually just using deepdep::deepdep() and was wondering why it only considered CRAN regardless of my getOption("repos"); I traced it down to is_available() and get_available_packages().

Okay, now I understand. In that case, I think it's better to leave it as it is for now; while the get_available_packages() check is not strictly necessary, we use the http://crandb.r-pkg.org API, which e.g. https://ropensci.r-universe.dev doesn't have. Namely, it lets us simply append a package name to get its DESCRIPTION file: http://crandb.r-pkg.org/deepdep. Thus, it would take much more significant changes to allow arbitrary repositories. Perhaps we could use /src/contrib/PACKAGES, which stores info about all packages in a repository, but that wouldn't take advantage of the custom CRAN API.
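To illustrate the difference between the two metadata sources (just a sketch, using jsonlite for the JSON endpoint):

```r
# crandb-specific: append the package name to get its DESCRIPTION as JSON.
# Only CRAN packages are served this way.
desc <- jsonlite::fromJSON("http://crandb.r-pkg.org/deepdep")

# Generic: every CRAN-like repository (including R-universe) serves a
# PACKAGES index under /src/contrib/, which available.packages() parses.
pkgs <- utils::available.packages(
  contriburl = utils::contrib.url("https://ropensci.r-universe.dev")
)
```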

Like I said, an overhaul is a work in progress; I've yet to think of an intuitive API for deepdep() that would allow the user to specify any repository, without the user needing to spell out the whole URL for common repositories like Bioconductor. Perhaps deepdep(package, repos = character(), ...), where repos may be a vector of simple names like "bioc"/"Bioconductor"/"R-universe"/... or URLs that would become a search path for deepdep()? CRAN would come last by default, with an option to specify the CRAN source through getOption("dd_cran") or something, because CRAN packages are rarely (if ever) dependent on non-CRAN packages, while R-universe universes usually have CRAN dependencies.
... might allow the user to specify a universe within R-universe (e.g. "tidyverse") or perhaps the Bioconductor version.
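Purely hypothetical calls under that sketch (none of these signatures or option names exist yet):

```r
# Hypothetical API: 'repos' acts as a search path of simple names or URLs,
# with CRAN implicitly appended last.
deepdep("mypackage", repos = c("bioc", "https://tidyverse.r-universe.dev"))

# The CRAN source itself could be overridden via an option, and '...'
# could carry repo-specific details such as the R-universe universe:
options(dd_cran = "https://cran.r-universe.dev")
deepdep("mypackage", repos = "R-universe", universe = "tidyverse")
```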

In retrospect I should have read more carefully; it's mentioned in the first paragraph of the README that http://crandb.r-pkg.org/ is used...

Perhaps we could use /src/contrib/PACKAGES, which stores info about all packages in a repository, but that wouldn't take advantage of the custom CRAN API.

Maybe a fallback implementation could make sense, where deepdep tries a metadata API (like http://crandb.r-pkg.org/) if one is available for a repo, and /src/contrib/PACKAGES otherwise? Custom repos like the ones from R-universe generally have a much smaller PACKAGES file compared to CRAN, so performance should be less of a problem.
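Roughly like this (a sketch only; fetch_package_info() is a made-up helper name, and the error-based fallback is just one way to detect a missing API):

```r
# Sketch of the fallback idea: try the repo's metadata API first, and
# parse the generic PACKAGES index if that fails. fetch_package_info()
# is a hypothetical helper, not part of deepdep.
fetch_package_info <- function(package, repo) {
  tryCatch(
    # fast path: crandb-style API serving the DESCRIPTION as JSON
    jsonlite::fromJSON(paste0("http://crandb.r-pkg.org/", package)),
    error = function(e) {
      # generic fallback: every CRAN-like repo serves src/contrib/PACKAGES
      pkgs <- utils::available.packages(contriburl = utils::contrib.url(repo))
      as.list(pkgs[package, ])
    }
  )
}
```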

That's what I hope to do! I'm a bit worried about possibly breaking backwards compatibility of deepdep functions, but other than that, my guess is it should be ready by the end of June at the latest (though don't take my word for that).