cran4linux/bspm

Provide a function+script to move user packages

Closed this issue ยท 28 comments

Initial motivation: eddelbuettel/r2u#30. Deployment of bspm-backed system repositories require an initial deployment that removes from the user library (and installs into the system library) everything that can be installed from the system repos to avoid package shadowing issues. This is more or less straightforward for a single user, but it would be nice to ship a deployment script to help sysadmins set this up for a bunch of users in multitenant servers.

@MatthieuStigler In 485ff09, I added a function that takes a library path and moves packages to the system library:

https://github.com/Enchufa2/bspm/blob/485ff097720c5019ed254013798b6a68598b4434/R/manager.R#L59-L66

Run this for each user and there will be no more version collisions.

Should it check for same or lower version before replacing? Would be salty to have a dev package one works on replaced by an older released version.

It wouldn't be much of a loss, because we are constantly installing and reinstalling the dev versions in which we are working, but it's a fair point. Now that we have available_sys(), we can make a safer default version that preserves newer versions in the user dir, and an optional burn-it-with-fire version such as the one above.

@MatthieuStigler The PR above includes a safer moveto_sys() function that doesn't touch newer package versions by default, and a script to mass-call moveto_sys() for several users and/or libraries. You can call it as follows:

$ Rscript -e bspm:::scripts mass_move user1 user2 ... lib1 lib2 ...

The script requires sudo privileges. By default, it does a dry run, meaning that it won't touch anything, it just reports the user libraries found, until you add the --run option:

$ Rscript -e bspm:::scripts mass_move --run user1 user2 ... lib1 lib2 ...

And still it will ask whether to proceed or not for each library, so it's pretty safe.

awesome, thanks! I look forward to testing it on our server very soon!

Hi @Enchufa2 , I gave a try to bspm::moveto_sys, very nice! I like in particular the "ask" option, very clean.

One comment: is there any chance there could be an additional check for "github" packages? If one uses remotes::install_github but the github version doesn't contain a (dec) version increment, one would end up remove newer packages? This happened to me trying dang's new function dang::shadowedPackages, yet at a second run the dev version was removed as its github number is the same as the CRAN on?

Admittedly, I am not exactly sure how one should handle github versions as a sys admin moving to a full bspm server wide, I can think of two cases:

  • Less controversial: user has a github package that is same version as CRAN: I would not want to remove
  • Controversial: user has a github package that is older than CRAN: I guess one can argue how that should be handled.

Thanks!!

Also, is seems to leave some duplicated/shadowed packages?

Output from dang::shadowedPackages():

Package LibPath Version Latest
bspm /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2 0.4.2.1 TRUE
bspm /usr/local/lib/R/site-library 0.4.2 FALSE
bspm /usr/lib/R/site-library 0.4.2 FALSE
RcppGSL /usr/local/lib/R/site-library 0.3.13 TRUE
RcppGSL /usr/lib/R/site-library 0.3.13 TRUE

Note that both @Enchufa2 and I work off installed.packages() which does AFAIK not cover the remotes tag from install_github.

Can't you infer that from packageDescription? https://stackoverflow.com/questions/49698348/how-to-find-out-which-package-was-installed-from-github-in-my-r-library

bspm::disable()
install.packages("A3")
#> Installing package into '/home/mstigler/R/x86_64-pc-linux-gnu-library/4.2'
#> (as 'lib' is unspecified)
packageDescription("A3", lib.loc=.libPaths()[1])[c("GithubRepo", "Repository")]
#> $<NA>
#> NULL
#> 
#> $Repository
#> [1] "CRAN"

remotes::install_github("eddelbuettel/dang")
#> Skipping install of 'dang' from a github remote, the SHA1 (a3d38730) has not changed since last install.
#>   Use `force = TRUE` to force installation
packageDescription("dang", lib.loc=.libPaths()[1])[c("GithubRepo", "Repository")]
#> $GithubRepo
#> [1] "dang"
#> 
#> $<NA>
#> NULL

Created on 2023-02-13 with reprex v2.0.2

Calling installed.packages() once is simple, and fast.

Reading each DESCRIPTION file is not. Sounds you are on your way towards a local (or released) darklyShadowedPackages() function ๐Ÿ˜‰

Hi @Enchufa2 , I gave a try to bspm::moveto_sys, very nice! I like in particular the "ask" option, very clean.

Thanks! :)

One comment: is there any chance there could be an additional check for "github" packages?

I can look into it. Admittedly, speed is not a requirement for this function in particular.

Admittedly, I am not exactly sure how one should handle github versions as a sys admin moving to a full bspm server wide, I can think of two cases:

  • Less controversial: user has a github package that is same version as CRAN: I would not want to remove
  • Controversial: user has a github package that is older than CRAN: I guess one can argue how that should be handled.

I'd say:

  • Same version: mmmh, maybe. Probably. Need to think about it.
  • Older: burn it with fire. Always. 100% :) Older means trouble in a regular installation. If an older version is needed, the user should use renv.

Also, is seems to leave some duplicated/shadowed packages?

Output from dang::shadowedPackages():

Package LibPath Version Latest
bspm /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2 0.4.2.1 TRUE
bspm /usr/local/lib/R/site-library 0.4.2 FALSE
bspm /usr/lib/R/site-library 0.4.2 FALSE
RcppGSL /usr/local/lib/R/site-library 0.3.13 TRUE
RcppGSL /usr/lib/R/site-library 0.3.13 TRUE

Note that moveto_sys takes the user dir if nothing is specified, and that is /home/mstigler/R/x86_64-pc-linux-gnu-library/4.2. To move shadowed packages from /usr/local/lib/R/site-library to /usr/lib/R/site-library, you need to run moveto_sys("/usr/local/lib/R/site-library"), and for this you need the proper permissions. If this is owned by root, it's best if you call Rscript -e bspm:::scripts mass_move -r /usr/local/lib/R/site-library instead, because this script takes a look at the owner and calls moveto_sys via sudo.

If an older version is needed, the user should use renv.

Or manually set the .libPaths() as I do in a helper script to access a package in a non-current version I prefer.

Fully agree on 'burn with fire' as the default for older packages we spot.

@MatthieuStigler The latest commit in the PR marks remotes with the same version as newer, and therefore do not touch them by default. Please let me know if this works for you.

great, thanks a lot Inaki! Did yo consider adding this as an argument?

I have been using moveto_sys for my own user on the server and so far has been going well, so will soon try the mass_move approach!

Speaking about this, could you clarify how to use the mass_move script? I can specify both users and/or libraries? Right now, adding another user that is not me will trigger the message:

Rscript -e bspm:::scripts mass_move  other_user

Error in cat(bspm:::user_lib(), fill = TRUE) : object 'user_lib' not found

Thanks!

great, thanks a lot Inaki! Did yo consider adding this as an argument?

I would rather not, because such an option would complicate the interface too much in my opinion. This option would be in effect only if the existing option newer is FALSE or "ask", but not if it's TRUE, so it would be kind of confusing, and would require a lot of documentation for little gain. I would prefer to pick the best default and stick to it. We either consider remotes with the same version as newer (current status) or we don't (and I revert the last commit).

Speaking about this, could you clarify how to use the mass_move script? I can specify both users and/or libraries? Right now, adding another user that is not me will trigger the message:

Rscript -e bspm:::scripts mass_move  other_user

Error in cat(bspm:::user_lib(), fill = TRUE) : object 'user_lib' not found

Mmmh, it seems that this dev version of bspm is not available from other_user, because you installed in your user library. bspm is supposed to be a system package anyway, so installing it with sudo would solve this.

And yes, you can provide users and libraries. If a string is not found as a directory, it's treated as a user, basically.

makes sense on the first point!

on the second: oh my bad, yes running as sudo solves the issue!

Surprisingly though, it doesn't seem to show any package for another user, though it seems that user has older packages?

$ Rscript -e bspm:::scripts mass_move other_user
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
NOTE: no changes will be made unless '--run' is specified
Gaining sudo access...
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
$ Rscript -e "packageVersion('cli')"
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
[1] โ€˜3.6.0โ€™
$ sudo head -n 3 /home/other_user/R/x86_64-pc-linux-gnu-library/4.2/cli/DESCRIPTION
Package: cli
Title: Helpers for Developing Command Line Interfaces
Version: 3.4.1

Mmmh... What's the output of dang::shadowedPackages() for this other_user?

(I also did this 'by hand' on another machine with CRAN data.table and noticed I needed one more line in shadowedPackages. Just committed that.)

Mmmh... What's the output of dang::shadowedPackages() for this other_user?

And more importantly, what's the output of bspm:::user_lib() for this other_user? Because it seems that the script didn't find the library.

Here is the output:

$ Rscript -e bspm:::scripts mass_move other_user
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
NOTE: no changes will be made unless '--run' is specified
Gaining sudo access...
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
$ sudo -H -u other_user bash -c 'Rscript -e dang::shadowedPackages\(\)'
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
          Package                       LibPath Version n good
  1:          DBI /usr/local/lib/R/site-library   1.1.3 2 TRUE
  2:          DBI       /usr/lib/R/site-library   1.1.3 2 TRUE
  3:           R6 /usr/local/lib/R/site-library   2.5.1 2 TRUE
  4:           R6       /usr/lib/R/site-library   2.5.1 2 TRUE
  5: RColorBrewer /usr/local/lib/R/site-library   1.1.3 2 TRUE
 ---                                                          
201:         xfun       /usr/lib/R/site-library    0.37 2 TRUE
202:         xml2 /usr/local/lib/R/site-library   1.3.3 2 TRUE
203:         xml2       /usr/lib/R/site-library   1.3.3 2 TRUE
204:         yaml /usr/local/lib/R/site-library   2.3.7 2 TRUE
205:         yaml       /usr/lib/R/site-library   2.3.7 2 TRUE

Specifically:

$ sudo -H -u other_user/ bash -c 'Rscript -e "subset(dang::shadowedPackages(), !good)"'
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
    Package                                        LibPath  Version n  good
1:      cli /home/other_user/R/x86_64-pc-linux-gnu-library/4.2    3.4.1 3 FALSE
2:   fixest /home/other_user/R/x86_64-pc-linux-gnu-library/4.2   0.11.0 2 FALSE
3: jsonlite /home/other_user/R/x86_64-pc-linux-gnu-library/4.2    1.8.3 3 FALSE
5:    vctrs /home/other_user/R/x86_64-pc-linux-gnu-library/4.2    0.5.0 2 FALSE
$ sudo -H -u other_user bash -c 'Rscript -e bspm:::user_lib\(\)'
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
[1] "/home/other_user/R/x86_64-pc-linux-gnu-library/4.2"

Wait, does the user exist? Or you just created the directories and put some packages in there for testing purposes? Because the script checks that the user exists with the id command. ;)

hah, yes it's not a ๐Ÿ‘ป! It's a real user, just changed the real name with other_user in the output shown here.

Note if the user doesn't exist, the script still runs and returns something, yet slightly different:

$ Rscript -e bspm:::scripts mass_move other_user
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
NOTE: no changes will be made unless '--run' is specified
Gaining sudo access...
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"

$ Rscript -e bspm:::scripts mass_move inexistant_user
Loading required package: utils
Tracing function "install.packages" in package "utils"
Untracing function "install.packages" in package "utils"
NOTE: no changes will be made unless '--run' is specified
Gaining sudo access...

Ok, another thing... What is the output of

$ sudo -u other_user bash -c 'Rscript -e bspm:::user_lib\(\)'
$ sudo -H -u other_user bash -c 'Rscript -e bspm:::user_lib\(\)'

? Maybe the -H is required?

Oh, I found it... I forgot that usually home folders don't have any visibility, so we need some more sudo commands there. The last commit (b74c7c2) should fix it. Could you please try it? Thanks!

Well done, it is working now!

$Rscript -e bspm:::scripts mass_move other_user
Loading required package: utils
Tracing function "install.packages" in package "utils"
NOTE: no changes will be made unless '--run' is specified
Gaining sudo access...
[sudo] password for mstigler: 
Loading required package: utils
Tracing function "install.packages" in package "utils"
Found 16 packages in other_user's /home/other_user/R/x86_64-pc-linux-gnu-library/4.2

Next question is: I guess I need to install bspm first on each user? Is that something that you would consider eventually adding in the script?

Thanks!

Mmh, thinking about it, installing bspm on other users is just a temporary problem, since I installed it through github, and hence only avaialble for my local libpath, right? Later on when bspm is updated r2u, this won't be an issue I imagine?

You can tell all install functions to respect a location, from install.packages() on down. And remotes::install_packages() passes ... down so you can set lib="/where/ever".

bspm is supposed to be a system package anyway (installed via sudo), so all users will have access without installing it one by one. So yes, it's a temporary problem now that we are testing from GitHub.

Next thing would be to add -r to the command above to actually install the packages using bspm and remove them from the user library.

Thanks for testing! I'll merge the PR then.