r-lib/rig

CRAN repository set by rig is too aggressive; overrides all RStudio settings

Opened this issue · 9 comments

When rig sets the CRAN repository, it does so by adding:

options(repos = c(CRAN = "https://cloud.r-project.org/"))

to library/base/R/Rprofile.

This setting is very aggressive; it cannot be disabled (not even by running with R --vanilla), and because it is set after R starts up, it overwrites RStudio's CRAN settings.

Using any rig version of R will cause warnings to appear in RStudio's Packages options, and the options that control CRAN repositories will not work with any rig version of R.

image

I would propose the following:

  1. Before setting the CRAN repository, check to see if it's set to something that appears intentional (e.g. not @CRAN@). If it appears to be intentionally set, leave it alone instead of overwriting the user's preference.
  2. Move the CRAN repository setting from the base library Rprofile to a regular R startup script, such as Rprofile.site. When diagnosing where the CRAN repository is getting set, absolutely no one is going to think to check in the base library.
  3. Set an attribute on the options value to indicate rig as the origin of the setting.
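A minimal sketch of how proposals 1 and 3 might look in R, assuming that an "unset" CRAN entry is either missing or still the "@CRAN@" placeholder. The helper name set_default_cran and the attribute name "rig" are hypothetical, chosen only for illustration; they are not part of rig.

```r
# Hypothetical helper: set a fallback CRAN repository only when the current
# value does not look intentional, and tag the value so its origin is traceable.
set_default_cran <- function(default = "https://cloud.r-project.org/") {
  repos <- getOption("repos")
  nms <- names(repos)
  if (is.null(nms)) nms <- rep("", length(repos))

  cran <- if ("CRAN" %in% nms) repos[["CRAN"]] else NA_character_

  # Treat a missing entry or the "@CRAN@" placeholder as "not set".
  if (is.na(cran) || identical(cran, "@CRAN@")) {
    others <- repos[nms != "CRAN"]      # keep any non-CRAN repositories
    repos <- c(CRAN = default, others)
    attr(repos, "rig") <- TRUE          # assumed attribute name
    options(repos = repos)
  }
  invisible(getOption("repos"))
}
```

With something like this, attr(getOption("repos"), "rig") would tell a reader that the value came from rig, in the same way RStudio marks its own value with an "RStudio" attribute.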

See also: rstudio/rstudio#13957

The goal of the rig setting is to provide a good default. No modern package manager comes without a default repo set; the current R default of no repo set creates a very poor user experience.

This setting is very aggressive; it cannot be disabled (not even by running with R --vanilla), and because it is set after R starts up, it overwrites RStudio's CRAN settings.

It can be disabled or the site admin can override it in Rprofile.site, and the user can disable or override it in ~/.Rprofile. You can also use rig add --without-cran-mirror in which case it is not set at all.
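For illustration, a user-level override along these lines might go in ~/.Rprofile. This is only a sketch: the mirror is the one that appears later in this thread, and the "extra" repository name and URL are placeholders, not a recommendation.

```r
# ~/.Rprofile (sketch): the user profile runs after the base profile,
# so it sees whatever default rig has already put in place.
local({
  repos <- getOption("repos")
  # Replace the CRAN mirror chosen by rig with a preferred one.
  repos["CRAN"] <- "https://cran.rediris.es/"
  # Add an extra repository; name and URL are placeholders for illustration.
  repos["extra"] <- "https://example.org/r-repo"
  options(repos = repos)
})
```

The same lines in Rprofile.site would apply the override for every user of that R installation.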

It does run with --vanilla, but that's fine I think, because the goal is to have a good default. Is there a use case when you want to start R without any repo set?

It does not really overwrite any RStudio settings IMO, as repos is not set at all when it runs. (In any case, if I move the repos setting to another profile, that does not change RStudio's current behavior AFAICT, and you'll get the same picture.)

Before setting the CRAN repository, check to see if it's set to something that appears intentional (e.g. not @CRAN@). If it appears to be intentionally set, leave it alone instead of overwriting the user's preference.

It is running in the base profile, so repos is not set there, it is always NULL, no need to check it.

Move the CRAN repository setting from the base library Rprofile to a regular R startup script, such as Rprofile.site.

Rprofile.site is for the sysadmin to write, not for us. Similarly, ~/.Rprofile is for the user to write, not for us. People write their profiles by hand, I don't want to interfere with that.

When diagnosing where the CRAN repository is getting set, absolutely no one is going to think to check in the base library.

True, but I am not sure why you'd check where it is set. :) If you don't like it, the admin can change it, and the user can also change it.

Set an attribute on the options value to indicate rig as the origin of the setting.

I can certainly do that.

In summary, I agree that the current setting is somewhat mysterious, as to where it happens. But I am convinced that it is the right thing to do, as it provides the best user experience in all these common cases:

  1. User does not care what repos are set, they just want things to work. I think this is the vast majority of the users. The current rig default is great for these people.
  2. Admin manages the repo settings, they can override or modify the repos set by rig as they wish in the site profile.
  3. User wants to modify the repo settings, to add extra repos (more common?), or to change the default CRAN repo (rare?).

I actually think that these ways of setting the repos improve the user experience in RStudio as well, because the RStudio repo settings do not apply to subprocesses (AFAICT). So e.g. if you set a repo in the RStudio options, that setting does not apply to background jobs, when running tests in the build pane, etc. Which is not great.

(Sorry this got pretty long....)

Thanks, really appreciate all the context!

It is running in the base profile, so repos is not set there, it is always NULL, no need to check it.

This isn't true when running in RStudio. RStudio sets the repository options very early at startup, using the user's preferred repositories as specified (by the user!) in Global Options. This happens before the base package gets loaded. There are also a bunch of system and admin level settings that RStudio considers when setting the repository, which are important as they give admins control over what repositories their users access by default. You can see where this happens here:

https://github.com/rstudio/rstudio/blob/bdd5a2bf246b35db32b4dd2a033f572745182e80/src/cpp/session/SessionMain.cpp#L2363-L2372

The net effect is that when using a rig version of R, none of the sources above work since rig always overwrites whatever setting RStudio injects.

It's okay if rig wants to provide a fallback CRAN repository, and I 100% agree that it much improves everyone's quality of life to have a fallback. But the tradeoff is that RStudio's repository settings don't work at all, and I think we could come up with a solution that lets both things work correctly.

This isn't true when running in RStudio. RStudio sets the repository options very early at startup, using the user's preferred repositories as specified (by the user!) in Global Options. This happens before the base package gets loaded.

I actually tried this, and it is not what I see. If I remove the rig repo setting, and put

options(origrepos = list(getOption("repos")))
print(getOption("repos"))

at the same place, and set the repos in RStudio, then at startup in RStudio, I see NULL, and origrepos is also list(NULL), even though repos is set later correctly:

> getOption("repos")
                      CRAN 
"https://cran.rediris.es/" 
attr(,"RStudio")
[1] TRUE

(TBH I am not even sure how you can set options before loading the base package, options() is a function in the base package...)

But the tradeoff is that RStudio's repository settings don't work at all, and I think we could come up with a solution that lets both things work correctly.

That's true, but that's because RStudio is giving preference to repos set in any profile, no?

Even without rig, if people need to set their repos in some profile, the result of that is that they cannot change them in RStudio, correct?

Also, perhaps most importantly, setting repos in a profile (by rig or by the admin or user) is a better user experience than setting them in RStudio, because of the lack of support for it in jobs, tests, in the terminal, etc.

I think we could come up with a solution that lets both things work correctly.

Probably, but it seems to me that checking the previous value of repos is not the way, unless I am missing something.

Btw. this is somewhat relevant, new in R 4.3.0:

Package repositories in getOption("repos") are now initialized from the ‘repositories’ file when utils is loaded (if not already set, e.g., in ‘.Rprofile’). (From a report and patch proposal by Gabriel Becker in PR#18405.)

and in R 4.3.2:

The default initialization of the "repos" option from the ‘repositories’ file at startup can be skipped by setting environment variable R_REPOSITORIES to NULL such that getOption("repos") is empty if not set elsewhere.

@jmcphers One thing rig could do is to check the RSTUDIO env var, and if that is set then not set the repos option at all.

I think that's sensible, but it will probably be a suboptimal change for some people:

  • repos won't be set in RStudio subprocesses, which does not seem to matter for most people, and
  • no PPM setup (on Windows and Linux currently), which is quite a disruptive change I think.
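The check suggested above could be sketched roughly as follows. It assumes, as the comment does, that RStudio sets an RSTUDIO environment variable in its R sessions; the function name is hypothetical and this is not rig's actual code.

```r
# Hypothetical sketch: only provide a default CRAN mirror outside RStudio.
set_repos_unless_rstudio <- function(default = "https://cloud.r-project.org/") {
  # Sys.getenv() returns "" for an unset variable, so nzchar() is FALSE then.
  in_rstudio <- nzchar(Sys.getenv("RSTUDIO"))
  if (!in_rstudio) {
    options(repos = c(CRAN = default))
  }
  invisible(in_rstudio)
}
```

Inside RStudio this would leave repos untouched, so RStudio's own settings apply. Whether subprocesses started from RStudio also see the variable, and so also skip the default, is not something this sketch settles.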

@jmcphers So, would you prefer if rig didn't set repos in RStudio at all? (I.e. when the RSTUDIO env var is set.)

I can certainly change rig to do that by default and suggest that people manage repos with RStudio.

@jmcphers This is a gentle ping, as I am working on rig for a couple of days.

Do you want me to change rig, so that the repos option is not set if the RSTUDIO env var is set? I.e. not touching the repository settings at all in RStudio?

FWIW, I am in favor of this.