trevorld/r-optparse

Can we get mandatory argument support ?

statquant opened this issue · 10 comments

If my understanding is correct there is no way to indicate that an argument is required and not specify a default value.
I raised the question : http://stackoverflow.com/questions/35252547/can-i-specify-mandatory-arguments-with-optparse
Am I wrong ?

This has been discussed before ( #3 ). Note that this was never a feature of the Python optparse library this package is based on, nor will it be a priority in this package (see https://docs.python.org/2/library/optparse.html#what-are-options-for for the reasoning). Other R packages such as "argparse" ( https://github.com/trevorld/argparse ) do support explicitly specifying mandatory positional arguments and/or mandatory optional arguments.

If you wanted mandatory arguments with "optparse" you could always manually check if a mandatory value was specified. For example for a mandatory optional argument:

  library("optparse")

  option_list <- list(make_option(c("-t", "--test"), type="character", help="Mandatory option"))
  parser <- OptionParser(option_list=option_list)
  opt <- parse_args(parser)
  # An option with no default that was not supplied is NULL in the parsed list
  if (is.null(opt$test)) {
      cat("Mandatory test argument not found\n")
      print_help(parser)
      quit(status=1)
  }

The second example in the package vignette shows checking if a mandatory positional argument was reasonable.

Edit 2018-10-19: Comment previously claimed incorrectly that the argparse package doesn't support required optional arguments but actually it does.

A simple R-style solution would be to lazy evaluate default only if needed and allow R expressions as values, which would allow passing something like default=stop().

@eantonya , the issue with your proposed stop() lazy evaluation approach is that we need to always evaluate default (if just to check if it is NULL or not) in order to enable smarter type casting (i.e. grabbing the type of the option from the default type if not otherwise defined).

Don't check for NULL, check for missing. And if type is unspecified and default is an expression - either complain and stop, or evaluate - I can see an argument made for either.
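A minimal base-R sketch of the distinction being drawn here, assuming nothing about optparse internals: describe_default() and lazy_lookup() are hypothetical helpers, not part of the package. It shows that missing() can tell "no default supplied" apart from an explicit NULL, and that a default such as stop() is only evaluated if it is actually used.

```r
# Hypothetical helper (not part of optparse): distinguishes a missing
# default from an explicit NULL default.
describe_default <- function(default) {
    if (missing(default)) {
        return("no default supplied")
    }
    # `default` is only evaluated from this point on (lazy evaluation).
    if (is.null(default)) "explicit NULL default" else paste("default:", default)
}

describe_default()
describe_default(NULL)
describe_default("foo")

# Hypothetical helper: the `default` promise is never forced when a value
# was given, so stop() only fires when the option is actually absent.
lazy_lookup <- function(value_from_cli = NULL, default) {
    if (!is.null(value_from_cli)) value_from_cli else default
}

lazy_lookup("bar", default = stop("option is required"))
```

Calling lazy_lookup(NULL, default = stop("option is required")) would raise the error, which is the behaviour the proposal relies on.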

There is a school of thought that specifying optional arguments in a function (such as default here) is more clearly understood by users if done with a NULL and should be checked with is.null: http://adv-r.had.co.nz/Functions.html

I'm not fully persuaded of the merits of having the default argument in make_option try to complete two different tasks:

  1. Set the default including possibility of the user explicitly setting it to NULL
  2. Assert that an optional argument was explicitly passed a value on the command line

Note that if one sets an integer value for the positional_arguments argument of parse_args, then optparse will throw an error if too few or too many positional arguments are present.
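As a rough illustration of the positional_arguments behaviour just described, the sketch below passes the command line explicitly through the args parameter so it can be run interactively. The --country option, the usage string, and the example dates are made up for illustration.

```r
library("optparse")

parser <- OptionParser(usage = "%prog [options] start end")
parser <- add_option(parser, "--country", default = "US")

# Ask for exactly two positional arguments; with positional_arguments set,
# parse_args() returns a list with `options` and `args` elements.
parsed <- parse_args(parser,
                     args = c("--country=CA", "2024-01-01", "2024-12-31"),
                     positional_arguments = 2)

parsed$options$country
parsed$args
```

Passing only one date in args would make parse_args() fail instead of returning.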

With the lazy evaluation approach there is also the risk that a bunch of un-needed computations (possibly with other undesirable side-effects) occur before the stop() is triggered which could have been prevented if the user explicitly asserted that a reasonable value was set earlier in the Rscript.

I think it is much cleaner and safer to do Step 2 separately, either by using mandatory positional arguments instead or by checking whether the optional argument is present in commandArgs(TRUE):

 if (!any(grepl("^--mandatory_option", commandArgs(TRUE)))) stop("mandatory_option not set")

Or by doing something like checking that the parsed option is not NULL:

if(is.null(options[["mandatory_option"]])) stop("mandatory_option not set")

If those are too verbose for you, you could always write a helper function:

# Assert that every mandatory option is present in the parsed options list
assert_mandatory_options <- function(options, mandatory_options=character()) {
    for (mo in mandatory_options) {
        if (is.null(options[[mo]])) {
            stop(paste("Forgot to set mandatory option", mo))
        }
    }
}

options <- parse_args(parser) # or options <- parse_args2(parser)$options
mandatory_options <-  c("mandatory_option1", "mandatory_option2")
assert_mandatory_options(options, mandatory_options)

Or the functions themselves later in the Rscript can assert if they were fed reasonable arguments.

The user can also forgo using this package altogether and do everything themselves - that's obviously not the point. Point is to improve this package to make it easier to use and more versatile. Crappy solutions to this outside of the package exist, but that's what they are - crappy.

If you simply must have a default value of NULL for default (which btw I can't ever imagine anyone specifying explicitly) for cultural reasons, that's fine too - you can still check if it's an expression before the null check. Or scratch all that and add a new bool argument.

Probably any solution you pick is faster to type out than all of this arguing, so maybe you just think that this should never be added, which is fine, but it makes this package less useful than it could be.

Here's a real-world example btw of why I need this, and why positional arguments are not a good solution.

I have an R script that given a date range and a country, prints out the official holidays of that country. All 3 arguments are mandatory, and have no sensible defaults.

Maybe you could argue that start/end dates can be positional, but that would still leave country up in the air + surely after using R one can appreciate how much nicer it is not to worry about position of arguments and instead just specify them by name wherever you like.

> Crappy solutions to this outside of the package exist, but that's what they are - crappy.

There is also the argparse package (which I wrote to handle more advanced command-line use cases than optparse).

> library("argparse")
> parser = ArgumentParser()
> parser$add_argument("--option", required=TRUE)
> parser$parse_args()
Error in .stop(output, "parse error:") : parse error:
usage: PROGRAM [-h] --option OPTION
PROGRAM: error: the following arguments are required: --option
> parser$parse_args("--option=foo")
$option
[1] "foo"

> Maybe you could argue that start/end dates can be positional, but that would still leave country up in the air + surely after using R one can appreciate how much nicer it is not to worry about position of arguments and instead just specify them by name wherever you like.

In your particular use case I'd argue that start, end, AND country need not be mandatory and in fact can all be given a sensible optional default. For a typical person, one desired behaviour could be to see all their official holidays for the upcoming year after inferring the user's country:

  1. use as start today's date
  2. use as end a year from today's date
  3. a. default to a guess of the user's country (perhaps make inferences from Sys.getenv("LANG") which on my system would suggest I am from the "US")
    b. default to printing out all official holidays from all countries (or just the more common ones)
    c. default to your country
    d. default to another salient country
    e. Or make this the one required positional argument

As a user I appreciate it when developers take the time to pick out or infer reasonable defaults for me and usually try to do the same with my Rscripts. I fail to see in your particular use case why any of your options need to be mandatory. The user can always try the command a second time with explicit options passed in to tweak the output for their use case (or maybe create an alias with their preferred settings).
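One way the defaults listed above could be computed in base R is sketched below. guess_country() is a hypothetical helper, and the "US" fallback and the locale pattern are assumptions, not anything optparse provides.

```r
# Hypothetical helper: pull a two-letter country code out of a locale
# string such as "en_US.UTF-8"; fall back when nothing can be inferred.
guess_country <- function(lang = Sys.getenv("LANG"), fallback = "US") {
    m <- regmatches(lang, regexec("^[A-Za-z]+_([A-Z]{2})", lang))[[1]]
    if (length(m) == 2) m[2] else fallback
}

# Defaults 1 and 2 from the list: today, and one year from today.
default_start <- Sys.Date()
default_end <- seq(default_start, by = "year", length.out = 2)[2]

guess_country("en_US.UTF-8")
guess_country("fr_CA")
guess_country("C")
```

These values could then be passed as the default argument of each option, so that none of the three options has to be mandatory.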

> If you simply must have a default value of NULL for default (which btw I can't ever imagine anyone specifying explicitly)

Someone could want to set that explicitly if they would then pass option into a function which interprets NULL as meaning the function should calculate a reasonable default i.e. one could have:

 print_public_holidays <- function(country = NULL, start = NULL, end = NULL) {
      # contents of function here
 }

which is then called in an Rscript by

print_public_holidays(options$country, options$start, options$end)

> There is also the argparse package (which I wrote to handle more advanced command-line use cases than optparse).

This is not a good solution, because I believe that argparse requires Python to be installed. See the vintage comment by Chris here:

https://stackoverflow.com/questions/3433603/parsing-command-line-arguments-in-r-scripts

unsolicited advice - as tempting as it is to use the outstanding python argparse package from within R, the cross-language dependency just makes your R script that much more complex and fragile. Don't do it. Use one of the pure-R options described above.
Chris Warth, Feb 20, 2015 at 18:19

as the developer of optparse, I hope you can understand how important this package is to the R ecosystem. The lack of the ability to easily configure an R script with required flag arguments has been a huge shortcoming for a very, very long time (about as long as this GitHub Issue has existed).

Despite the rhetoric on whether it should or should not be possible to require flags, in real-life situations it is quite often simpler to use cli flag arguments and to make them required. For example, your script has a number of required input parameters; if you can use long-form flags, you can make it much clearer to the reader and user what each input item is. Consider that you are working on a project (such as a long complex bioinformatics pipeline) and you come across a line like this, verbatim, embedded in the source code you are working on:

# some code here

# run the R script

myscript.R 1 2 3

# do more stuff

the lack of flags here makes the required positional args completely unintelligible to everyone that has to read this.

and it's not as simple as "just run myscript.R -h to see the help text", because the proliferation of containerized environments results in the user quite often not having an active or installed R environment that they can just easily switch to in order to even run myscript.R interactively

instead consider the same thing, but written like this:

# some code here

# run the R script

myscript.R --required-min-sd 1 --required-max-sd 2 --required-start-value 3  

# do more stuff

Now everyone's code is more readable and easier to understand without the requirement that every new user has to manually inspect myscript.R in order to understand what the heck is going on with args 1, 2, and 3

I hope you will reconsider your position on this, because the lack of this feature causes a serious detriment to R users and makes R that much more unpleasant to deal with for everyone.

Okay, I'd be willing to consider a pull request that implements a required argument to add_option() and make_option().

Do note though that developers have always been free to write scripts with {optparse} with required arguments. You just needed to check if the required options were set e.g.:

library("optparse")

parser <- OptionParser()
parser <- add_option(parser, "--option1")
parser <- add_option(parser, "--option2")
args <- parse_args(parser)

required_options <- c("option1", "option2")
for (opt in required_options) {
    if (is.null(args[[opt]])) {
        cat("required option", opt, "not found\n")
        quit('no', status = 1, runLast = FALSE)
    }
}

print(args)