KentonWhite/ProjectTemplate

Feature request: Allow to pass custom arguments to the reader functions

Opened this issue · 6 comments

Report an Issue / Request a Feature

I'm submitting a (Check one with "x") :

  • feature request

Issue Severity Classification -
  • 3 - Low
Expected Behavior

The arguments to a specific reader are available through the file.reader.

Current Behavior

All readers take the path to the data file and the variable to store the data as arguments. When a specific file needs special treatment there is no way to override the default options.

Possible Solution

A few steps are necessary to implement this:

  • Change the signature of the reader functions to the following:
    • filename: Path to the file to be loaded
    • variablename: Variable in the global environment to store the loaded data in
    • ...: Arbitrary arguments to pass to the implemented reader
  • Change the file.reader to pass all arguments, beside the extension (which is used to determine which reader to use), to the reader.
Related issues:
  • #189 allow to specify encoding in global.dcf (one mentioned solution was to allow override per file)

There are some custom readers in the wild. Changing the signature in this way may cause some trouble with the custom readers. Could setMethod be used to preserve the original signature?

I'm thinking we could perhaps do some inspection of the arguments in add.extension using the formals function. If an old style reader is passed it can raise a warning and wrap the reader in a closure with the correct arguments. I haven't tried it yet, but we could do some interface changes as we move to version 1.0.

That sounds too cool!

A small proof of concept:

# New style reader
test_reader_new_style <- function(file.name, variable.name, ...) {
  return(list(file.name = file.name, 
              variable.name = variable.name, 
              '...' = list(...)))
}
# Old style reader
test_reader_old_style <- function(data.file, file.name, variable.name) {
  return(list(data.file = data.file,
              file.name = file.name, 
              variable.name = variable.name))
}
# Old style reader nr2
test_reader_old_style_2 <- function(data.file, file.name, variable.name) {
  return(list(data.file.2 = data.file,
              file.name.2 = file.name, 
              variable.name.2 = variable.name))
}

# ProjectTemplate:::extensions.dispatch.table
extension_dispatch_env <- new.env()
# ProjectTemplate::.add_extension
test_add_extension <- function(extension, reader_function) {
  if (!identical(names(formals(reader_function)), 
                 c('file.name', 'variable.name', '...'))) {
    warning('A reader with non-standard arguments detected.\n',
            'Assuming old style arguments, for now will be wrapped in a helper function.\n',
            'In a future version this will become an error.')
    # Old style reader, define a function with the new style arguments that
    # calls the function that was passed in. Because this happens in a closure
    # the reference to reader_function will still exist outside once
    # test_add_extension finishes.
    # Can't reuse the old name because that causes a recursion error.
    func_store <- function(file.name, variable.name, ...) {
      reader_function(basename(file.name), file.name, variable.name)
    }
  } else {
    # New style reader, store reference temporarily (see branch above)
    func_store <- reader_function
  }
  # Store the, possibly wrapped, function in the environment
  extension_dispatch_env[[extension]] <- func_store
}

# Set warnings to immediate so we can see the timing when the warning is issued
opts <- options('warn' = 1)
# Add the new reader, should not issue warning
print('Add new')
test_add_extension('a', test_reader_new_style)
# Add the old reader, should issue warning
print('Add old')
test_add_extension('b', test_reader_old_style)
# Add the second old reader, should issue warning
print('Add old 2')
test_add_extension('c', test_reader_old_style_2)
# Call the new reader to check the received arguments
print('New:')
str(extension_dispatch_env$a('data/test.txt', 'test_var', encoding = 'utf-8'))
# Call the old reader to check the received arguments
print('Old:')
str(extension_dispatch_env$b('data/test.txt', 'test_var', encoding = 'utf-8'))
# Call the old reader to check the received arguments
print('Old2:')
str(extension_dispatch_env$c('data/test.txt', 'test_var', encoding = 'utf-8'))
# Reset the warning options
options(opts)

Output:

[1] "Add new"
[1] "Add old"
Warning in test_add_extension("b", test_reader_old_style) :
  A reader with non-standard arguments detected.
Assuming old style arguments, for now will be wrapped in a helper function.
In a future version this will become an error.
[1] "Add old 2"
Warning in test_add_extension("c", test_reader_old_style_2) :
  A reader with non-standard arguments detected.
Assuming old style arguments, for now will be wrapped in a helper function.
In a future version this will become an error.
[1] "New:"
List of 3
 $ file.name    : chr "data/test.txt"
 $ variable.name: chr "test_var"
 $ ...          :List of 1
  ..$ encoding: chr "utf-8"
[1] "Old:"
List of 3
 $ data.file    : chr "test.txt"
 $ file.name    : chr "data/test.txt"
 $ variable.name: chr "test_var"
[1] "Old2:"
List of 3
 $ data.file.2    : chr "test.txt"
 $ file.name.2    : chr "data/test.txt"
 $ variable.name.2: chr "test_var"

Ok, so after adding some explanation, the proof of concept grew a little bigger. The main point of adding two old functions is to show that references to the functions are correctly stored in the closures.

Cool! And the idea is that we flag it as deprecated for now. We should also suggest that the user contact the reader maintainer to fix the interface.