spectral-cockpit/opusreader2

🧱 Helper to extract final spectra as matrix

Opened this issue · 4 comments

As discussed in #27, now in a separate issue.

Thanks for the dedicated issue.

In terms of user interface, I think this option should essentially be a switch between a list (with full metadata, which is currently returned to the user) and a matrix.

The rationale is:

  • a chemometrician wanting to pull spectra to either calibrate a model or generate predictions would call read_opus with data_only = TRUE so a matrix is returned and can be pre-processed directly.
  • a more advanced user might set data_only to FALSE for more detailed checks on the data and a more thorough look at the metadata, and could of course use an lapply call on that list to combine whichever pieces of the data are of interest.
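To make the second use case concrete, here is a minimal sketch of the lapply approach on a mock list. The element names used here (`ab`, `data`, `wavenumbers`, `sample_name`) are assumptions for illustration only and may not match what read_opus actually returns.

```r
# Mock stand-in for the list returned with data_only = FALSE.
# Element names are hypothetical, not the confirmed opusreader2 structure.
wn <- c(4000, 3998, 3996)
spectra_list <- list(
  "file_a.0" = list(
    ab = list(data = matrix(c(0.10, 0.20, 0.30), nrow = 1), wavenumbers = wn),
    sample_name = "soil_a"
  ),
  "file_b.0" = list(
    ab = list(data = matrix(c(0.15, 0.25, 0.35), nrow = 1), wavenumbers = wn),
    sample_name = "soil_b"
  )
)

# Pull one piece of metadata per file
sample_names <- vapply(spectra_list, function(x) x$sample_name, character(1))

# Pull the absorbance rows and stack them into one matrix
abs_rows <- lapply(spectra_list, function(x) x$ab$data)
abs_mat <- do.call(rbind, abs_rows)

# Label rows by sample and columns by wavenumber
dimnames(abs_mat) <- list(unname(sample_names), as.character(wn))
```

The result has one row per file and one column per wavenumber, which is the shape most pre-processing functions expect.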

Finally, I'd suggest changing the argument name to the simpler and more common matrix = TRUE|FALSE.

Example call:

# Select all matching OPUS files from a project folder
opus_fns <- list.files("some/project/folder/", pattern = glob2rx("my_project-*.0"), full.names = TRUE)

# Directly read and assemble MIR matrix from the selected files
mir_mat <- read_opus(
  opus_fns,
  matrix = TRUE,
  progress = TRUE
)

# Quick plot
matplot(t(mir_mat))


Thanks for the nice summaries of use cases. I would not rename it to matrix, because that hides the intent of what the function does. Yes, it does return a matrix, but the name does not make clear that this switch chooses between all parameters and data vs. final spectra only. @pierreroudier @ThomasKnecht I would suggest using either matrix_spectra or matrix_spec in case we move away from data_only.

Because it is quite an important argument for controlling read_opus() behavior and user experience, these two steps seem important:

  1. make it very clear in the argument name what the output is (see @pierreroudier above), and that only the (final) spectra are returned vs. all data and parameters.
  2. document this argument really concisely.

Currently it is possible to parse only the data. In my opinion, the combination into a matrix should be done in a separate function.
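A rough sketch of what such a separate function could look like. The name `spectra_to_matrix()` is hypothetical and not part of opusreader2, and the sketch assumes each parsed element holds a block with a one-row `data` matrix and a `wavenumbers` vector.

```r
# Hypothetical helper; not part of opusreader2.
# Assumes x[[i]][[block]]$data (one spectrum) and
# x[[i]][[block]]$wavenumbers (numeric vector) exist.
spectra_to_matrix <- function(x, block = "ab") {
  wn_ref <- x[[1]][[block]]$wavenumbers

  # Refuse to combine spectra measured on different wavenumber axes
  same_axis <- vapply(
    x,
    function(el) isTRUE(all.equal(el[[block]]$wavenumbers, wn_ref)),
    logical(1)
  )
  if (!all(same_axis)) {
    stop("Wavenumber axes differ between files; resample before combining.")
  }

  # One row per file, one column per wavenumber
  mat <- do.call(rbind, lapply(x, function(el) as.numeric(el[[block]]$data)))
  dimnames(mat) <- list(names(x), as.character(wn_ref))
  mat
}

# Usage on mock parsed data (structure is an assumption)
wn <- c(4000, 3998, 3996)
parsed <- list(
  "file_a.0" = list(ab = list(data = matrix(c(0.10, 0.20, 0.30), nrow = 1), wavenumbers = wn)),
  "file_b.0" = list(ab = list(data = matrix(c(0.15, 0.25, 0.35), nrow = 1), wavenumbers = wn))
)
mir_mat <- spectra_to_matrix(parsed)
```

Keeping this step outside read_opus means the reader only parses, and the wavenumber check lives in one place where a mismatch can fail loudly.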