rformassspectrometry/Spectra

Add fusePeaks function

jorainer opened this issue · 8 comments

Add a function that allows to fuse/combine (aggregate?) peaks within each spectrum. The need (usability) of such a function comes from rformassspectrometry/MsCoreUtils#116 . Basically, the idea is to fuse all peaks with an m/z smaller than a threshold into a single peak by combining/fusing their intensities and m/z values.

This is in contrast to the reducePeaks that reports for each peak group only a single (representative) peak. This function is supposed to combine the signals of the peaks within the peak group.

A definition could be:

fusePeaks <- function(x, ppm = 20, tolerance = 10, intensityFun = max, mzFun = mean, weighted = TRUE)

intensityFun allows to specify the function to combine the intensity values. mzFun how m/z values should be fused into a single value. If weighted = TRUE an intensity weighted mean will be used, regardless of mzFun (which would only be considered if weighted = FALSE.

Possible names would be:

  • combinePeaks (maybe analogously to combineSpectra?)
  • fusePeaks (because it fuses/melts the peaks into a single one combining their values)

I personally prefer combinePeaks() (definitely not combineSpectra(), for the same reason you mention in #291, that the function operates on peaks), but could live with fusePeaks().

Unfortunately we have already a combinePeaks function implemented. That one is called by combineSpectra and its purpose is to combine peaks from different spectra, e.g. 4 spectra (and their peak matrices) are supposed to be combined into a single spectrum (and peak matrix).

We can of course have an additional combinePeaks,Spectra method, that does something slightly different (i.e. combine peaks within each spectrum), but we need to clearly document these differences.

So may be the existing one should rather be called combineSpectra() - this would be a confusing change, but would lead to the cleanest option.

Sorry, I was not explaining well:

combineSpectra is a function to combine subsets of a Spectra into single spectra (parameter f defines which spectra in Spectra should be combined). The function has a parameter FUN that allows to pass a function to merge the peak matrices of the sets of spectra. The default is FUN = combinePeaks. This combinePeaks function takes a list of peak matrices and combines them into a single peak matrix.

We could now have a combinePeaks,list method that uses the original code (for the use case above) and a combinePeaks,Spectra for the new use case (i.e. merge peaks within each spectra). In theory, we can re-use part of the code, since we're now in essence combining peaks on a single peak matrix. So the combinePeaks,list would be the special case.

Before thinking of an implementation - just checking if the following definition would not be too confusing for the user:

  • combineSpectra,Spectra (already implemented); definition: function(x, f = x$dataStorage, p = x$dataStorage, FUN = combinePeaks, ...): combines sets of spectra in x into a single spectrum. The peak data for each set of spectra is combined using FUN into a single peak data matrix.
  • combinePeaks (already implemented); definition: function(x, intensityFun = mean, mzFun = mean, weighted = FALSE, tolerance = 0, ppm = 0, timeDomain = FALSE, peaks = c("union", "intersect"), main = integer(), minProp = 0.5): combines a list of peak matrices into a single peak matrix. The main purpose is to combine peaks from different spectra (but it will eventually also group peaks within a spectrum depending on ppm and tolerance).
  • combinePeaks,Spectra (to be implemented); definition: function(x, intensityFun = mean, mzFun = mean, weighted = FALSE, tolerance = 0, ppm = 0, timeDomain = FALSE): combines peaks for each spectrum in x.

We could also think of an alternative name of the function if using the same name (combinePeaks) is confusing. Maybe combinePeaksPerSpectrum or combinePeaksInSpectrum. Happy for feedback @lgatto ...

Or - like you suggest, we rename the old combinePeaks into e.g. combinePeaksData and use combinePeaks for the new functionaltity (making it a method and forwarding combinePeaks,list to combinePeaksData. That should be the cleanest solution since peaksData is in fact a list of peak matrices, so combinePeaksData would make more sense for the old functionality anyway.

👍🏻

Implemented.