Add a filterUniquePrecursorMz method
jorainer opened this issue · 4 comments
Add a filtering function filterUniquePrecursorMz that reduces a Spectra object by selecting, from each group of spectra with similar precursor m/z (given ppm and tolerance), the one with the highest precursor intensity.
Could you elaborate on the difference with filterPrecursorMz()? I understand that with this new method you additionally want the one with the highest precursor intensity... but can't this be done in another way, or how about a function that does just that (say filterPrecursorMaxIntensity())?
In other words, we have a function that does something similar... couldn't we offer a generic tool to use in combination with what we already have?
filterPrecursorMzValues and filterPrecursorMzRange take target m/z values to filter on (the first keeps spectra with matching precursor m/z, the second keeps spectra within a specified m/z range).
What I would like to have here is: assuming that fragment spectra were generated from the same ions (thus the spectra have similar precursor m/z values), select for each set the fragment spectrum with the highest precursor intensity. So, basically, I don't know the precursor m/z beforehand (as I would with filterPrecursorMzValues) but want to reduce the Spectra to unique spectra/precursor m/z.
So, the definition of the method would be:

    setMethod("filterPrecursorMaxIntensity", "Spectra", function(object, ppm = 10, tolerance = 0))

and the function internally would use the MsCoreUtils::group function to group the spectra based on their precursor m/z values (given ppm and tolerance) and for each group return the one spectrum with the highest precursor intensity.
I hope I explained it well enough...
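The grouping-and-selection idea described above can be illustrated on plain vectors, without a Spectra object. This is only a minimal sketch: group_sorted is a hypothetical, simplified stand-in for MsCoreUtils::group (it is assumed here to start a new group whenever the gap between consecutive sorted values exceeds tolerance plus ppm of the previous value), and the toy m/z and intensity values are made up for illustration.

```r
## Toy data: precursor m/z and precursor intensity for six spectra.
## NA marks a spectrum without a precursor (e.g. an MS1 spectrum).
pmz <- c(200.1000, 150.0500, 200.1005, NA, 150.0501, 300.2000)
pmi <- c(1000, 500, 3000, NA, 800, 50)

## Hypothetical helper, a simplified stand-in for MsCoreUtils::group:
## expects *sorted* values and starts a new group whenever the gap to
## the previous value exceeds tolerance + ppm of that previous value.
group_sorted <- function(x, tolerance = 0, ppm = 10) {
    if (length(x) < 2L) return(seq_along(x))
    max_gap <- tolerance + x[-length(x)] * ppm / 1e6
    cumsum(c(TRUE, diff(x) > max_gap))
}

## Indices ordered by precursor m/z; entries with NA m/z are dropped.
idx <- order(pmz, na.last = NA)
grp <- group_sorted(pmz[idx], tolerance = 0, ppm = 10)

## For each group, pick the index with the highest precursor intensity.
best <- vapply(split(idx, grp),
               function(z) z[which.max(pmi[z])],
               integer(1L), USE.NAMES = FALSE)

## Spectra without a precursor m/z are kept; original order is preserved.
keep <- is.na(pmz)
keep[best] <- TRUE
which(keep)
```

With these toy values, spectra 2 and 5 fall into one group and spectra 1 and 3 into another, so only the more intense member of each pair is retained, and the logical index leaves the remaining spectra in their original order.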
OK, thanks. Now that I realise that you don't have the precursor m/z values beforehand, it makes more sense. Please go ahead :-)
Small benchmark of two implementations of the function:

    filterPrecursorMaxIntensity <- function(x, tolerance = 0, ppm = 10) {
        pmz <- precursorMz(x)
        pmi <- precursorIntensity(x)
        ## Order by precursor m/z, dropping spectra without one (e.g. MS1)
        idx <- order(pmz, na.last = NA)
        mz_grps <- MsCoreUtils::group(pmz[idx], tolerance = tolerance, ppm = ppm)
        if (any(duplicated(mz_grps))) {
            ## Always keep spectra without a precursor m/z
            keep <- is.na(pmz)
            ## Per group, keep the spectrum with the highest precursor intensity
            keep[vapply(split(idx, as.factor(mz_grps)),
                        function(z) {
                            if (length(z) == 1L) z
                            else z[which.max(pmi[z])]
                        }, integer(1L),
                        USE.NAMES = FALSE)] <- TRUE
            x <- x[keep]
        }
        x
    }

    filterUniquePrecursorMz <- function(querySpectra, minSize = 3) {
        ## pg_filt <- querySpectra[which(sapply(mz(querySpectra), length) >= minSize)]
        pg_filt <- querySpectra
        ## Order by precursor mass
        pg_filt <- pg_filt[order(precursorMz(pg_filt))]
        ## Bin by precursor mass and remove duplicated precursor ions within
        ## each bin (keep the one with the higher intensity)
        pg_mat <- cbind("mz" = precursorMz(pg_filt),
                        "intensity" = precursorIntensity(pg_filt))
        mz_grps <- MsCoreUtils::group(pg_mat[, "mz"], tolerance = 0, ppm = 10)
        if (any(duplicated(mz_grps))) {
            no_dup <- tapply(seq_along(mz_grps), mz_grps, function(i) {
                if (length(i) == 1) i
                else i[which.max(pg_mat[i, "intensity"])]
            })
            pg_filt <- pg_filt[no_dup]
        }
        pg_filt
    }

The second implementation additionally applies an internal filter on the number of fragment peaks. I would however suggest doing that outside of the function (for performance reasons and because it is cleaner for the user). This can easily be done with e.g. sps <- sps[lengths(sps) > 3]. Also, the second version silently re-orders the spectra, which can be confusing for users (they might not expect that), and it works only for Spectra objects containing only MS2 spectra.

    library(Spectra)
    library(microbenchmark)
    fl <- system.file("TripleTOF-SWATH", "PestMix1_DDA.mzML",
                      package = "msdata")
    sps_dda <- Spectra(fl)
    ## filter to MS2 data
    tmp <- filterMsLevel(sps_dda, 2L)
    microbenchmark(filterPrecursorMaxIntensity(tmp), filterUniquePrecursorMz(tmp))

    Unit: milliseconds
                                 expr      min       lq     mean   median       uq      max neval cld
     filterPrecursorMaxIntensity(tmp) 3.174594 3.261636 3.589510 3.408911 3.519393 15.88249   100  a
         filterUniquePrecursorMz(tmp) 4.685477 4.848344 5.230798 5.089792 5.221355 12.24396   100   b