sneumann/xcms

Add a function that calculates peak quality metrics for detected peaks


Since PR #685, new quality score metrics can be calculated during centWave peak detection. It would be good to also have a function that calculates these scores on already detected chrom peaks (i.e. after peak detection) or directly on EICs.

While straightforward to implement, naming is again an issue. @wkumler do you have a suggestion for an appropriate name for your new peak quality metrics that we could use? Using chromPeaksQuality as the function name might be a little too generic.

I don't have strong opinions on it, honestly! I agree that chromPeaksQuality is too ambitious unless this is a spot we want to allow others to calculate additional metrics from the raw data and the function is expected to grow significantly. I do think the "beta" nomenclature I've been using is more for internal use and don't believe the average user needs to know that it's being fit to a beta distribution. It does still fit to an "idealized" peak so maybe something like idealPeakComparison or simplePeakTest could be descriptive. It can also be used to replace the existing sn and egauss metrics so maybe snWithinPlusPeakCor could also be helpful but is a little dense. If I had to pick one on the spot I'd probably go with something like peakShapeQualityCalc because the metrics were designed to measure peak "shapeliness".
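To make the "fit to an idealized peak" idea concrete, here is a minimal base R sketch. It is not the code from PR #685 or the qscoreCalculator function; the helper name peak_shape_sketch and the candidate shape values are assumptions chosen for illustration. It rescales retention times to [0, 1], evaluates a few beta densities as idealized peaks, and reports the best Pearson correlation with the observed intensities.

```r
# Hypothetical helper, for illustration only (not the xcms implementation).
# shape1/shape2 are assumed candidate beta shape parameters.
peak_shape_sketch <- function(rt, int, shape1 = c(2.5, 3, 4, 5), shape2 = 5) {
  # Too few points or a flat signal give no meaningful correlation.
  if (length(rt) < 5 || length(rt) != length(int) || sd(int) == 0)
    return(c(peak_cor = NA_real_, best_shape1 = NA_real_))
  # Rescale retention times to [0, 1], the support of the beta distribution.
  scaled_rt <- (rt - min(rt)) / (max(rt) - min(rt))
  # Correlate the signal with each candidate idealized peak.
  cors <- vapply(shape1, function(s) cor(dbeta(scaled_rt, s, shape2), int),
                 numeric(1))
  best <- which.max(cors)
  c(peak_cor = cors[best], best_shape1 = shape1[best])
}

# Usage: a perfectly beta-shaped signal versus a linear ramp.
rt <- seq(100, 130, by = 1)
ideal <- dbeta((rt - 100) / 30, 3, 5) * 1e5
ramp <- seq_along(rt)
peak_shape_sketch(rt, ideal)
peak_shape_sketch(rt, ramp)
```

Using the actual retention times rather than the scan index means that missed scans do not distort the comparison, because each point is placed where it really falls along the idealized peak.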

Agreed, and I like your suggested name. Maybe slightly reformulated into chromPeakshapeQuality, to clarify that this is calculated on chromPeaks (i.e. with defined rtmin and rtmax, calculating the peak shape quality of the signal of the chromPeak)?

I like it! Sounds good to me.

Alignment with the mzQC folks might be nice. https://github.com/HUPO-PSI/mzQC
What about a rather generic peakQuality function, with parameters that specify what is calculated, e.g. beta, egauss, ...
Yours, Steffen

That's obviously the better approach. Maybe have a generic chromPeakQuality method and again our infamous Param parameter classes to define which quality metric to return. I haven't found a metric in mzQC that would fit the one defined by @wkumler (although I only had a quick look).

A generic function that returns the metric of choice would be great. I currently have William's function qscoreCalculator implemented in my script for targeted data analysis, but it is still super barebones and extracts targeted rt and int data in a loop, so I need to vectorize and improve my code still...

Snippet (data is an MsExperiment object; i and j are loop indices):

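# Extract the chromatograms for the j-th target (rt and m/z window).
chromatograms <- chromatogram(data, rt = rtRanges[j, ], mz = mzRanges[j, ])
# Retention times and intensities of the i-th chromatogram, using the
# accessor functions instead of direct slot access.
chr <- chromatograms@.Data[[i]]
rt <- rtime(chr)
int <- intensity(chr)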

To avoid adding too many functions (also thinking of the future), it might be good to add a chromPeakSummary method. This method would calculate a summary for each chrom peak, and a param parameter would define which summary should be calculated. Examples could be:

  • chromPeakSummary(xmse, BasicStats()): calculate basic summary statistics for each peak, with the number of data points, the min, max, median and mean intensity. Maybe even something like variation of m/z values.
  • chromPeakSummary(xmse, PeakShapeQuality()): to calculate @wkumler 's scores.
  • ... other summaries, e.g. as defined by mzQC as @sneumann suggests. @tnaake do you by chance know any chromatographic peak related quality metrics defined in mzQC?

Similar to all other chromPeak... methods, we can have a peaks parameter that allows providing the IDs of chrom peaks, in case the metric should only be calculated for selected chrom peaks.
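The dispatch-on-parameter-class design described above can be sketched with plain S4. Everything here is hypothetical: the names toyPeakSummary, ToyBasicStats and ToyPeakShape are made up for illustration, and the input is a simple named list of intensity vectors (one per chrom peak) rather than an xcms result object. The sketch only shows how one generic, a param object, and an optional peaks argument fit together.

```r
library(methods)

# Hypothetical parameter classes; each selects one kind of summary.
setClass("ToyBasicStats", slots = c(na.rm = "logical"),
         prototype = list(na.rm = TRUE))
setClass("ToyPeakShape", slots = c(shape1 = "numeric", shape2 = "numeric"),
         prototype = list(shape1 = 3, shape2 = 5))

# 'object' is assumed to be a named list of intensity vectors, one per peak.
setGeneric("toyPeakSummary", function(object, param, ...)
  standardGeneric("toyPeakSummary"))

# Basic summary statistics: one row per selected peak.
setMethod("toyPeakSummary", signature("list", "ToyBasicStats"),
          function(object, param, peaks = names(object), ...) {
  t(vapply(object[peaks], function(x)
    c(n = length(x), min = min(x, na.rm = param@na.rm),
      max = max(x, na.rm = param@na.rm),
      median = median(x, na.rm = param@na.rm),
      mean = mean(x, na.rm = param@na.rm)),
    numeric(5)))
})

# Correlation with one idealized beta-shaped peak (assumes evenly spaced scans).
setMethod("toyPeakSummary", signature("list", "ToyPeakShape"),
          function(object, param, peaks = names(object), ...) {
  res <- vapply(object[peaks], function(x) {
    ideal <- dbeta(seq(0, 1, length.out = length(x)),
                   param@shape1, param@shape2)
    cor(ideal, x)
  }, numeric(1))
  cbind(peak_cor = res)
})

# Usage: all peaks with basic stats, a selected peak with the shape score.
peak_ints <- list(CP1 = c(1, 5, 20, 45, 30, 12, 4, 1),
                  CP2 = c(3, 2, 4, 3, 2, 4, 3, 2))
toyPeakSummary(peak_ints, new("ToyBasicStats"))
toyPeakSummary(peak_ints, new("ToyPeakShape"), peaks = "CP1")
```

Adding a further metric then only requires a new parameter class and one more method, without another exported function name.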

Hi @jorainer

If I understand correctly what you want to do, then there are several metrics defined by the PSI working groups. Have a look e.g. at QC:4000074, QC:4000075, QC:4000076 in QC-cv.obo, or MS:4000050, MS:4000051 in PSI-MS.obo.

Had a look through the obo. The only actual quality metric of an EIC (or XIC, as they are called in the obo) is the FWHM (full width at half maximum, MS:1000086). The related obo terms are MS:4000017 (chromatogram metric) or, more specifically, MS:4000018 (XIC quality metric).
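For reference, the FWHM of a single chromatographic peak can be estimated from its retention times and intensities roughly as follows. This is a minimal base R sketch under stated assumptions: the helper name fwhm_sketch is made up, the signal is assumed to be baseline-corrected with a single apex, and the half-height crossings are found by linear interpolation between neighboring data points.

```r
# Hypothetical helper: FWHM of one baseline-free peak, in the units of 'rt'.
fwhm_sketch <- function(rt, int) {
  if (max(int) <= 0) return(NA_real_)
  half <- max(int) / 2
  apex <- which.max(int)
  below_left <- which(int[seq_len(apex)] <= half)
  below_right <- which(int[apex:length(int)] <= half) + apex - 1L
  # The signal must drop to half height on both sides within the window.
  if (!length(below_left) || !length(below_right)) return(NA_real_)
  l <- max(below_left)   # last point at or below half height before the apex
  r <- min(below_right)  # first point at or below half height after the apex
  # Interpolate the retention times at which the signal crosses half height.
  rt_left <- approx(int[l:(l + 1)], rt[l:(l + 1)], xout = half)$y
  rt_right <- approx(int[(r - 1):r], rt[(r - 1):r], xout = half)$y
  rt_right - rt_left
}

# Usage: a Gaussian peak with sigma = 2 has a theoretical FWHM of
# 2 * sqrt(2 * log(2)) * 2.
rt <- seq(0, 20, by = 0.1)
gauss <- exp(-(rt - 10)^2 / (2 * 2^2))
fwhm_sketch(rt, gauss)
```

If the peak boundaries (rtmin, rtmax) cut the peak above half height, the function returns NA instead of guessing a width.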

Yeah, we struggled to find a lot of standardized "peak quality" definitions in the literature when working on the original project as well. The Kantz 2019 paper uses six quality metrics and a bunch of combinations of them (peak duration, height, area, FWHM, tailing factor, and asymmetry factor). Your 2022 CPC paper @jorainer has some of these implemented already (looks like everything except asymmetry factor, though the noise estimation is likely different). We used the outputs from XCMS (mz, rt, peakwidth, area, sn, f, scale, lmin) but didn't test on the additional metrics of verboseColumns. I do think it's worth calculating an m/z deviation (and maybe an m/z deviation from mean m/z ~ intensity) metric even though that didn't show up especially strongly in my dataset, and I also think that a metric for the "number of missing scans" would be really nice to have, though again my custom implementation wasn't especially powerful in my dataset.
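The last two ideas, an m/z deviation metric and a count of missing scans, could look roughly like the base R sketch below. These are not the custom implementations mentioned above; the function names, the ppm scaling, and the definition of a "missing" scan (an MS1 scan inside the peak's retention time window that contributed no data point) are assumptions made for illustration.

```r
# Hypothetical helper. 'scan_rt': retention times of all MS1 scans in the file;
# 'peak_rt': retention times of the scans that contributed a point to the peak.
# Matching is exact, so 'peak_rt' is assumed to be a subset of 'scan_rt'.
missing_scans_sketch <- function(peak_rt, scan_rt) {
  in_window <- scan_rt[scan_rt >= min(peak_rt) & scan_rt <= max(peak_rt)]
  sum(!in_window %in% peak_rt)
}

# Hypothetical helper: spread of the m/z values of a peak, in ppm of the
# intensity-weighted mean m/z, plus the correlation between the absolute
# m/z deviation and the intensity.
mz_deviation_sketch <- function(mz, int) {
  mz_mean <- weighted.mean(mz, int)
  c(mz_mean = mz_mean,
    sd_ppm = sd(mz) / mz_mean * 1e6,
    cor_dev_int = cor(abs(mz - mz_mean), int))
}

# Usage with toy values.
scan_rt <- seq(10, 20, by = 1)
peak_rt <- c(12, 13, 15, 16, 18)
missing_scans_sketch(peak_rt, scan_rt)
mz_deviation_sketch(mz = c(100.0001, 100.0000, 99.9999), int = c(1, 2, 1))
```

A strongly negative cor_dev_int would indicate that the m/z values scatter more at low intensity, which is the pattern typically expected for a real signal.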