ethanbass/chromatographR

find_peaks returns error message (argument "namevec" is missing, with no default)

Closed this issue · 7 comments

Hi Ethan,

I found that the find_peaks() function returns the error message 'argument "namevec" is missing, with no default'.
The find_peaks function isn't looking for a namevec argument itself, so it seems that the problem is in the deriv() function within find_peaks.

I called find_peaks() with a non-default smoothing argument:

> chromatographR::find_peaks(fid_df$value, smooth_type="raw")
Error in deriv.default(y) : 
  argument "namevec" is missing, with no default

It looks like find_peaks() calls deriv() whenever the smoothing argument is anything other than the default 'gaussian'. deriv() performs symbolic differentiation, so it expects an expression plus a namevec argument, not a numeric vector of y coordinates.

As an aside, when I call find_peaks() with all default arguments on a random GC chromatogram I used as a test, it returns over 5000 rows. A little bit of tweaking (chromatographR::find_peaks(fid_df$value, smooth_window = 1, smooth_width = 0.1, slope_thresh = 0.1, amp_thresh = 2)) brought that down to just 19. Is there any further documentation on exactly how those arguments influence the peak detection?

Hi Phenomniverse,
I think the first issue you mention appears to be a bug that I have overlooked because I always use the smoothing option. I will definitely look into this. I think it's supposed to call the diff function rather than deriv.
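To illustrate the difference between the two functions mentioned above, here is a minimal sketch in base R. The toy signal `y` is made up for illustration and this is not the package's actual code; it only shows why deriv() fails on a numeric vector while diff() gives the finite differences that a derivative-based peak finder needs.

```r
# Toy signal (illustrative, not from the issue): one Gaussian-shaped peak
y <- dnorm(seq(-3, 3, length.out = 61))

# stats::deriv() does symbolic differentiation and needs an expression
# plus `namevec`, so calling it on a numeric vector raises an error
msg <- tryCatch(deriv(y), error = function(e) conditionMessage(e))
print(msg)

# diff() returns numerical first differences, one element shorter than y
d <- diff(y)

# A local maximum is where the first difference changes from + to -
peak_idx <- which(diff(sign(d)) < 0) + 1L
print(peak_idx)
```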

Regarding the second part, I will try to write a more detailed response later, but to get the ball rolling, I suspect that your signal isn't sufficiently smooth and your peaks are getting split (into multiple peaks). The time resolution of the GC data from the machine is often unnecessarily high and can lead to issues like this. I would suggest using the preprocess function to reduce the time resolution on your chromatograms by interpolation (if you haven't already done this). This will make the peak finding and integration much faster and also more accurate. E.g.

t1 <- 4 # your new start time, it's often good to cut off the solvent peak here
t2 <- 59.9 # the end of your chromatogram or wherever you want to cut it
res <- .005 # time resolution (in minutes). I suggest starting with .005
new.ts <- seq(t1, t2, res)
dat.pr <- preprocess(x, spec.smooth = FALSE, dim1 = new.ts, parallel = FALSE)

(you would then run the peak finding function on the processed chromatogram stored in dat.pr or whatever you want to save it as).
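The idea of reducing time resolution by interpolation can also be illustrated without the package. The following is a minimal sketch, assuming plain linear interpolation with base R's approx(); it is not necessarily what preprocess does internally, and `t_raw` and `sig` are made-up data standing in for a high-resolution trace.

```r
# Made-up high-resolution trace: 0 to 60 min sampled every 0.001 min
t_raw <- seq(0, 60, by = 0.001)
sig   <- dnorm(t_raw, mean = 30, sd = 0.05)

# Coarser time axis, using the same values as the example above
t1  <- 4
t2  <- 59.9
res <- 0.005
new.ts <- seq(t1, t2, res)

# Linearly interpolate the signal onto the coarser time axis
sig_lo <- approx(x = t_raw, y = sig, xout = new.ts)$y

# Far fewer points, but the peak stays at the same retention time
length(sig) / length(sig_lo)
new.ts[which.max(sig_lo)]
```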

You should be able to accomplish something similar in the find_peaks function by adjusting the smoothing parameters (smooth_window & smooth_width) but at the cost of much higher computational time. If I am correct that the problem is one of "peak splitting", adjusting the thresholds isn't likely to be that helpful, because the problem lies with the algorithm detecting multiple (often many) tiny peaks within each peak, which are usually just noise.

Also, you are right that this function should be better documented. I will add it to my list!

I think I will add more smoothing options as well. I can't really remember anymore why I used the gaussian smoother, but I'm not so sure it is the best option. As you increase the smoothing width it becomes very slow, so it definitely doesn't seem to be a great option for smoothing very high resolution signals.

Is the principle here that you locate peaks on a smoothed version of the data and then apply those peak locations back to the higher resolution data?
I want to be able to reproduce the type of integration results I get from manual processing in ChemStation, and some of the higher resolution data will be important. For example, minor inflection points in a peak might indicate coeluting compounds, which you can sometimes pick up visually if you look closely at the peak at several different scalings. I'm wondering if an iterative approach to peak detection at various levels of smoothing might model that visual approach?

Yes, that is the general idea. If you don't do any smoothing, the algorithm gets caught up on all the local minima and maxima and ends up splitting the peaks into tiny segments. The general approach is also described on Tom O'Haver's webpage (https://terpconnect.umd.edu/~toh/spectrum/Differentiation.html) under the "Peak detection" heading.
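The peak splitting described above is easy to reproduce on simulated data. This is a rough sketch in base R, not the find_peaks implementation: `find_maxima` and `smooth_ma` are hypothetical helpers, the moving average is a stand-in for whichever smoother the package uses, and the noise level and window width are arbitrary.

```r
set.seed(1)

# Simulated chromatogram: one real peak at 5 min plus detector noise
x <- seq(0, 10, by = 0.005)
y <- dnorm(x, mean = 5, sd = 0.3) + rnorm(length(x), sd = 0.01)

# Hypothetical helper: indices where the first difference goes from + to -
find_maxima <- function(sig) {
  d <- diff(sig)
  which(diff(sign(d)) < 0) + 1L
}

# Hypothetical helper: centered moving average (NA at the edges)
smooth_ma <- function(sig, width) {
  as.numeric(stats::filter(sig, rep(1 / width, width), sides = 2))
}

raw_peaks <- find_maxima(y)      # every noise wiggle counts as a "peak"
y_sm      <- smooth_ma(y, 101)
sm_peaks  <- find_maxima(y_sm)   # far fewer candidates after smoothing

c(raw = length(raw_peaks), smoothed = length(sm_peaks))

# The tallest remaining candidate sits at the true peak position
best <- sm_peaks[which.max(y_sm[sm_peaks])]
x[best]
```

Some spurious maxima usually survive in the flat baseline even after smoothing, which is where thresholds such as slope_thresh and amp_thresh would come in.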

I am starting to realize that the gaussian smoother I use is maybe not very good for very high resolution data like a lot of the FID data. I have gotten around this usually by just reducing the time resolution, but I think using a better smoother would probably be a good alternative as well. I am looking into some possibilities.

I think your idea of the iterative approach is interesting. I think it may be something like what they propose here (https://www.sciencedirect.com/science/article/pii/S0021967316305945?via%3Dihub). I'll have to think on it a bit more. Maybe I can implement something like this as part of the package.

I fixed the namevec error and also added some more smoothing options (9c774ee). Beefed up the documentation a little bit but probably it could still use some work.

Thanks Ethan, those are good resources you've referred me to. I'll try to absorb them over the weekend.

Fixed by 9c774ee.