ar1st0crat/NWaves

Pitch.FromYin triggers exception for certain sample rates and pitch ranges

Opened this issue · 4 comments

The logic in NWaves.Features.Pitch.FromYin triggers IndexOutOfRangeException if the range of scanned samples isn't large enough for the sample rate and desired pitch range.

For example, given

NWaves.Features.Pitch.FromYin(samples, sampleRate, start, end, 40, 700);

with sampleRate == 22050, and start and end only 256 samples apart, the access to the cmdf array in

while (t + 1 < pitch2 && cmdf[t + 1] < cmdf[t]) t++;
will overflow the bounds for some inputs since pitch1 is 551, much larger than cmdf which is only length (half the frame size, which is 128 here). In hindsight, of course this frame size is too small to detect the desired low pitch, but I'd expect the function to account for this.

Note that the user's choice of pitch range isn't the whole issue. Calling this method at the end of a non-frame-size-aligned array, e.g. 512 ... 533, would have an arbitrarily small length. Likely the cmdf array needs to be sized based on the pitch range, not the length of the sample range?

Correct, this is one of those functions in this lib, whose code is more like a cheat sheet for a particular algorithm. It's not optimized for usage in feature extractors as well atm, and it's assumed that we call it with the parameters that make sense logically (like you mentioned).
But of course you can adapt it and use it quite optimally (like here). As for the possible solution - I wouldn't size the cmdf array based on the pitch range (because it would be a sort of a technical crutch/hack just to get rid of the exception, whereas logically this array should be computed from the length of samples). I'd simply add the check for samples length being sufficient to analyze the given pitch range and, if the check fails, either return 0 (to account for the case with the non-frame-size-aligned array you mentioned), or throw an exception (in this case use zero-padding of the last frame of the input signal).

Thanks for replying quickly. I see your points.

I think I ran into this because I was "unrolling" some uses of multiple passes for different purposes and didn't realize that the TimeDomainFeaturesExtractor was clipping off the non-frame-aligned last frame in ComputeFrom. (Pitch.FromYin seemed to be flexible in that it allowed endPos = -1 to be promoted to the length of the samples array, so it implied any length of array would work.)

if the check fails, either return 0 (to account for the case with the non-frame-size-aligned array you mentioned)

Since Pitch.FromYin is used as a motivating example from PitchExtractor and in https://github.com/ar1st0crat/NWaves/wiki/Feature-extractors#pitch-extractor, I might argue this method would benefit from the return of 0 versus an exception, since none of the other extractors suffer from this relation between the frame size and pitch ranges, AFAICT.

Yep. FYI, the example in docs is just the simplest example, and I wouldn't recommend to use it in frame-by-frame scenarios, because it reallocates memory every time the new frame is processed. If you want more performant pitch extractor based on Yin algorithm, create a FeatureExtractor subclass and copy the procedural part of Pitch.FromYin() to ProcessFrame() method something like:

public class PitchYinExtractor : FeatureExtractor
    {
        public override List<string> FeatureDescriptions { get; }
        protected readonly float _low;
        protected readonly float _high;
        protected readonly float cmdfThreshold;
        protected int _pitch1, _pitch2;
        protected int _length;

        protected readonly float[] _cmdf;

        /// <summary>
        /// All assertions are here, checked one time, during construction
        /// </summary>
        public PitchYinExtractor(PitchYinOptions options) : base(options)
        {
            _low = (float)options.LowFrequency;
            _high = (float)options.HighFrequency;
            _cmdfThreshold = options.CmdfThreshold;  // introduce new Options class with necessary settings

            _pitch1 = (int)(1.0 * SamplingRate / _high);    // 2,5 ms = 400Hz
            _pitch2 = (int)(1.0 * SamplingRate / _low);     // 12,5 ms = 80Hz

            _length = FrameSize / 2;
            
            if (..) // check _pitch1, _pitch2, _length
                throw ...

            _cmdf = new float[_length];

            FeatureCount = 1;
            FeatureDescriptions = new List<string>() { "pitch" };
        }

        /// <summary>
        /// Computes pitch in one frame.
        /// </summary>
        /// <param name="block">Block of data</param>
        /// <param name="features">Pitch (feature vector containing only pitch) computed in the block</param>
        public override void ProcessFrame(float[] block, float[] features)
        {
            Array.Clear(_cmdf, 0, _cmdf.Length);

            for (var i = 0; i < _length; i++)
            {
                for (var j = 0; j < _length; j++)
                {
                    var diff = block[j + startPos] - block[i + j + startPos];
                    _cmdf[i] += diff * diff;
                }
            }

            _cmdf[0] = 1;

             ...
             ...
        }

Got it, thanks! (tbh, I already moved to the autocorrelation method ;)