Pitch.FromYin triggers exception for certain sample rates and pitch ranges
Opened this issue · 4 comments
The logic in NWaves.Features.Pitch.FromYin
triggers IndexOutOfRangeException
if the range of scanned samples isn't large enough for the sample rate and desired pitch range.
For example, given
NWaves.Features.Pitch.FromYin(samples, sampleRate, start, end, 40, 700);
with sampleRate
== 22050, and start
and end
only 256 samples apart, the access to the cmdf
array in
NWaves/NWaves/Features/Pitch.cs
Line 245 in c1a6a4a
pitch1
is 551
, much larger than cmdf
which is only length
(half the frame size, which is 128 here). In hindsight, of course this frame size is too small to detect the desired low pitch, but I'd expect the function to account for this.
Note that the user's choice of pitch range isn't the whole issue. Calling this method at the end of a non-frame-size-aligned array, e.g. 512 ... 533, would have an arbitrarily small length
. Likely the cmdf
array needs to be sized based on the pitch range, not the length of the sample range?
Correct, this is one of those functions in this lib, whose code is more like a cheat sheet for a particular algorithm. It's not optimized for usage in feature extractors as well atm, and it's assumed that we call it with the parameters that make sense logically (like you mentioned).
But of course you can adapt it and use it quite optimally (like here). As for the possible solution - I wouldn't size the cmdf
array based on the pitch range (because it would be a sort of a technical crutch/hack just to get rid of the exception, whereas logically this array should be computed from the length of samples). I'd simply add the check for samples length being sufficient to analyze the given pitch range and, if the check fails, either return 0 (to account for the case with the non-frame-size-aligned array you mentioned), or throw an exception (in this case use zero-padding of the last frame of the input signal).
Thanks for replying quickly. I see your points.
I think I ran into this because I was "unrolling" some uses of multiple passes for different purposes and didn't realize that the TimeDomainFeaturesExtractor was clipping off the non-frame-aligned last frame in ComputeFrom
. (Pitch.FromYin
seemed to be flexible in that it allowed endPos = -1
to be promoted to the length of the samples array, so it implied any length of array would work.)
if the check fails, either return 0 (to account for the case with the non-frame-size-aligned array you mentioned)
Since Pitch.FromYin
is used as a motivating example from PitchExtractor and in https://github.com/ar1st0crat/NWaves/wiki/Feature-extractors#pitch-extractor, I might argue this method would benefit from the return of 0 versus an exception, since none of the other extractors suffer from this relation between the frame size and pitch ranges, AFAICT.
Yep. FYI, the example in docs is just the simplest example, and I wouldn't recommend to use it in frame-by-frame scenarios, because it reallocates memory every time the new frame is processed. If you want more performant pitch extractor based on Yin algorithm, create a FeatureExtractor
subclass and copy the procedural part of Pitch.FromYin()
to ProcessFrame()
method something like:
public class PitchYinExtractor : FeatureExtractor
{
public override List<string> FeatureDescriptions { get; }
protected readonly float _low;
protected readonly float _high;
protected readonly float cmdfThreshold;
protected int _pitch1, _pitch2;
protected int _length;
protected readonly float[] _cmdf;
/// <summary>
/// All assertions are here, checked one time, during construction
/// </summary>
public PitchYinExtractor(PitchYinOptions options) : base(options)
{
_low = (float)options.LowFrequency;
_high = (float)options.HighFrequency;
_cmdfThreshold = options.CmdfThreshold; // introduce new Options class with necessary settings
_pitch1 = (int)(1.0 * SamplingRate / _high); // 2,5 ms = 400Hz
_pitch2 = (int)(1.0 * SamplingRate / _low); // 12,5 ms = 80Hz
_length = FrameSize / 2;
if (..) // check _pitch1, _pitch2, _length
throw ...
_cmdf = new float[_length];
FeatureCount = 1;
FeatureDescriptions = new List<string>() { "pitch" };
}
/// <summary>
/// Computes pitch in one frame.
/// </summary>
/// <param name="block">Block of data</param>
/// <param name="features">Pitch (feature vector containing only pitch) computed in the block</param>
public override void ProcessFrame(float[] block, float[] features)
{
Array.Clear(_cmdf, 0, _cmdf.Length);
for (var i = 0; i < _length; i++)
{
for (var j = 0; j < _length; j++)
{
var diff = block[j + startPos] - block[i + j + startPos];
_cmdf[i] += diff * diff;
}
}
_cmdf[0] = 1;
...
...
}
Got it, thanks! (tbh, I already moved to the autocorrelation method ;)