swharden/FftSharp

what is the relationship between FFT size and frequency?

jtorjo opened this issue · 4 comments

Hi Swharden,

I just found your project, and I believe it is beyond awesome!

I've used naudio quite a bit before and got pretty comfortable with it. Right now I'm building a sound analyzer, mainly to detect bass in songs.

I've read quite a bit on the internet, and there's something I can't wrap my head around:

  • as long as you choose a power of two for the FFT size, you get back an array of frequencies in Hz. Even if you choose 512 or 1024, you still get the array of frequencies, probably a bit more accurate for 1024, but roughly the same nonetheless. Am I correct?
  • Assuming I go with 1024, do I need to read at 1024 boundaries (multiples of 1024) for my results to be correct? What I mean by this: say I have a 2 minute song, and I want to read the bass only from 24.450s to 29.245s -> can I just read that part only, and compute the Hz frequencies for each block of 1024 floats?

Best,
John

I just found your project, and I believe it is beyond awesome!

I'm happy you're enjoying it!

As long as you choose a power of two for the FFT size, you get back an array of frequencies in Hz. Even if you choose 512 or 1024, you still get the array of frequencies, probably a bit more accurate for 1024, but roughly the same nonetheless. Am I correct?

More or less. I'll restate this using some key terms. An FFT reports the power of frequencies between 0 and half your sample rate (the Nyquist frequency). The more points there are in your FFT, the finer your frequency resolution, but the span of frequencies will always be between 0 and Nyquist. The FFT also requires that the number of points (the FFT size) be a power of 2.
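
To make that concrete, here is a quick sketch (plain arithmetic, nothing FftSharp-specific) of how the FFT size controls resolution while the span stays fixed at 0 to Nyquist, using the 48 kHz sample rate assumed below:

int sampleRate = 48000;
foreach (int fftSize in new[] { 512, 1024, 8192 })
{
    double nyquistHz = sampleRate / 2.0;              // highest frequency the FFT can report
    double hzPerPoint = (double)sampleRate / fftSize; // spacing between output points
    Console.WriteLine($"FFT size {fftSize}: {fftSize / 2} points spanning 0-{nyquistHz} Hz, {hzPerPoint:F2} Hz per point");
}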

I'll assume your audio is sampled at 48 kHz for simplicity.
To calculate bass from audio between seconds 24.450 and 29.245 (a code sketch follows this list):

  • isolate the audio you wish to analyze as a double[] array. To do this, determine the indexes of the source audio to include:
    • 24.450 * 48000 = 1173600
    • 29.245 * 48000 = 1403760
    • your array will have 1403760 - 1173600 = 230160 points
    • the FFT length must be a power of 2, and 230160 is not, so trim the segment (or analyze just a chunk of it) down to a power of 2, e.g. 8192 points (2^13)
  • window your array as described on the front page of this repo. Hanning is most common for audio.
  • take the FFT of the array using FFTmagnitude() or FFTpower()
    • The output will be an array of 4096 points (half the FFT size)
    • points represent power of frequencies from 0 to 24kHz (the Nyquist frequency)
    • each point represents 24000 / 4096 = 5.86 Hz
  • Measure the power of bass
    • Let's define bass as <100 Hz
    • The 17th element of the FFT output corresponds to 5.86 * 17 = 99.62 Hz
    • The total bass is therefore the sum of the first 17 points in that FFT array
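
Roughly, in code (a sketch rather than a drop-in implementation: allAudio is a placeholder for the whole song as a double[], and the Hanning window is built by hand so the snippet doesn't depend on a particular FftSharp version):

int sampleRate = 48000;
int fftSize = 8192; // 2^13 points taken from the start of the segment of interest

// isolate the samples to analyze
int startIndex = (int)(24.450 * sampleRate);
double[] segment = new double[fftSize];
Array.Copy(allAudio, startIndex, segment, 0, fftSize);

// build a Hanning window by hand and apply it
double[] window = new double[fftSize];
for (int i = 0; i < fftSize; i++)
    window[i] = 0.5 - 0.5 * Math.Cos(2 * Math.PI * i / (fftSize - 1));
double[] windowed = Window.Apply(window, segment);

// magnitude FFT: fftSize / 2 points spanning 0 Hz to Nyquist (24 kHz here)
double[] magnitude = Transform.FFTmagnitude(windowed);
double hzPerPoint = (double)sampleRate / fftSize; // ~5.86 Hz per point

// sum all points below 100 Hz to estimate bass
double bass = 0;
for (int i = 0; i * hzPerPoint < 100; i++)
    bass += magnitude[i];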

I hope this helps!

@swharden Thanks for the clarification!

I've pretty much done all of the above. What I'm still in the dark about is how many points I should be using for the FFT transform.

For my app I'm only interested in 48 kHz and 44.1 kHz, but for now let's go with 48 kHz.

I already have the IEEE floats read from naudio:

24.450 * 48000 = 1173600
29.245 * 48000 = 1403760
My array = 230160 elements

So if I were doing an FFT every 1024 points, I would get roughly 224 datapoints (each datapoint would contain all frequencies).

So, basically, this is what I don't understand: is my assumption correct? Can I just get 224 datapoints by doing an FFT on each 1024-point block?

My code is roughly like this:

// this returns all the frequencies at all time points
public analyze_details analyze_details() {
	buffer_.float_position = 0;
	var float_buffer = new float[fft_sample_size * buffer_.format.Channels];
	var double_buffer = new double[fft_sample_size];
	var channel_count = buffer_.format.Channels;
	var window = algorithm_window();

	analyze_details ad = new analyze_details();
	List<analyze_details.frequency_values> values = new List<analyze_details.frequency_values>();
	while (true) {
		var ms = audio_util.stream_position_to_ms(buffer_.float_position, buffer_.format);
		var read_count = buffer_.read_float(float_buffer, 0, fft_sample_size * buffer_.format.Channels);
		if (read_count <= 0)
			break;
		for (int i = 0; i < read_count / channel_count; i++)
			// note: taking only first channel , for now
			double_buffer[i] = float_buffer[i * channel_count];
		for (int i = read_count / channel_count; i < fft_sample_size; ++i)
			double_buffer[i] = 0; // pad zeros if needed

		var transformed = window != null ? Window.Apply(window, double_buffer) : double_buffer;
		double[] frequency_values = null;
		switch (unit) {
		case unit_type.rms_unit:
			frequency_values = Transform.FFTmagnitude(transformed);
			break;
		case unit_type.db_unit:
			frequency_values = Transform.FFTpower(transformed);
			break;
		}

		if (ad.frequencies == null)
			ad.frequencies = Transform.FFTfreq(buffer_.format.SampleRate, frequency_values.Length);

		values.Add(new analyze_details.frequency_values {
			time_ms = ms, values = frequency_values,
		});
	}

	ad.values = values;
	return ad;
}

So, basically, this is what I don't understand: is my assumption correct? Can I just get 224 datapoints by doing an FFT on each 1024-point block?

Correct, you can repeatedly call FFT on little "windows" of data. In your case each window has 1024 points and spans about 21 ms (1024 / 48000).

Note that a repeated series of FFTs can create a spectrogram. I have another library devoted to making those which you may find useful (https://github.com/swharden/Spectrogram), and with it you can run code like:

int sampleRate = 44100;
int fftSize = 1024;
var spec = new Spectrogram(sampleRate, fftSize);

// every time new audio is available analyze it
double[] newAudio = /* new values from microphone */
spec.Add(newAudio);

// FFTs are calculated automatically
List<double[]> values = spec.GetFFTs();

...and the values it produces are probably similar to what your code example does
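
If the goal is a single bass value per time window, a hypothetical continuation of that snippet could sum the low-frequency points of each FFT (assuming each FFT spans 0 Hz to Nyquist, as described above):

// reduce each FFT to one "bass" value by summing points below 100 Hz
double hzPerPoint = (double)sampleRate / fftSize; // ~43 Hz per point at 44100 Hz / 1024 points
var bassOverTime = new List<double>();
foreach (double[] fft in values)
{
    double bass = 0;
    for (int i = 0; i < fft.Length && i * hzPerPoint < 100; i++)
        bass += fft[i];
    bassOverTime.Add(bass);
}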

Hope it helps!

@swharden Thanks!
I've seen your Spectrogram project; basically that's how I found your lib here ;)

Yeah, come to think of it, I'm pretty sure your Spectrogram does something pretty close to what I'm doing. Thanks again!