adamstark/Gist

Getting Audio Frames

LadyJuse opened this issue · 15 comments

Just a simple question, but where would I get the audio frames for the analysis? Nothing I can look up seems to be precise, or it just gives me examples that look incompatible with the code.

What kind of audio analysis are you trying to do? The answer to this depends a little on whether you are trying to process an audio file or whether you want to do this in real-time (e.g. in a plug-in).

I want to process it so I can use the data to make custom levels for a space shooter.

So real-time audio gets converted to data via the Gist library and then that is used in the game?

If it is a real-time application, then it will depend on your environment and project as to how audio is handled, but most likely there will be some callback function somewhere that will provide audio frames on a regular basis (e.g. in chunks of 128 audio samples). What framework are you using?
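
For example - and this is just a sketch, where the callback name and signature are hypothetical and depend entirely on your framework - you would typically buffer those small chunks until you have a full frame for Gist:

#include <vector>
#include "Gist.h"

const int audioFrameSize = 512;
Gist<float> gist (audioFrameSize, 44100);
std::vector<float> frameBuffer;

// your framework calls this with small chunks (e.g. 128 samples)
void audioCallback (const float* input, int numSamples)
{
    for (int n = 0; n < numSamples; n++)
    {
        frameBuffer.push_back (input[n]);

        // once we have a full frame, hand it to Gist and start again
        if ((int) frameBuffer.size() == audioFrameSize)
        {
            gist.processAudioFrame (frameBuffer);
            frameBuffer.clear();
        }
    }
}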

Sorry that I was unclear. The audio file will be read and converted. I use the SDL_Mixer library to play the music, if that's of importance for this.

Have you decided how you will be reading the audio file? And will the audio file be mono or stereo?

Once I know those things I can suggest a solution :)

It is in stereo.
If you mean the audio file's data: the examples I have found (which I am not sure get me the info I need) currently use fstream to read the file data.

Ok, so this is where it is slightly tricky, because stereo audio files can be represented in different ways - the left and right channels can be in separate arrays, or they can be 'interleaved', with the audio samples in the same array alternating left sample, then right sample, and so on.
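
For example, if your decoder hands you an interleaved buffer, a minimal sketch of splitting it into separate channel arrays would be (the buffer contents here are just a placeholder):

#include <vector>

// suppose 'interleaved' holds stereo samples as L R L R ...
std::vector<double> interleaved (2 * 1024, 0.0); // placeholder data

std::vector<double> left (interleaved.size() / 2);
std::vector<double> right (interleaved.size() / 2);

for (size_t n = 0; n < left.size(); n++)
{
    left[n] = interleaved[2 * n];       // even indices: left channel
    right[n] = interleaved[2 * n + 1];  // odd indices: right channel
}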

I wrote an audio file library (https://github.com/adamstark/AudioFile) so I'll post how I would do it with that. In that library, the audio channels are in separate arrays.

#include "AudioFile.h"

// then, somewhere later in your code wherever is relevant...

const int audioFrameSize = 512;
const int sampleRate = 44100;

// create one Gist object for each channel
Gist<double> gistLeft (audioFrameSize, sampleRate);
Gist<double> gistRight (audioFrameSize, sampleRate);

// AudioFile object for reading audio files
AudioFile<double> audioFile;
audioFile.load ("/path/to/your/audiofile.wav");

// create buffers for our audio frames
std::vector<double> audioFrameLeftChannel (audioFrameSize);
std::vector<double> audioFrameRightChannel (audioFrameSize);

// loop over all audio samples, in hops of the audio frame size
for (int i = 0; i < audioFile.getNumSamplesPerChannel(); i += audioFrameSize)
{
    // fill the audio frames
    for (int k = 0; k < audioFrameSize; k++)
    {
        audioFrameLeftChannel[k] = audioFile.samples[0][i + k];
        audioFrameRightChannel[k] = audioFile.samples[1][i + k];
    }

    // process left channel
    gistLeft.processAudioFrame (audioFrameLeftChannel);
    float zcrLeft = gistLeft.zeroCrossingRate();
    
    // process right channel
    gistRight.processAudioFrame (audioFrameRightChannel);
    float zcrRight = gistRight.zeroCrossingRate();
}

I hope this helps, let me know how you get on :)

Thanks! I'll let you know how it goes!

I've been at it, adding in the parts one at a time, and when I added in the contents of the double for loop, I ran into a debug error regarding the vector: when i is 8923136 and k is 256, it says that the vector subscript is out of range.

Are you using my audio file library? Or a different one?

I think maybe try changing...

for (int i = 0; i < audioFile.getNumSamplesPerChannel(); i += audioFrameSize)

to

for (int i = 0; i < (audioFile.getNumSamplesPerChannel() - audioFrameSize); i += audioFrameSize)

as we're probably running just over the end of the audio buffer
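
Note that this change skips any final partial frame. If you also want to analyse those last samples, one option (just a sketch, reusing the variables from the code above) is to zero-pad the final frame:

const int numSamples = audioFile.getNumSamplesPerChannel();

for (int i = 0; i < numSamples; i += audioFrameSize)
{
    for (int k = 0; k < audioFrameSize; k++)
    {
        // use the sample if it exists, otherwise pad with silence
        if (i + k < numSamples)
            audioFrameLeftChannel[k] = audioFile.samples[0][i + k];
        else
            audioFrameLeftChannel[k] = 0.0;
    }

    gistLeft.processAudioFrame (audioFrameLeftChannel);
}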

This code is very helpful, but I have a question. Is the Gist object created to store the data of each frame, or of all frames? I saw that the magnitude spectrum of the Gist object is a 1D vector, so I used std::vector<Gist<double>> to store the magnitude spectra of the whole audio. Is this the right way to use Gist?

That's not quite right, no. So each Gist object is there to process a series of audio frames. You can get a 1D magnitude spectrum out of the Gist object, but then it is up to you if you want to store each magnitude spectrum somewhere. So you might create a vector of vectors to do that.

To summarise - you should have one Gist object per audio channel you want to process :)
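
For example, a sketch of collecting the magnitude spectrum of every frame might look like this (reusing the loop from the code above):

// a spectrogram: one magnitude spectrum per audio frame
std::vector<std::vector<double>> magnitudeSpectrogram;

// ... inside the frame loop ...
gistLeft.processAudioFrame (audioFrameLeftChannel);
magnitudeSpectrogram.push_back (gistLeft.getMagnitudeSpectrum());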

The change worked, thanks!

Great - glad to hear that :)


for (int i = 0; i < (audioFile.getNumSamplesPerChannel() - audioFrameSize); i += audioFrameSize)
{
    for (int k = 0; k < audioFrameSize; k++)
    {
        audioFrameLeftChannel[k] = audioFile.samples[0][i + k];
    }
    // process left channel
    gistLeft.processAudioFrame (audioFrameLeftChannel);
    float zcrLeft = gistLeft.zeroCrossingRate();
}

@adamstark As I see it, Gist does not handle frame overlaps, right? So from a Fourier transform perspective - say, when getting the magnitude spectrum - it is our responsibility to take care of overlaps during chunk preparation, correct?
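
In other words, something like this sketch is what I have in mind - hopping by half the frame size so consecutive frames share 50% of their samples:

const int hopSize = audioFrameSize / 2; // 50% overlap

for (int i = 0; i < (audioFile.getNumSamplesPerChannel() - audioFrameSize); i += hopSize)
{
    for (int k = 0; k < audioFrameSize; k++)
        audioFrameLeftChannel[k] = audioFile.samples[0][i + k];

    // each frame now shares half its samples with the previous one
    gistLeft.processAudioFrame (audioFrameLeftChannel);
}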