adamstark/Gist

Getting Audio Frames

LadyJuse opened this issue · 15 comments

Just a simple question, but where would I get the audio frames for the analysis? Nothing I can look up seems to be precise, or it just gives me examples that look incompatible with the code.

What kind of audio analysis are you trying to do? The answer to this depends a little on whether you are trying to process an audio file or whether you want to do this in real-time (e.g. in a plug-in).

I want to process it so I can use the data to make custom levels for a space shooter.

So real-time audio gets converted to data via the Gist library and then that is used in the game?

If it is a real-time application, then it will depend on your environment and project as to how audio is handled, but most likely there will be some callback function somewhere that will provide audio frames on a regular basis (e.g. in chunks of 128 audio samples). What framework are you using?
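
For example - and this is just a sketch, where the callback name and signature are hypothetical and depend entirely on your framework - you would typically buffer those small chunks until you have a full frame for Gist:

#include <vector>
#include "Gist.h"

const int audioFrameSize = 512;
Gist<float> gist (audioFrameSize, 44100);
std::vector<float> frameBuffer;

// your framework calls this with small chunks (e.g. 128 samples)
void audioCallback (const float* input, int numSamples)
{
    for (int n = 0; n < numSamples; n++)
    {
        frameBuffer.push_back (input[n]);

        // once we have a full frame, hand it to Gist and start again
        if ((int) frameBuffer.size() == audioFrameSize)
        {
            gist.processAudioFrame (frameBuffer);
            frameBuffer.clear();
        }
    }
}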

Sorry that I was unclear. The audio file will be read and converted. I use the SDL_Mixer library to play the music, if that's of importance for this.

Have you decided how you will be reading the audio file? And will the audio file be mono or stereo?

Once I know those things I can suggest a solution :)

It is in stereo.
If you mean the audio file's data: the examples I have found (which I am not sure get me the info I need) currently use fstream to read the file data.

Ok, so this is where it is slightly tricky, because stereo audio files can be represented in different ways - the left and right channels can be in separate arrays, or they can be 'interleaved', with the audio samples in the same array alternating left sample, then right sample, and so on.
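
For example, if your decoder hands you an interleaved buffer, a minimal sketch of splitting it into separate channel arrays would be (the buffer contents here are just a placeholder):

#include <vector>

// suppose 'interleaved' holds stereo samples as L R L R ...
std::vector<double> interleaved (2 * 1024, 0.0); // placeholder data

std::vector<double> left (interleaved.size() / 2);
std::vector<double> right (interleaved.size() / 2);

for (size_t n = 0; n < left.size(); n++)
{
    left[n] = interleaved[2 * n];       // even indices: left channel
    right[n] = interleaved[2 * n + 1];  // odd indices: right channel
}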

I wrote an audio file library (https://github.com/adamstark/AudioFile) so I'll post how I would do it with that. In that library, the audio channels are in separate arrays.

#include "AudioFile.h"

// then, somewhere later in your code wherever is relevant...

const int audioFrameSize = 512;
const int sampleRate = 44100;

// create one Gist object for each channel
Gist<double> gistLeft (audioFrameSize, sampleRate);
Gist<double> gistRight (audioFrameSize, sampleRate);

// AudioFile object for reading audio files
AudioFile<double> audioFile;
audioFile.load ("/path/to/your/audiofile.wav");

// create buffers for our audio frames
std::vector<double> audioFrameLeftChannel (audioFrameSize);
std::vector<double> audioFrameRightChannel (audioFrameSize);

// loop over all audio samples, in hops of the audio frame size
for (int i = 0; i < audioFile.getNumSamplesPerChannel(); i += audioFrameSize)
{
    // fill the audio frames
    for (int k = 0; k < audioFrameSize; k++)
    {
        audioFrameLeftChannel[k] = audioFile.samples[0][i + k];
        audioFrameRightChannel[k] = audioFile.samples[1][i + k];
    }

    // process left channel
    gistLeft.processAudioFrame (audioFrameLeftChannel);
    float zcrLeft = gistLeft.zeroCrossingRate();
    
    // process right channel
    gistRight.processAudioFrame (audioFrameRightChannel);
    float zcrRight = gistRight.zeroCrossingRate();
}

I hope this helps, let me know how you get on :)

Thanks! I'll let you know how it goes!

I've been at it, adding in the parts one at a time, and when I added in the contents of the double for loop, I ran into a debug error regarding the vector: when i is 8923136 and k is 256, it says that the vector subscript is out of range.

Are you using my audio file library? Or a different one?

I think maybe try changing...

for (int i = 0; i < audioFile.getNumSamplesPerChannel(); i += audioFrameSize)

to

for (int i = 0; i < (audioFile.getNumSamplesPerChannel() - audioFrameSize); i += audioFrameSize)

as we're probably running just over the end of the audio buffer
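
Note that this change skips any final partial frame. If you also want to analyse those last samples, one option (just a sketch, reusing the variables from the code above) is to zero-pad the final frame:

const int numSamples = audioFile.getNumSamplesPerChannel();

for (int i = 0; i < numSamples; i += audioFrameSize)
{
    for (int k = 0; k < audioFrameSize; k++)
    {
        // use the sample if it exists, otherwise pad with silence
        if (i + k < numSamples)
            audioFrameLeftChannel[k] = audioFile.samples[0][i + k];
        else
            audioFrameLeftChannel[k] = 0.0;
    }

    gistLeft.processAudioFrame (audioFrameLeftChannel);
}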

This code is very helpful, but I have a question. Is the Gist object created to store the data of each frame, or of all frames? I saw that the magnitude spectrum of the Gist object is a 1D vector, so I used std::vector<Gist<double>> to store the magnitude spectra of the whole audio. Is this the right way to use Gist?

That's not quite right, no. So each Gist object is there to process a series of audio frames. You can get a 1D magnitude spectrum out of the Gist object, but then it is up to you if you want to store each magnitude spectrum somewhere. So you might create a vector of vectors to do that.

To summarise - you should have one Gist object per audio channel you want to process :)
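
For example, a sketch of collecting the magnitude spectrum of every frame might look like this (reusing the loop from the code above):

// a spectrogram: one magnitude spectrum per audio frame
std::vector<std::vector<double>> magnitudeSpectrogram;

// ... inside the frame loop ...
gistLeft.processAudioFrame (audioFrameLeftChannel);
magnitudeSpectrogram.push_back (gistLeft.getMagnitudeSpectrum());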

The change worked, thanks!

Great - glad to hear that :)


for (int i = 0; i < (audioFile.getNumSamplesPerChannel() - audioFrameSize); i += audioFrameSize)
{
    for (int k = 0; k < audioFrameSize; k++)
    {
        audioFrameLeftChannel[k] = audioFile.samples[0][i + k];
    }
    // process left channel
    gistLeft.processAudioFrame (audioFrameLeftChannel);
    float zcrLeft = gistLeft.zeroCrossingRate();
}

@adamstark As I see it, Gist does not handle frame overlaps, right? So from a Fourier transform perspective - say, when getting the magnitude spectrum - it is our responsibility to take care of overlaps during chunk preparation, correct?
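
In other words, something like this sketch is what I have in mind - hopping by half the frame size so consecutive frames share 50% of their samples:

const int hopSize = audioFrameSize / 2; // 50% overlap

for (int i = 0; i < (audioFile.getNumSamplesPerChannel() - audioFrameSize); i += hopSize)
{
    for (int k = 0; k < audioFrameSize; k++)
        audioFrameLeftChannel[k] = audioFile.samples[0][i + k];

    // each frame now shares half its samples with the previous one
    gistLeft.processAudioFrame (audioFrameLeftChannel);
}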