Performance Improvements

Question

Opened this issue 12 years ago · 1 comments

Hi,

Thanks for creating this; it has been a great help in understanding the PhaseVocoder and Apple's APIs.

I used Instruments to see what makes smbPitchShift slow. Your change to the Apple Accelerate framework was a very big step.
A further improvement would be to replace double with float.
The audio is 16-bit, and float has 24 bits of precision, which is sufficient.
The same goes for the C math functions: atan -> atanf, cos -> cosf, etc.

After moving to float, the biggest remaining cost will be cosf. It comes from recomputing the window function twice on each call, and it can be reduced to a vector multiplication.

Apple supplies 3 window functions.
vDSP_blkman_window
vDSP_hamm_window
vDSP_hann_window
The last one is equivalent to the current implementation. The window should be computed only once, in the init, or passed in like the FFTSetup.
The first for loop would then be reduced to:
vDSP_vmul(gInFIFO, 1, window, 1, gFFTworksp, 1, fftFrameSize);

After this, the Phase Vocoder should be much faster (4x or even more).

Answer 1 · 2012-11-09T16:46:16.000Z

Dear Arne,

Wow. This is huge. Thank you so much for doing the optimization research and work. I have been busy teaching this year and have not been able to work on the audiograph project. But I very much appreciate your observations and would like to include them in the next version of the software when I can get back to it.

Thank you and best wishes.

Tom

From: Arne Jünemann
Sent: Tuesday, October 23, 2012 8:38 PM
To: tkzic/audiograph
Subject: [audiograph] Performance Improvements (#2)
