This project uses OpenGL to create a 3D visualization of a sound source and a listener. The audio is processed with 3D audio techniques so that it matches the distance and angles of the sound source relative to the listener. The project uses the HRTF interpolation and distance delay algorithms described in Belloch et al. (2013).
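As a rough illustration of the distance delay idea, the sketch below converts a source position into a propagation delay in samples. It is a generic textbook formulation, not necessarily the one used in the paper or in this project. The names `sourceDistance` and `delaySamples`, the use of metres as the unit, and the 343 m/s speed of sound are all assumptions made for the example.

```cpp
#include <cmath>

// Assumed speed of sound in air at roughly 20 degrees Celsius, in metres per second.
constexpr double kSpeedOfSound = 343.0;

// Straight-line distance from a listener at the origin to a source at (x, y, z).
// Hypothetical helper: assumes coordinates are expressed in metres.
double sourceDistance(double x, double y, double z) {
    return std::sqrt(x * x + y * y + z * z);
}

// Propagation delay, in (possibly fractional) samples, for a source
// `distanceMeters` away at the given sample rate in Hz.
double delaySamples(double distanceMeters, double sampleRate) {
    return distanceMeters / kSpeedOfSound * sampleRate;
}
```

With these assumptions, a source 3.43 m away at 44.1 kHz is delayed by about 441 samples. The fractional part of the result would still need to be rounded or interpolated before it can index a delay line.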
- Read input & reverb file
- (optional) do convolution reverb on input
- Read HRIRs
- Transform (all 710 * 2) HRIRs to HRTFs
- Create and allocate all audio buffers
- Create FFT Plans
- Import 3D model
- Create mesh for floor
- Graphics side of the program updates the X, Y, and Z coordinates of the sound source
- Write to the sound source class
- Computes azimuth and elevation with each frame refresh
- Audio side uses a buffer size of ~128 or 256 samples at a 44.1 kHz sample rate
- 128 samples ≈ 2.9 milliseconds
- GPU computation takes ~0.3 milliseconds in the worst-case scenario
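The angle computation and the buffer timing in the steps above can be sketched as follows. This is a minimal sketch, not the project's actual code: the `Angles` struct and the `toAngles` and `bufferMs` functions are hypothetical names, and the axis convention (listener at the origin, facing -Z, with +Y up, as is common in OpenGL scenes) is an assumption.

```cpp
#include <cmath>

// Hypothetical result type; the names are illustrative only.
struct Angles {
    double azimuthDeg;   // 0 = straight ahead, positive = to the listener's right
    double elevationDeg; // 0 = ear level, positive = above the listener
};

// Convert a source position to azimuth and elevation in degrees.
// Assumes the listener is at the origin, facing -Z, with +Y pointing up.
Angles toAngles(double x, double y, double z) {
    const double kRadToDeg = 180.0 / std::acos(-1.0);
    const double horizontal = std::hypot(x, z); // distance in the horizontal plane
    return { std::atan2(x, -z) * kRadToDeg,
             std::atan2(y, horizontal) * kRadToDeg };
}

// Duration of one audio buffer in milliseconds.
double bufferMs(int frames, double sampleRate) {
    return 1000.0 * frames / sampleRate;
}
```

Under these assumptions, `bufferMs(128, 44100.0)` comes to roughly 2.9 ms and `bufferMs(256, 44100.0)` to roughly 5.8 ms, so a worst-case GPU time of about 0.3 ms fits within one buffer period with room to spare.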
A purple cartoon character indicates the listener, which remains stationary in the middle of the space. The green object indicates the sound source, which can be moved around the space. The keys listed below move the sound source along the X, Y, and Z axes. The visualization can also be rotated by left-clicking and dragging, which helps to better visualize the 3D space. The user can also zoom in and out by clicking and dragging with the right mouse button or by using the scroll wheel. The R key resets the view to the default perspective and position. My program also optionally writes the output to a sound file.
The cartoon character, which I’ve fondly named Jefferson, was created by Vinnie Huynh in Blender. I exported the model to an FBX file and imported it that way.
OpenGL - short for Open Graphics Library. It is a cross-language, cross-platform API for drawing 2D and 3D graphics. It’s portable, and its implementations are typically supplied by the graphics driver and hardware-accelerated on the GPU.
CUDA - acronym for Compute Unified Device Architecture. It’s “a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs)” (Nvidia). It is a proprietary but free API in several different programming languages to speak directly to NVIDIA hardware and utilize parallel processing.
cuFFT – NVIDIA CUDA Fast Fourier Transform library.
Thrust – “Thrust is a C++ template library for CUDA based on the Standard Template Library (STL)” (Nvidia). It’s a library within CUDA that utilizes parallel processing for algorithms that already exist in C++’s standard library such as summing, reducing, and sorting.
HRTF – acronym for Head-Related Transfer Function. A set of short audio filters, one for each measured azimuth (horizontal angle) and elevation, that can make a sound appear to come from different locations in 3D space. The ones I used were the compact set from the MIT KEMAR measurements.
PortAudio - Portable audio library used to connect to the computer’s sound device
ALSA - acronym for Advanced Linux Sound Architecture. One of the libraries that PortAudio uses under the hood on Linux.
ASIO - acronym for Audio Stream Input/Output. A sound card driver protocol by Steinberg for low-latency audio. It is proprietary software: Steinberg makes it freely available, but it cannot be redistributed. PortAudio on Windows can be configured to use ASIO.
libsndfile – Portable audio library used to read contents of wave files
Blender – Open source, free 3D creation suite
ASSIMP – short for Open Asset Import Library. It “is a portable Open Source library to import various well known 3D model formats in a uniform manner.” I used it to import the FBX file for rendering in OpenGL.
Belloch, J. A., Ferrer, M., Gonzalez, A., Martinez-Zaldivar, F. J., & Vidal, A. M. (2013). Headphone-based virtual spatialization of sound with a GPU accelerator. Journal of the Audio Engineering Society, 61(7/8), 546-561.