audibility-constant-phase

Audibility of Constant Phase Shifts in Audio Signals

The compiled paper can be found on our blog

Theory

Filters with unit magnitude (except for DC) and constant phase shift are known as fractional Hilbert transforms.

Constant phase shifts of +45deg or +90deg occur in sound field synthesis (SFS) equalisation tasks, where they are combined with an additional highpass slope, making them so-called half or full differentiators.

Since no perceptual studies are known to the authors on whether the presence or absence of such a constant phase shift is audible, this project starts to address this research question. Although initially motivated by an SFS background, the question actually belongs to fundamental human hearing and audiology research.

We give analytic expressions for the fractional Hilbert transform filter and its suitable practical implementation. Furthermore, we discuss an initial listening experiment exploring the perceptibility of constant phase shifts.
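As a minimal sketch of the idea (not the implementation from the paper or the notebooks), a constant phase shift can be applied to one period of a periodic signal in the DFT domain. The function name `constant_phase_shift_periodic`, the sign convention (a cosine becomes `cos(... + phi)`) and the DC gain of `cos(phi)` are assumptions made for this illustration.

```python
import numpy as np


def constant_phase_shift_periodic(x, phi):
    """Shift the phase of all spectral components of one signal period by phi (rad)."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    # Positive-frequency bins get exp(+j*phi); irfft restores the
    # conjugate-symmetric negative-frequency half automatically.
    shifted = spectrum * np.exp(1j * phi)
    # DC (and the Nyquist bin for even n) must stay real, leaving a gain of cos(phi).
    shifted[0] = spectrum[0] * np.cos(phi)
    if n % 2 == 0:
        shifted[-1] = spectrum[-1] * np.cos(phi)
    return np.fft.irfft(shifted, n=n)


# Example: a cosine with 4 cycles per period, shifted by +45 degrees.
n = np.arange(64)
x = np.cos(2 * np.pi * 4 * n / 64)
y = constant_phase_shift_periodic(x, np.pi / 4)
```

In this sketch the magnitude of every bin except DC and Nyquist is left untouched, which mirrors the "unit magnitude (except for DC)" property stated above.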

Open Science

This project follows the open science paradigm. All information is thus archived in the GitHub-hosted repository

https://github.com/spatialaudio/audibility-constant-phase

under the CC BY 4.0 and MIT license. refers to the repository state when submitting the paper.

We provide individual modifications of the webMUSHRA software, thus explicitly indicating the usage of a "Third-Party Modified Version of the webMUSHRA Software" in our listening experiment and retaining the LICENSE.txt and THIRD-PARTY-NOTICES.txt files of the webMUSHRA software.

Conference Paper

This git repository is accompanied by the paper

Frank Schultz, Nara Hahn, Sascha Spors, University of Rostock (2019): Detection of Constant Phase Shifts in Filters for Sound Field Synthesis, 5th Intl Conf on Spatial Audio (ICSA), Ilmenau, September 2019, ORAL-5-3, abstract reviewed paper

The sources are included in the paper folder using the free IEEEtran latex class, rather than the ICSA template.

Conference Talk

The talk corresponding to the above-mentioned paper was given at the 5th Intl Conf on Spatial Audio (ICSA), Ilmenau, September 2019, ORAL-5-3.

The corresponding sources are included in the talk folder.

Notebooks

There are several Jupyter notebooks containing fundamental signal processing calculus and validation routines.

The main derivations and aspects are covered in aperiodic-signal-constant-phase-shifter.ipynb and periodic-signal-constant-phase-shifter.ipynb.

The mkfig-xxx.py Python scripts render graphics that are directly used in the paper and the slides for the talk.

Furthermore, standalone Python scripts handle the generation of stimuli.

Anaconda Environment

Python software and Jupyter notebooks were programmed and validated under Linux OS and Mac OSX 10.13.6 and 10.14.4. The anaconda environment anaconda_env.yml was used to prepare all stimuli and analyse the test results.

ABX Test Framework

We utilise the webaudio-based webMUSHRA software (master branch, commit c929877) to realise web browser based ABX testing.

See Schoeffler, M. et al., (2018). webMUSHRA - A Comprehensive Framework for Web-based Listening Tests. Journal of Open Research Software. 6(1), p.8

We used a virtual webserver (localhost).

Install webMUSHRA

Working with webMUSHRA on a local machine is pretty straightforward if the computer can handle local webserver services (i.e. setting up a virtual webserver with PHP).

To install all software, please do the following initial steps:

1.) Get the repository data with SSH at git@github.com:spatialaudio/audibility-constant-phase.git

The repository folder/file structure contains a subfolder abx_software/webMUSHRA_c929877_20180814.

2.) Get the zipped version of webMUSHRA or directly clone its master commit c929877 into abx_software/webMUSHRA_c929877_20180814

It is very likely that the webMUSHRA releases 1.3 and 1.4 will work as well; however, we did not test this!

3.) Some versioned files of the audibility-constant-phase repository appear to be modified or deleted when copying the webMUSHRA files into the folder. So, check with git status, git a. Please revert these changes in order to work with our intended test design, i.e. a "Third-Party Modified Version of the webMUSHRA Software". In particular, make sure that index.html calls MushraAudioControlSinglePlay.js.

The next subsection explains the reasons.
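The check on index.html mentioned in step 3 can be sketched as a plain substring test. The helper name `index_loads_single_play` is hypothetical and not part of the repository or of webMUSHRA.

```python
def index_loads_single_play(index_html: str) -> bool:
    """Return True if the page source references the modified audio engine."""
    return "MushraAudioControlSinglePlay.js" in index_html


# Hypothetical usage from the repository root:
# from pathlib import Path
# path = Path("abx_software/webMUSHRA_c929877_20180814/index.html")
# print(index_loads_single_play(path.read_text(encoding="utf-8")))
```

If this returns False after copying webMUSHRA into the folder, index.html has most likely been overwritten by the unmodified version and should be reverted.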

webMUSHRA Modifications

As mentioned above, a "Third-Party Modified Version of the webMUSHRA Software" was used in the listening experiment. This was due to the following requirements.

Audio Rendering

webMUSHRA by default renders looped playback with fade-out / fade-in when switching and looping stimuli, presumably intending seamless switching.

Crossfading or fast fade-out/fade-in between ABX stimuli in our test scenario produces highly detectable artefacts due to the different phase alignment of the reference vs. the phase shifted audio versions. These switching artefacts are not what we are after; we are interested in the detection / audibility of constant phase shifts.

We therefore implemented some modifications. To meet our special audio rendering requirements, namely

non-looped playback
always enforcing stop before triggering a new stimulus from start (i.e. no crossfade, no fade-out / fade-in with very short time interval between stimulus change)

the ABX related audio rendering engine written in JavaScript abx_software/webMUSHRA_c929877_20180814/lib/webmushra/audio/MushraAudioControl.js was slightly modified to MushraAudioControlSinglePlay.js, which needs to be loaded in the modified abx_software/webMUSHRA_c929877_20180814/index.html.

GUI Appearance

The initial GUI includes logos of the institutions involved in creating webMUSHRA. We highly appreciate their efforts; it is a very useful tool. However, we felt that these logos divert too much attention from the actual ABX detection task. So we left these logos out in index.html. It may be worth considering a promo webpage at the beginning or at the end of the test that explicitly shows these logos and some other useful information once, leaving the actual test page uncluttered.

GUI Language and Labeling

In abx_software/webMUSHRA_c929877_20180814/lib/webmushra/nls/nls.js we changed to

nls['de']['stopButton'] = "Stop";
nls['de']['pauseButton'] = "Stop";
nls['de']['reference'] = "X";
nls['de']['quest'] = "Welcher Stimulus ist X?";

replacing "Reference" with the more appropriate term "X", since what is defined as the reference is not relevant for the test subject and might even be biasing information. The labeling A, B and X comes with no bias and appears more consistent. Furthermore, since we always start a stimulus from the beginning, there is no pause, but only stop.

In abx_software/webMUSHRA_c929877_20180814/startup.js we changed to

if (config.language == undefined) {
    config.language = 'de';
}

to use German as the GUI language.

ABX Test Configuration

Listening Test

Running the Jupyter notebook Phase_ABX_create_yaml.ipynb in abx_software/webMUSHRA_c929877_20180814/configs creates a yaml file that configures webMUSHRA for the dedicated ABX test. Thus, something like ABX_SchultzHahn19_0.yaml is written to the same folder. Furthermore, a file with the same name and the suffix .txt contains information on the ABX trials. Note that the actual playing sequence is randomised by the webMUSHRA API.

Make sure that the stimuli referenced in the yaml file are stored in abx_software/webMUSHRA_c929877_20180814/configs/resources/stimuli. If the test appears to be loading forever, it is very likely that audio files are not found in this folder.
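A quick way to spot missing audio files before starting the test could look like the following sketch. It scans the yaml text for tokens ending in .wav rather than parsing yaml, and it assumes that the paths are given relative to the folder passed as `root`; both helper names are hypothetical.

```python
import re
from pathlib import Path


def referenced_wav_files(yaml_text):
    """Return all tokens in yaml_text that look like a path to a .wav file."""
    # Plain-text scan, no YAML parsing: letters, digits, '_', '.', '/' and '-'.
    return re.findall(r"[\w./-]+\.wav", yaml_text)


def missing_wav_files(yaml_text, root):
    """Return the referenced .wav paths that do not exist below root."""
    paths = set(referenced_wav_files(yaml_text))
    return sorted(p for p in paths if not (Path(root) / p).is_file())


# Hypothetical usage:
# text = Path("configs/ABX_SchultzHahn19_0.yaml").read_text(encoding="utf-8")
# print(missing_wav_files(text, root="."))
```

An empty list means that every referenced file was found under the assumed root folder.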

In our test, parts 0-3 were used, for which ABX_SchultzHahn19_0.yaml, ABX_SchultzHahn19_1.yaml, ABX_SchultzHahn19_2.yaml and ABX_SchultzHahn19_3.yaml were created, using 7, 6, 6 and 6 repetitions of ABX trials of the same audio content. Thus, for 5 audio contents, 5 x 7 = 35 trials had to be performed in part 0, while each of the other parts consisted of 5 x 6 = 30 trials.
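The trial counts follow directly from the numbers above. The short calculation below only restates this arithmetic; the variable names are chosen for illustration.

```python
n_contents = 5                       # audio contents per part
repetitions_per_part = [7, 6, 6, 6]  # parts 0 to 3

trials_per_part = [n_contents * r for r in repetitions_per_part]
total_trials = sum(trials_per_part)

print(trials_per_part, total_trials)  # [35, 30, 30, 30] 125
```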

Training

Running the Jupyter notebook Training_ABX_create_yaml.ipynb in abx_software/webMUSHRA_c929877_20180814/configs creates ABX_SchultzHahn19_Training.yaml, which realises a training session prior to the actual listening test.

For this purpose, the pink noise stimuli pnoise_ref.wav (0 dB level) and pnoise_treat (-1 dB level) (created by notebooks/generate_pink_noise.ipynb) are used in a four-trial ABX session. Detecting the loudness difference is rather simple and should familiarise test subjects with the ABX GUI. It is important that test subjects do not get an idea of the actual research question during the training phase.
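For orientation, a level difference of -1 dB corresponds to an amplitude factor of about 0.891. The sketch below applies such a gain to an arbitrary signal; it is not the code of generate_pink_noise.ipynb, and `apply_gain_db` is a hypothetical helper.

```python
import numpy as np


def apply_gain_db(x, gain_db):
    """Scale a signal by a gain given in dB (amplitude convention, 20*log10)."""
    return np.asarray(x, dtype=float) * 10.0 ** (gain_db / 20.0)


rng = np.random.default_rng(0)
ref = rng.standard_normal(1000)   # stand-in for the reference stimulus
treat = apply_gain_db(ref, -1.0)  # 1 dB quieter version

# Level difference between both versions, computed from their RMS values.
rms_ratio_db = 20 * np.log10(np.sqrt(np.mean(treat**2)) / np.sqrt(np.mean(ref**2)))
```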

Audio Content and Stimulus Generation

For the ABX test we used the musical audio content

castanets-dry (Matthias Frank's (IEM Graz) version of EBU SQAM CD track 27 re-programmed as anechoic version)
Hotel California, Hell Freezes Over, Eagles, Geffen, 1994, stereo mix downmixed to mono (if the original record is not at hand, hotelcalifornia_mono_fake.wav might give an impression of the rhythmic structure of the song)

and the artificial audio content (generated with the provided source code)

lowpass filtered pink noise generated by generate-nonperiodic-stimuli.py
square wave bursts generated by generate-periodic-stimuli.py

Note that reference audio material is mono!

The Python scripts generate-periodic-stimuli.py and generate-nonperiodic-stimuli.py take care of the stimulus generation used for the dedicated ABX test.

generate-nonperiodic-stimuli.py can load full songs as mono wav files and renders phase shifted stimuli from the specified sample interval [n_start, n_stop] as 24-bit non-dithered PCM wav. Note that filter_order is a crucial parameter. If the FIR filter is too short, ripples in the magnitude response will occur at very low and very high frequencies, which reveals the phase shifted version through a different perceived bass level.
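To illustrate the role of the filter order, the following sketch builds a windowed FIR approximation of a constant phase shifter from the ideal Hilbert transformer taps. It is not the filter design used in generate-nonperiodic-stimuli.py; the function name `constant_phase_fir`, the Blackman window and the sign convention are assumptions made for this example.

```python
import numpy as np


def constant_phase_fir(order, phi):
    """Windowed FIR approximation of a constant phase shifter (even order)."""
    if order % 2:
        raise ValueError("order must be even so that the delay order/2 is an integer")
    m = np.arange(order + 1) - order // 2  # time index centred on the delay
    odd = m % 2 != 0
    hilbert = np.zeros(order + 1)
    hilbert[odd] = 2.0 / (np.pi * m[odd])  # ideal Hilbert transformer taps
    hilbert *= np.blackman(order + 1)      # window choice is an assumption
    h = -np.sin(phi) * hilbert
    h[order // 2] += np.cos(phi)           # direct path, delayed by order/2 samples
    return h


# Example: +45 degrees applied to a cosine well inside the passband.
order, phi, omega = 1000, np.pi / 4, 0.2 * np.pi
k = np.arange(3000)
y = np.convolve(np.cos(omega * k), constant_phase_fir(order, phi))
steady = y[order:3000]  # samples for which the filter fully overlaps the input
expected = np.cos(omega * (np.arange(order, 3000) - order // 2) + phi)
```

In the steady state the output matches the delayed and phase shifted cosine closely. Towards very low and very high frequencies the approximation degrades, and more so for smaller orders, which is the effect described above.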

Note that we treat pink noise (which is generated within generate-nonperiodic-stimuli.py) as full song audio content rather than as a periodic sequence.

Considering the two audio contents (Hotel California and pink noise) as rectangular windowed signals of infinite duration, the filter order of 3,963,530 samples (≈ 90 s!) ensures that the linear convolution of the chosen excerpt of Hotel California is complete. For consistency, the same FIR filter was used for the pink noise stimulus.

Note that due to copyright restrictions, we use hotelcalifornia_mono_fake.wav to create the stimuli used in the repository. You might want to use the original record if at hand.

generate-periodic-stimuli.py is used for audio content for which periodicity is assumed, writing 24-bit non-dithered PCM wav. This holds e.g. for square wave bursts, kick drums at a certain tempo in bpm (not used in the listening experiment) and the castanet rhythm.

For this audio material, the ideal, infinite phase shift filter can be applied. See the signal processing section in the paper or the notebook for details.

Here, the crucial parameter is t_period, which must be chosen such that the audio content is much shorter than t_period in order to avoid overlapping and aliased convolution contributions.

Furthermore, in informal pretests we considered EDM music with very low crest factor and other test stimuli, such as

Tiesto & DallasK, Show Me (Original Mix), 2015
AutoErotique, Asphyxiation, 2013
Knife Party, 404, 2014
electronic kick drum library (tracks: sub_kick_23_G.wav, min_kick_22_G.wav, trad_kick_12_D.wav), see https://soundcloud.com/8-bit-logic and http://99sounds.org/kick-drum/
Meyer Sound's music noise signal MNoise_MSPN_90_916_049_15.wav, see https://m-noise.org/

but did not include them in the final listening test.

Audio Hardware

We used an RME Fireface UC (firmware v126, driver v3.16, -10 dBV output sensitivity) connected to an Apple Mac mini 2018 (OS 10.14.4). Phones out (7/8) of the RME with -10 dB Fader Gain was connected to a Sennheiser HD800 (#50526). The pink noise stimulus pnoise_ref.wav (created by notebooks/generate_pink_noise.ipynb) played back on both channels leads to 68.8 dB(A)Leq / 83.8 dB(C)Peak on each channel. This was measured with the G.R.A.S 43AF headphone coupler kit using a RA0039 (#343741) ear simulator with the dedicated 40AG 1/2" microphone capsule (#333274). The microphone was connected to a Bruel & Kjaer G-4 2270 sound pressure level (SPL) meter (#3008532) via a customised G.R.A.S. 26AC-S6 preamplifier (#330170) and was calibrated by B&K 4231 pistonphone (#3014273). The measurement routine is according to the IEC 60318 standard.

pnoise_ref.wav signal statistics per channel (this file is diotic):

sample peak = -12.04 dB
true peak = -11.65 dB (by oversampling)
RMS = -25.15 dB
loudness = -26.89 dB LUFS (R128, ITU BS.1770/4)
crest factor = 13.5 dB

All stimuli (except castanets, see below) for the listening test were calibrated to a loudness of -23 dB LUFS. Thus, if pnoise_ref.wav were calibrated to -23 dB LUFS (by a gain of 3.89 dB), an SPL of 72.69 dB(A)Leq / 87.69 dB(C)Peak would be obtained individually for the left and right headphone channels, measured with the above-mentioned headphone-to-ear coupler.
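The calibration numbers can be reproduced with a few lines of arithmetic. The variable names are chosen for illustration; the last line additionally shows that the listed crest factor is consistent with true peak minus RMS.

```python
measured_lufs = -26.89  # loudness of pnoise_ref.wav
target_lufs = -23.0     # calibration target of the test stimuli
gain_db = target_lufs - measured_lufs

spl_a_leq = 68.8 + gain_db   # dB(A) Leq per channel after calibration
spl_c_peak = 83.8 + gain_db  # dB(C) peak per channel after calibration

crest_factor_db = -11.65 - (-25.15)  # true peak minus RMS

print(round(gain_db, 2), round(spl_a_leq, 2), round(spl_c_peak, 2), round(crest_factor_db, 2))
# 3.89 72.69 87.69 13.5
```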

The LUFS calculation for the castanet stimulus does not match the perceived loudness very well in comparison to the other stimuli. Thus, based on our own perception, we calibrated the castanets to -35 dB LUFS instead of -23 dB LUFS.

Webserver Start / Stop

Since webMUSHRA relies on the webaudio technology, an HTML5-based interface is provided. This should be run on a server. We used a virtual webserver (localhost) on our lab computer (Apple Mac mini 2018, Mojave 10.14.4).

To do so:

i) Open a terminal and make sure that the current directory is abx_software/webMUSHRA_c929877_20180814.

ii) Then start the local webserver with php -S localhost:8000 (or any other suitable port). Under Windows and Linux, running a webserver might require further steps.

iii) Then, within a browser the URL http://localhost:8000/?config=ABX_SchultzHahn19_0.yaml instantiates the ABX test based on the specified yaml file.

If the test appears to be loading forever, it is very likely that audio files are not found in the specified folder. Another possible issue is the yaml file itself. So, when you change the yaml file manually, make sure that you follow proper yaml indentation and the format required by webMUSHRA (see complete.yaml, experimenter.md).

The ABX test asks for age and gender before ending. Submitting all the data triggers writing of the results into abx_software/webMUSHRA_c929877_20180814/results/abx_constant_phase_shift/paired_comparison.csv. Afterwards, the virtual webserver stops/crashes with a write segmentation fault error (at least on our machines).

So, in the dedicated terminal, pressing Ctrl+C and running php -S localhost:8000 again starts a new instance.

However, this issue is handy, since a restart of the test can only be triggered deliberately by the above steps and not by accident.

Analysis of ABX Test Results

ABX test results are stored in abx_software/webMUSHRA_c929877_20180814/results/abx_constant_phase_shift/paired_comparison.csv. Make sure that you back up this file frequently when running the test. We provide it within the git repository. Once the test is finished, all test results will be stored within this file.

Running the Jupyter notebook Phase_ABX_analysis.ipynb in abx_software/webMUSHRA_c929877_20180814/configs reads this csv file and performs sorting and an analysis of the data, which might require further customisation.

At the beginning of the notebook, some considerations regarding the test statistics are documented based on G*Power 3, cf. Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191
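As a generic illustration of ABX test statistics (not necessarily the analysis performed in Phase_ABX_analysis.ipynb), the probability of reaching a given number of correct answers by pure guessing follows a binomial distribution with p = 0.5. The helper `abx_p_value` is hypothetical.

```python
from math import comb


def abx_p_value(n_correct, n_trials):
    """One-sided probability of at least n_correct hits when guessing (p = 0.5)."""
    tail = sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1))
    return tail / 2 ** n_trials


# Example: all 7 repetitions of one audio content answered correctly.
print(abx_p_value(7, 7))  # 0.0078125
```

With only 6 or 7 repetitions per audio content, even a perfect score yields a p-value of 1/64 or 1/128, which is one reason why considerations on statistical power are relevant here.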