Adaptation error on AMI data: Invalid shape for monophonic audio: ndim=2, shape=(400000, 2)

Question

serendipity24 opened this issue 3 years ago · 1 comments

Hi,

While using the AMI data (*.Mix-Headset.wav) for adaptation, I get the following error:
Traceback (most recent call last):
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/chainer/iterators/multiprocess_iterator.py", line 435, in fetch_batch
    batch_ret[0] = [self.dataset[idx] for idx in indices]
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/chainer/iterators/multiprocess_iterator.py", line 435, in <listcomp>
    batch_ret[0] = [self.dataset[idx] for idx in indices]
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/chainer/dataset/dataset_mixin.py", line 67, in __getitem__
    return self.get_example(index)
  File "/workspace/EEND/eend/chainer_backend/diarization_dataset.py", line 86, in get_example
    self.n_speakers)
  File "/workspace/EEND/eend/feature.py", line 249, in get_labeledSTFT
    Y = stft(data, frame_size, frame_shift)
  File "/workspace/EEND/eend/feature.py", line 156, in stft
    hop_length=frame_shift).T[:-1]
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/librosa/core/spectrum.py", line 217, in stft
    util.valid_audio(y)
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/librosa/util/utils.py", line 295, in valid_audio
    "ndim={:d}, shape={}".format(y.ndim, y.shape)
librosa.util.exceptions.ParameterError: Invalid shape for monophonic audio: ndim=2, shape=(400000, 2)

As I understand it, this error results from supplying stereo files where mono files are expected. However, soxi, Audacity, and Python's wave module all report the files as mono. I also checked the shape of a few files independently with librosa, and ndim is never 2. For example:
`

import librosa

filename = "IS1003a.Mix-Headset.wav"
audioData, sampleRate = librosa.load(filename)
print(audioData.shape)
# prints (20142528,)
`

Setting mono=False in valid_audio of utils.py does not help.
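
One caveat about the librosa check above: `librosa.load` defaults to `mono=True`, which averages the channels while loading, so it returns a 1-D array even for a stereo file and cannot reveal the problem. Spot-checking a few files can also miss the offending ones. The sketch below reads the channel count straight from each WAV header with the standard-library `wave` module (which handles PCM WAV only) and checks every file it is given. `find_non_mono` is a hypothetical helper written for this illustration, not part of EEND or librosa.

```python
import wave


def find_non_mono(wav_paths):
    """Return (path, n_channels) for each WAV whose header reports more than one channel."""
    offenders = []
    for path in wav_paths:
        # Only the header is inspected; no audio samples are decoded.
        with wave.open(str(path), "rb") as wav:
            n_channels = wav.getnchannels()
        if n_channels != 1:
            offenders.append((str(path), n_channels))
    return offenders
```

Passing it every path used for adaptation (for example, all paths listed in the data directory's wav list, assuming that is where the recordings are enumerated) should show whether any file other than the Mix-Headset recordings is being picked up.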

Could mixing multiple headset files into a single file in AMI be causing the issue? What other reasons are possible, and is there a workaround? Apologies if this is not an EEND-specific issue. Any input would be helpful.

Thank You,

Answer 1 · 2021-09-07T06:52:02.000Z

The AMI dataset did contain individual headset files that were stereo, so the issue was indeed not EEND-specific. Closing the issue now.
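
For anyone who hits the same error and still wants to use the stereo files, one possible way out is to downmix them before feature extraction. The shape in the traceback, (400000, 2), is a (frames, channels) layout, so averaging over axis 1 gives a mono signal. This is a minimal NumPy sketch under that assumption; `to_mono` is a hypothetical helper, not part of EEND, and where to call it in the loading code is left open.

```python
import numpy as np


def to_mono(data):
    """Average the channels of a (frames, channels) array; return 1-D audio unchanged."""
    data = np.asarray(data)
    if data.ndim == 1:
        return data
    if data.ndim == 2:
        # Axis 1 is the channel axis in a (frames, channels) layout.
        return data.mean(axis=1)
    raise ValueError("expected 1-D or 2-D audio, got ndim={}".format(data.ndim))


# A chunk with the same shape as the one reported in the traceback.
chunk = np.zeros((400000, 2), dtype=np.float32)
print(to_mono(chunk).shape)  # (400000,)
```

Converting the files to mono once on disk with an audio tool avoids repeating the downmix on every load, at the cost of keeping a second copy of the data.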