hitachi-speech/EEND

Adaptation error on AMI data: Invalid shape for monophonic audio: ndim=2, shape=(400000, 2)

serendipity24 opened this issue

Hi,

While using the AMI data (`*.Mix-Headset.wav`) for adaptation, I get the following error:
```
Traceback (most recent call last):
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/chainer/iterators/multiprocess_iterator.py", line 435, in fetch_batch
    batch_ret[0] = [self.dataset[idx] for idx in indices]
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/chainer/iterators/multiprocess_iterator.py", line 435, in <listcomp>
    batch_ret[0] = [self.dataset[idx] for idx in indices]
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/chainer/dataset/dataset_mixin.py", line 67, in __getitem__
    return self.get_example(index)
  File "/workspace/EEND/eend/chainer_backend/diarization_dataset.py", line 86, in get_example
    self.n_speakers)
  File "/workspace/EEND/eend/feature.py", line 249, in get_labeledSTFT
    Y = stft(data, frame_size, frame_shift)
  File "/workspace/EEND/eend/feature.py", line 156, in stft
    hop_length=frame_shift).T[:-1]
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/librosa/core/spectrum.py", line 217, in stft
    util.valid_audio(y)
  File "/workspace/EEND/tools/miniconda3/envs/eend/lib/python3.7/site-packages/librosa/util/utils.py", line 295, in valid_audio
    "ndim={:d}, shape={}".format(y.ndim, y.shape)
librosa.util.exceptions.ParameterError: Invalid shape for monophonic audio: ndim=2, shape=(400000, 2)
```
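The exception comes from librosa's check that the signal passed to the STFT is monophonic, i.e. a 1-D array of samples. Readers such as the `soundfile` package return multichannel audio as a 2-D `(frames, channels)` array, which matches the `(400000, 2)` shape in the traceback. The sketch below reproduces only that shape condition with NumPy; `check_mono` is a hypothetical stand-in, not librosa's actual `valid_audio` code.

```python
import numpy as np


def check_mono(y: np.ndarray) -> None:
    """Hypothetical stand-in for the mono check that fails in the traceback.

    Raises ValueError unless y is a 1-D array of samples.
    """
    if y.ndim != 1:
        raise ValueError(
            "Invalid shape for monophonic audio: "
            "ndim={:d}, shape={}".format(y.ndim, y.shape)
        )


mono = np.zeros(400000, dtype=np.float32)         # shape (frames,)
stereo = np.zeros((400000, 2), dtype=np.float32)  # shape (frames, channels)

check_mono(mono)  # passes silently
try:
    check_mono(stereo)
except ValueError as err:
    print(err)  # Invalid shape for monophonic audio: ndim=2, shape=(400000, 2)
```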

My understanding is that this error results from supplying stereo files where mono files are expected. However, `soxi`, Audacity, and Python's `wave` module all report the channel info as mono. I also checked the shape independently with librosa for a few files, and `ndim` is never 2. For example:
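```python
import librosa

filename = "IS1003a.Mix-Headset.wav"
# librosa.load defaults to mono=True and sr=22050, so the result is downmixed and resampled
audioData, sampleRate = librosa.load(filename)
print(audioData.shape)
# (20142528,)
```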
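One caveat with that check: `librosa.load` downmixes to mono (`mono=True`) and resamples to 22050 Hz (`sr=22050`) by default, so the array it returns is 1-D whether or not the file on disk is stereo. Reading the channel count straight from the WAV header avoids that, and it is cheap enough to run over every file the adaptation data references rather than a sample of them. A minimal sketch using the standard-library `wave` module, assuming plain PCM WAV files; `find_multichannel_wavs` and the path in the usage comment are hypothetical.

```python
import wave
from pathlib import Path
from typing import List, Tuple


def find_multichannel_wavs(root: str) -> List[Tuple[str, int]]:
    """Return (path, n_channels) for every *.wav under root with more than one channel.

    Only the WAV header is inspected, so nothing is downmixed or resampled.
    Assumes plain PCM WAV files that the standard-library wave module can open.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            n_channels = wf.getnchannels()
        if n_channels > 1:
            hits.append((str(path), n_channels))
    return hits


# Hypothetical usage; adjust the path to your local AMI copy:
# for path, n in find_multichannel_wavs("/path/to/amicorpus"):
#     print(f"{path}: {n} channels")
```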

Setting `mono=False` in `valid_audio` (librosa's `util/utils.py`) does not help.

Could the mixing of multiple headset recordings into a single file in AMI be causing the issue? What other causes are possible, and is there a workaround? Apologies if this is not an EEND-specific issue. Any input would be helpful.

Thank you,

The AMI dataset did contain individual headset files that were stereo. So the problem was indeed not EEND-specific. Closing the issue now.
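Given that resolution, one possible workaround is to downmix the stereo files to mono before adaptation and point the adaptation data directory (for example its Kaldi-style `wav.scp`) at the mono copies. A minimal sketch, assuming 16-bit PCM WAV input and using the standard-library `wave` module plus NumPy; `downmix_wav_to_mono` is a hypothetical helper, not part of EEND.

```python
import wave

import numpy as np


def downmix_wav_to_mono(src: str, dst: str) -> None:
    """Write a mono copy of a 16-bit PCM WAV file by averaging its channels."""
    with wave.open(src, "rb") as wf:
        n_channels = wf.getnchannels()
        sampwidth = wf.getsampwidth()
        framerate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    if sampwidth != 2:
        raise ValueError(f"{src}: expected 16-bit PCM, got {8 * sampwidth}-bit samples")

    # WAV stores interleaved little-endian samples; reshape to (frames, channels).
    frames = np.frombuffer(raw, dtype="<i2").reshape(-1, n_channels)
    # Average across channels; the mean of int16 values always fits back into int16.
    mono = frames.mean(axis=1).round().astype("<i2")

    with wave.open(dst, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(framerate)
        out.writeframes(mono.tobytes())
```

Averaging is only one choice: if one channel of a headset file turns out to be near-silent or much noisier, keeping a single channel may work better, so it is worth listening to a few of the affected files first.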