How can I solve "UnicodeDecodeError" ?

Question

How can I solve "UnicodeDecodeError" ?

Closed this issue 6 months ago · 3 comments

Hello.

I am trying to train ParallelWaveGAN on my own dataset, referring to "Run training using ESPnet2-TTS recipe within 5 minutes" on this page.
However, I am getting the following error.

$ ./run.sh --stage 1 --conf conf/parallel_wavegan.v1.yaml
Stage 1: Feature extraction
Feature extraction start. See the progress via dump/eval/raw/preprocessing.*.log.
Feature extraction start. See the progress via dump/train_nodev/raw/preprocessing.*.log.
Feature extraction start. See the progress via dump/dev/raw/preprocessing.*.log.
Successfully make subsets.
Successfully make subsets.
Successfully make subsets.
run.pl: 4 / 4 failed, log is in dump/eval/raw/preprocessing.*.log
run.pl: 4 / 4 failed, log is in dump/dev/raw/preprocessing.*.log
run.pl: 4 / 4 failed, log is in dump/train_nodev/raw/preprocessing.*.log
./run.sh: 3 background jobs are failed.

The contents of preprocessing.*.log are as follows (folder and file names are changed):

# parallel-wavegan-preprocess --config conf/parallel_wavegan.v1.yaml --scp dump/train_nodev/raw/wav.1.scp --dumpdir dump/train_nodev/raw/dump.1 --verbose 1
# Started at Tue 12 Dec 2023 03:52:48 PM JST
#

0%| | 0/1174 [00:00<?, ?it/s]/mypath/anaconda3-2021.05/envs/espnet/lib/python3.8/site-packages/kaldiio/utils.py:481: UserWarning: An error happens at loading "dump/raw/org/tr_no_dev/data/format.1/wavefile.flac"
warnings.warn('An error happens at loading "{}"'.format(ark_name))

0%| | 0/1174 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mypath/anaconda3-2021.05/envs/espnet/bin/parallel-wavegan-preprocess", line 33, in <module>
    sys.exit(load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-preprocess')())
  File "/mypath/ParallelWaveGAN/parallel_wavegan/bin/preprocess.py", line 349, in main
    for utt_id, (audio, fs) in tqdm(dataset):
  File "/mypath/anaconda3-2021.05/envs/espnet/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/mypath/ParallelWaveGAN/parallel_wavegan/datasets/scp_dataset.py", line 242, in __getitem__
    fs, audio = self.audio_loader[utt_id]
  File "/mypath/anaconda3-2021.05/envs/espnet/lib/python3.8/site-packages/kaldiio/utils.py", line 479, in __getitem__
    return self._loader(ark_name)
  File "/mypath/anaconda3-2021.05/envs/espnet/lib/python3.8/site-packages/kaldiio/matio.py", line 240, in load_mat
    return _load_mat(fd, offset, slices, endian=endian)
  File "/mypath/anaconda3-2021.05/envs/espnet/lib/python3.8/site-packages/kaldiio/matio.py", line 330, in _load_mat
    array = read_kaldi(fd, endian)
  File "/mypath/anaconda3-2021.05/envs/espnet/lib/python3.8/site-packages/kaldiio/matio.py", line 442, in read_kaldi
    array = read_ascii_mat(fd)
  File "/mypath/anaconda3-2021.05/envs/espnet/lib/python3.8/site-packages/kaldiio/matio.py", line 589, in read_ascii_mat
    char = fd.read(1).decode(encoding=default_encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte
# Accounting: time=25 threads=1
# Ended (code 1) at Tue 12 Dec 2023 03:53:13 PM JST, elapsed time 25 seconds

It seems to be the same error as #372.
How can I solve this problem?

Answer 1 · 2023-12-12T08:53:41.000Z

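The simplest solution is to convert the audio from FLAC to WAV and then use the converted files as the data for the recipe.
Alternatively, you can use a piped command in wav.scp, where each line is an utterance ID followed by a command ending in "|":
e.g., hogehoge ffmpeg -i hogehoge.flac -f wav - |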
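
For a dataset with many utterances, editing wav.scp by hand is tedious, so here is a rough sketch of how the piped entries could be generated. The traceback in the question shows kaldiio falling through to `read_ascii_mat` on a `.flac` file, which is why the entries below hand the decoding to ffmpeg instead. The function names `to_piped_entry` and `rewrite_wav_scp` and the file names in the usage note are hypothetical, not part of ParallelWaveGAN or kaldiio, and the sketch assumes every line has the plain `utt_id path` form and that ffmpeg is on the PATH.

```python
import shlex
from pathlib import Path


def to_piped_entry(line: str) -> str:
    """Turn 'utt_id path.flac' into 'utt_id ffmpeg -i path.flac -f wav - |'.

    Blank lines, entries that are already piped commands, and entries that
    do not point at a .flac file are returned unchanged (minus whitespace).
    """
    line = line.strip()
    if not line or line.endswith("|"):
        return line
    utt_id, path = line.split(maxsplit=1)
    if Path(path).suffix.lower() != ".flac":
        return line
    # shlex.quote protects paths containing spaces or shell metacharacters.
    return f"{utt_id} ffmpeg -i {shlex.quote(path)} -f wav - |"


def rewrite_wav_scp(src: str, dst: str) -> None:
    """Read the scp file at src and write a copy with piped entries to dst."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(to_piped_entry(line) + "\n")
```

Something like `rewrite_wav_scp("wav.scp.orig", "wav.scp")` (hypothetical file names) would then produce the piped file, after which the feature-extraction stage can be rerun. Converting the files to WAV once up front avoids re-decoding on every read; the piped form avoids keeping a second copy of the audio on disk.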

Answer 2 · 2023-12-12T10:58:08.000Z

Thank you for your answer.
I will try it and report back in a few days whether it worked.

Answer 3 · 2023-12-18T08:47:14.000Z

Sorry for the delay. I tried it and it worked. Thank you.