mpc001/auto_avsr

What result should be obtained under normal circumstances after preprocessing?

Xuzhejia opened this issue · 6 comments

What result should be obtained under normal circumstances after preprocessing?
Currently I only have a 0KB file called lrs3_train_transcript_lengths_seg24s.csv. Is this correct?

After preprocessing, you should see the training files and their tokens listed in "lrs3_train_transcript_lengths_seg24s.csv". The file should not be 0 KB.
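A quick way to check the result is to confirm that the file is non-empty and count its rows. The sketch below assumes only that the label file is a plain-text CSV with one row per training sample; `summarize_label_file` and `looks_preprocessed` are hypothetical helpers, not part of auto_avsr.

```python
import csv
import os


def summarize_label_file(csv_path):
    """Return (size_in_bytes, row_count) for a CSV label file."""
    size_in_bytes = os.path.getsize(csv_path)
    with open(csv_path, newline="") as f:
        # Count only non-empty rows.
        row_count = sum(1 for row in csv.reader(f) if row)
    return size_in_bytes, row_count


def looks_preprocessed(csv_path):
    """True if the file exists, is not 0 bytes, and has at least one row."""
    if not os.path.isfile(csv_path):
        return False
    size_in_bytes, row_count = summarize_label_file(csv_path)
    return size_in_bytes > 0 and row_count > 0
```

If `looks_preprocessed("lrs3_train_transcript_lengths_seg24s.csv")` returns False, preprocessing most likely skipped every sample, and the cause is worth looking for in the preprocessing output.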

I'm not sure whether I entered the wrong command.
Could you share your command for reference?
Thank you.

My problem was in the data_module.py file. It tried to read an mp4 file and got an error at this line:
    waveform, sample_rate = torchaudio.load(data_filename, normalize=True)
This caused the subsequent steps to be skipped. Is this normal, and how can it be resolved?
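To find out why the load fails, it can help to log the exception rather than let the sample be skipped silently. This is a minimal sketch: `load_or_report` is a hypothetical helper, and `load_fn` stands for whatever loader is in use, for example a function wrapping `torchaudio.load`. A likely cause, though not confirmed here, is that the installed torchaudio backend cannot decode mp4 containers.

```python
import logging

logger = logging.getLogger(__name__)


def load_or_report(load_fn, data_filename):
    """Call load_fn(data_filename); on failure, log the traceback and return None."""
    try:
        return load_fn(data_filename)
    except Exception:
        # logger.exception records the message together with the full traceback.
        logger.exception("Failed to load audio from %s", data_filename)
        return None
```

The logged traceback shows the actual decoding error, which makes it easier to tell a missing backend from a corrupt file.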

I also found this problem and I'm still looking for a solution.

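    import os
    import torchaudio
    from moviepy.editor import VideoFileClip  # moviepy 1.x; in 2.x use "from moviepy import VideoFileClip"

    def load_audio(self, data_filename):
        # Write the mp4's audio track to an .mp3 beside it, then load that file.
        audio_filename = os.path.splitext(data_filename)[0] + ".mp3"
        with VideoFileClip(data_filename) as video:
            video.audio.write_audiofile(audio_filename, logger=None)
        return torchaudio.load(audio_filename, normalize=True)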

This is my solution.

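import io
import ffmpeg  # provided by the ffmpeg-python package
import torchaudio

def load_audio(self, data_filename):
    # Decode the mp4's audio track to 16 kHz, 16-bit PCM WAV and pipe it to stdout.
    process = (
        ffmpeg.input(data_filename)
        .output('-', format='wav', acodec='pcm_s16le', ar='16000')
        .run_async(pipe_stdout=True, quiet=True)
    )
    stdout, _ = process.communicate()

    # Load the in-memory WAV bytes without writing a temporary file.
    waveform, sample_rate = torchaudio.load(io.BytesIO(stdout), normalize=True)
    return waveform, sample_rate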

This also works.