andrewowens/multisensory

Question about the original audio waveform input

luhuijun666 opened this issue · 0 comments

Hi owen,
Thanks for your contributions!
In your paper,you said you applied a series of strided 1D convolutions to the input waveform.
So the input waveform you refered here (before fusion) is the original audio signal waveform without STFT,right?
Why and how you process the 1D signal ? Could you kindly explain this point for me?