TimeViewers/signalworks

Wave.read ? what if two channels?

Closed this issue · 17 comments

Earlier @lxkain @tuanad121 and myself came to the agreement that multi-channel audio files should be MultiTracks with each channel of audio being its own wave track.

As I am revisiting some code, this is becoming a bit problematic, as it is now not clear what Wave.read() should do if you give it an audio file with more than 1 channel audio. Presently the following behavior is what occurs

Audio File Channels Channel Parameter Entered Result
1 None Wave Object
2 None MultiChannelError
1 0 Wave Object
2 0 Wave Object
2 1 Wave Object

Where the wave object returns always has one track.

I'm now wondering if this approach really makes sense, and if instead we should discontinue the use of Wave.read altogether.

We also have a tracking.load_audio method which effectively always returns a multi-track (even if the audio is a single track).

Either of you have thoughts?

I propose that we have Wave.read() return a list of Wave tracks, one track per channel. We forget about tracking.load_audio() altogether, and put in the documentation that we encourage people that want to interact with multiple tracks to create a MultiTrack, which can take a list of tracks embed within in its __init__ method.

Thoughts?

Let's talk about values.

Of all the non-collection (i.e. not Multitrack, which is a collection) / fundamental tracks (Event, Label, Partition, TimeValue, Value, and Wave), the following can have multi-dimensional features / vectors:

Label
Partition
TimeValue
Wave

At the moment, Label and Partition values are typically strings, but could be numeric, although we don't perform numeric operations with them (yet?). TimeValue can interpolate between its values.

It would make sense to support both scalars and vectors for all 4 of the above. If we do, then the role of MultiTrack becomes primarily that of combing different types of tracks, although several Wave tracks, each of which are vector-valued, would also be permitted, perhaps for organizational reasons.

We could support Wave.from_Multitrack() and we can also support Wave.from_list(waves: List[Wave]).

This seems like the cleanest approach to me. I am appreciating that TimeView will need to be updated to handle this.

As an aside, I am not sure why there is a tracking.load_audio() function. There is plenty of non-audio that is stored in .wav files and the like. Why not just load()?

One add'l thought I have is that when loading a multi-channel waveform, they could all be presented in the same panel (as opposed to one wave / panel).

From TimeView's perspective, having the two channels (and potentially the average of the two channels?) be shown in the same panel sounds fine, but

If you think Wave should support multi-dimensional values parameter, we should get to work on that, and expand the test suite to ensure all the operations that Wave supports work on multidimensional values ....

One thing we do really need to is break up the de-serialization process (reading from a file) and constructing the track object of whatever kind. Breaking up that process will allow us to use track objects when we have data sources that signalworks does not expect. For example, if we have some really strange audio file the application using signalworks can incorporate the really strange dependency.

Not sure if it makes sense to show the average - if that's desired we can have a processor that does this.

I don't believe it will be hard to support multi-channel waves.

What do you think about putting all the io in a separate module (e.g. io.py) and then importing libraries as needed, plus have one or several auto-detect functions as well?

I'm good with having see on separate io module, I think that's a great idea

was thinking more about this, how should dsp.spectrogram handle the case of a Wave object containing more than 1 channel, should dsp.spectrogram take an optional parameter indicating which "channel" to use (with averaging all the channels if no option is specified)?

My preference would be to default to the first channel. I understand that for stereo recording averaging makes some amount of sense (although you could also have severe cancellations), but in the general case (like multiple leads from a EEG) averaging would not be meaningful. For TimeView, this doesn't seem to have any implications, since all waveforms would already shown in separate views (but in one pane, at least originally). As I mentioned previously, we could have an "Average" Processor if folks want to see the average, and then take the spectrogram of that.

On the topic of spectrograms, a pre-emphasis filter being applied during the spectrogram calculation. In my use case, having a new Wave track with the pre-emphasis filter applied doesn't make much sense (as I only want this filtering to occur for my spectrogram calculation).

Pre-emphasis should be an option for spectrograms, and it's a good one for speech. Processors should save user's settings as global defaults. So, if it's not desired, one would only have to turn it off once. The original waveform should not be affected in any case.

Follow up issue, is wave.values can be 2D for dual channel audio, we should make that array 2D for single channel, this will ensure we can avoid having if-statements for the number of dims before doing some operations, and will make it easier to convert mono to stereo using np.tile (if needed).

Absolutely agree, a singled channel should be a 1xN matrix, as opposed to a vector.

@lxkain it appears both scipy and soundfile actually read-multichannel arrays as N x #Channels, perhaps we should structure the numpy array is rows = samples, cols = channels?

I know sounddevice (a library I've been using for playback) expects audio of that array as well.

(both the libraries import 1 channel data as vectors).

Although not my personal preference, if the arrays are row-major order (e.g. which they probably will be since C is being used underneath) then this leads to less jumping around in memory, which is a good thing.

Sounds good, I'll implement this method...but this has consequences in many many places (framing, spectrogram-ing, etc).

With signalworks 0.3.2 I feel comfortable closing this issue. Now, wave.value is a 2D array no matter what (samples x channels). in the spectrogram method, the data is flattened into a vector... there are likely bugs elsewhere but we'll deal with those as they come up.