`Wave.read`: what if two channels?
Earlier, @lxkain, @tuanad121, and I agreed that multi-channel audio files should be MultiTracks, with each channel of audio being its own Wave track.
As I am revisiting some code, this is becoming a bit problematic: it is no longer clear what `Wave.read()` should do if you give it an audio file with more than one channel. Presently, the behavior is the following:
Audio File Channels | Channel Parameter Entered | Result
---|---|---
1 | None | `Wave` object
2 | None | `MultiChannelError`
1 | 0 | `Wave` object
2 | 0 | `Wave` object
2 | 1 | `Wave` object
The returned `Wave` object always has one track.
I'm now wondering if this approach really makes sense, and if instead we should discontinue the use of `Wave.read` altogether.
We also have a `tracking.load_audio` method which effectively always returns a `MultiTrack` (even if the audio is a single track).
Do either of you have thoughts?
I propose that we have `Wave.read()` return a list of `Wave` tracks, one track per channel. We forget about `tracking.load_audio()` altogether, and put in the documentation that we encourage people who want to interact with multiple tracks to create a `MultiTrack`, which can take a list of tracks in its `__init__` method.
Thoughts?
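To make that concrete, here is a minimal sketch of how call sites might look under this proposal (the import path is an assumption; the `MultiTrack` constructor taking a list of tracks is as described above):

```python
from signalworks.tracking import MultiTrack, Wave  # hypothetical import path

# proposed behavior: one Wave track per channel, regardless of channel count
waves = Wave.read("stereo.wav")  # -> [Wave, Wave] for a 2-channel file

# users who want a single combined object opt in explicitly
multi = MultiTrack(waves)  # per the proposal, __init__ accepts a list of tracks
```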
Let's talk about values.
Of all the non-collection (i.e. not MultiTrack, which is a collection) / fundamental tracks (Event, Label, Partition, TimeValue, Value, and Wave), the following can have multi-dimensional features / vectors:
Label
Partition
TimeValue
Wave
At the moment, Label and Partition values are typically strings, but could be numeric, although we don't perform numeric operations with them (yet?). TimeValue can interpolate between its values.
It would make sense to support both scalars and vectors for all 4 of the above. If we do, then the role of MultiTrack becomes primarily that of combining different types of tracks, although several Wave tracks, each of which is vector-valued, would also be permitted, perhaps for organizational reasons.
We could support `Wave.from_Multitrack()`, and we could also support `Wave.from_list(waves: List[Wave])`.
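A rough sketch of what `from_list` could do under the vector-valued design (the attribute names `value` and `fs` and the samples x channels layout are assumptions for illustration):

```python
import numpy as np

def wave_from_list(waves, wave_cls):
    """Sketch of Wave.from_list: combine same-rate Waves into one
    vector-valued Wave, one column per input track."""
    fs = waves[0].fs
    assert all(w.fs == fs for w in waves), "sampling rates must match"
    # result is samples x channels; ravel() tolerates 1-D or Nx1 inputs
    values = np.column_stack([np.ravel(w.value) for w in waves])
    return wave_cls(values, fs)
```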
This seems like the cleanest approach to me. I am appreciating that TimeView will need to be updated to handle this.
As an aside, I am not sure why there is a `tracking.load_audio()` function. There is plenty of non-audio that is stored in .wav files and the like. Why not just `load()`?
One additional thought I have is that when loading a multi-channel waveform, the channels could all be presented in the same panel (as opposed to one wave per panel).
From TimeView's perspective, having the two channels (and potentially the average of the two channels?) shown in the same panel sounds fine. But if you think `Wave` should support a multi-dimensional `values` parameter, we should get to work on that, and expand the test suite to ensure all the operations that `Wave` supports work on multi-dimensional values.
One thing we really do need to do is break up the de-serialization process (reading from a file) from constructing the track object of whatever kind. Breaking up that process will allow us to use track objects when we have data sources that signalworks does not expect. For example, if we have some really strange audio file, the application using signalworks can incorporate the really strange dependency itself.
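As a sketch of that decoupling (the reader, the file name, and the `Wave(values, fs)` constructor signature are all hypothetical): the application parses the exotic format itself and hands signalworks a plain array:

```python
import numpy as np

def read_strange_format(path):
    # application-side reader; it can pull in whatever odd dependency
    # it needs, and signalworks never has to know about the format
    samples = np.fromfile(path, dtype=np.int16).astype(np.float64)
    return samples, 44100  # data plus sampling rate

samples, fs = read_strange_format("weird.bin")  # step 1: de-serialize
wave = Wave(samples, fs)  # step 2: construct the track (signature assumed)
```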
Not sure if it makes sense to show the average; if that's desired, we can have a processor that does this.
I don't believe it will be hard to support multi-channel waves.
What do you think about putting all the I/O in a separate module (e.g. `io.py`) and then importing libraries as needed, plus having one or several auto-detect functions as well?
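For illustration, a minimal sketch of such a module (the function names, backend table, and lazy imports are assumptions; `soundfile.read` and its `always_2d` flag are the real library API):

```python
# io.py (sketch): one loader per backend, plus extension-based auto-detection
from pathlib import Path

def load_soundfile(path):
    import soundfile as sf  # imported only when this backend is needed
    return sf.read(path, always_2d=True)  # (samples x channels array, rate)

_READERS = {".wav": load_soundfile, ".flac": load_soundfile}

def load(path):
    reader = _READERS.get(Path(path).suffix.lower())
    if reader is None:
        raise ValueError(f"no reader registered for {path}")
    return reader(path)
```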
I'm good with having a separate io module; I think that's a great idea.
I was thinking more about this: how should `dsp.spectrogram` handle the case of a `Wave` object containing more than one channel? Should `dsp.spectrogram` take an optional parameter indicating which "channel" to use (averaging all the channels if no option is specified)?
My preference would be to default to the first channel. I understand that for stereo recordings averaging makes some amount of sense (although you could also have severe cancellations), but in the general case (like multiple leads from an EEG) averaging would not be meaningful. For TimeView, this doesn't seem to have any implications, since all waveforms would already be shown in separate views (but in one pane, at least originally). As I mentioned previously, we could have an "Average" Processor if folks want to see the average, and then take the spectrogram of that.
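A sketch of that default in code (the parameter names and framing details are assumptions, not the actual `dsp.spectrogram` signature):

```python
import numpy as np

def spectrogram(data, channel=0, frame_size=512, hop=256):
    x = np.asarray(data)
    if x.ndim == 2:        # samples x channels
        x = x[:, channel]  # default to one channel; no implicit averaging
    window = np.hanning(frame_size)
    n_frames = 1 + (len(x) - frame_size) // hop
    frames = np.stack([x[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # magnitude, frames x bins
```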
On the topic of spectrograms, a pre-emphasis filter is being applied during the spectrogram calculation. In my use case, having a new Wave track with the pre-emphasis filter applied doesn't make much sense (as I only want this filtering to occur for my spectrogram calculation).
Pre-emphasis should be an option for spectrograms, and it's a good one for speech. Processors should save the user's settings as global defaults, so if it's not desired, one would only have to turn it off once. The original waveform should not be affected in any case.
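For reference, the standard first-order pre-emphasis is y[n] = x[n] - a * x[n-1]; applied to a temporary copy inside the spectrogram computation, it leaves the original waveform untouched (the coefficient value is a typical choice, not one mandated by signalworks):

```python
import numpy as np

def pre_emphasis(x, coeff=0.97):
    # first-order high-pass: y[n] = x[n] - coeff * x[n-1]
    # returns a new array, so the source wave is unchanged
    x = np.asarray(x, dtype=np.float64)
    return np.append(x[0], x[1:] - coeff * x[:-1])
```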
Follow-up issue: if `wave.values` can be 2D for dual-channel audio, we should make that array 2D for single-channel audio as well. This will ensure we can avoid if-statements on the number of dims before doing some operations, and it will make it easier to convert mono to stereo using `np.tile` (if needed).
Absolutely agree; a single channel should be a 1xN matrix, as opposed to a vector.
@lxkain it appears both `scipy` and `soundfile` actually read multi-channel arrays as N x #channels; perhaps we should structure the numpy array as rows = samples, cols = channels? I know `sounddevice` (a library I've been using for playback) expects audio arrays of that shape as well. (Both libraries import single-channel data as vectors.)
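For reference, a small demonstration of the shapes in question (`soundfile.read` and its `always_2d` flag are the real API; the file names are placeholders):

```python
import numpy as np
import soundfile as sf

stereo, fs = sf.read("stereo.wav")  # (samples, channels)
mono, fs = sf.read("mono.wav")      # 1-D vector: (samples,)
mono2d, fs = sf.read("mono.wav", always_2d=True)  # forced to (samples, 1)

# once everything is samples x channels, mono -> stereo is one np.tile call
stereo_from_mono = np.tile(mono.reshape(-1, 1), (1, 2))  # (samples, 2)
```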
Although not my personal preference, if the arrays are row-major order (which they probably will be, since C is used underneath), then this leads to less jumping around in memory, which is a good thing.
Sounds good, I'll implement this method... but this has consequences in many, many places (framing, spectrogram-ing, etc.).
With signalworks 0.3.2 I feel comfortable closing this issue. Now, `wave.value` is a 2D array no matter what (samples x channels). In the spectrogram method, the data is flattened into a vector... there are likely bugs elsewhere, but we'll deal with those as they come up.
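A quick illustration of the resulting invariant (the flattening is shown with `ravel`; the exact internals of the spectrogram method are an assumption):

```python
import numpy as np

value = np.zeros((16000, 1))  # even mono is stored as samples x channels
assert value.ndim == 2
vector = value[:, 0].ravel()  # spectrogram-style flattening back to 1-D
assert vector.shape == (16000,)
```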