Aligning offsets with bars
xstasi opened this issue · 2 comments
My question here only applies to songs where both the tempo and the time signature are known, but that should be most of the songs out there.
Imagining a song in 4/4 with 120 qpm that changes chord every 4 bars, you would have a change every 8 seconds (quarter is 60s/120=0.5s, bar is 0.5s * 4 = 2s, chord length is 2s * 4 = 8s). So the ideal output would be for example:
0.0 8.0 E:min
8.0 16.0 A:maj
16.0 24.0 E:min
[.....]
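The arithmetic above can be sketched in a few lines of Python. This is only an illustration: `bar_duration` and `ideal_chord_grid` are hypothetical helper names, not part of the project, and the beat is assumed to be a quarter note.

```python
def bar_duration(qpm, beats_per_bar):
    """Seconds per bar, assuming the beat is a quarter note."""
    return 60.0 / qpm * beats_per_bar


def ideal_chord_grid(qpm, beats_per_bar, bars_per_chord, chords):
    """Return (start, end, chord) tuples for chords that each last a fixed number of bars."""
    step = bar_duration(qpm, beats_per_bar) * bars_per_chord
    return [(i * step, (i + 1) * step, chord) for i, chord in enumerate(chords)]


# 4/4 at 120 qpm, one chord every 4 bars
for start, end, chord in ideal_chord_grid(120, 4, 4, ["E:min", "A:maj", "E:min"]):
    print(f"{start:.1f} {end:.1f} {chord}")
```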
In reality the time offsets in the predictions are a bit wonky, probably because in real audio there is no exact instant at which a chord starts. I have also tested this on a .wav render of a MIDI file.
If the tempo is low and bars are long then durations can be sort of quantised "with a wrench hit" by approximating to the closest bar, but when the tempo is high enough (100+?) the timing error becomes too big, making it impossible to pin exactly when in the score the chord is changed.
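For reference, the "wrench hit" quantisation could look roughly like the sketch below. `snap_to_bars` is a hypothetical helper, the sample offsets are made up for illustration, and the song is assumed to start exactly at 0.0 with a constant tempo.

```python
def snap_to_bars(segments, bar_seconds):
    """Round every (start, end, label) boundary to the nearest bar line."""
    snapped = []
    for start, end, label in segments:
        s = round(start / bar_seconds) * bar_seconds
        e = round(end / bar_seconds) * bar_seconds
        if e > s:  # drop segments that collapse to zero length after rounding
            snapped.append((s, e, label))
    return snapped


# Made-up predictions with slightly off boundaries; bars are 2.0 s long
predicted = [(0.0, 7.8, "E:min"), (7.8, 16.3, "A:maj"), (16.3, 24.1, "E:min")]
print(snap_to_bars(predicted, 2.0))
```

The closer a predicted boundary falls to the midpoint between two bar lines, the less trustworthy the rounding is, and the same timing error covers a larger fraction of a bar as bars get shorter.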
I don't know much about how your NN works, but perhaps this is because the wave is analysed "continuously"? Could it be made to analyse segments that are aligned with bars instead? In the previous song, for example, could the prediction function be made to guess what chord is there from 0.0 to 2.0, then from 2.0 to 4.0, etc.?
Thanks!
perhaps this is because the wave is analysed "continuously"?
Yes, that's pretty much the idea. The NN is not "bar-aware": it simply chunks the audio into ~50 ms frames, which are grouped into non-overlapping segments of ~6 seconds. Each 6-second segment is one input to the NN, which outputs a chord label for every 50 ms frame in it. Contiguous identical chord labels are then merged into a single label for a longer segment, e.g. 20 contiguous Am labels are combined into a single Am label for a 1000 ms segment. You could say that we let these chord labels tell us where the bars are, instead of the other way around.
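The merging step described above can be illustrated with a small sketch. This is not the project's actual code: `merge_frames` is a hypothetical name, and the 50 ms frame length is taken as an exact value here although the real one is only approximately that.

```python
from itertools import groupby


def merge_frames(frame_labels, frame_ms=50):
    """Collapse runs of identical per-frame labels into (start_ms, end_ms, label) segments."""
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((index * frame_ms, (index + length) * frame_ms, label))
        index += length
    return segments


# 20 contiguous Am frames become one 1000 ms Am segment
print(merge_frames(["Am"] * 20 + ["C"] * 10))
```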
could it be made to analyse segments that are aligned with bars instead
This sounds like a nice improvement! One way I can think of is to use a bar estimation algorithm (I don't have much experience with this, so I can't recommend one, but popular libraries such as Essentia may have something of the sort), run the NN on each bar, then pick out the most prominent chord from the NN output as the single chord label for that bar.
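A rough sketch of the "most prominent chord per bar" step is below. It is a simplification of the idea above: instead of running the NN separately on each bar, it takes the existing per-frame labels and does a majority vote inside each bar. `label_bars` is a hypothetical helper, `bar_times` is assumed to come from some external bar tracker (or from known tempo and time signature), and frames are assumed to be exactly 50 ms long.

```python
from collections import Counter


def label_bars(frame_labels, bar_times, frame_seconds=0.05):
    """Pick the most frequent frame label inside each bar.

    bar_times is a sorted list of bar-line positions in seconds.
    """
    result = []
    for start, end in zip(bar_times, bar_times[1:]):
        first = round(start / frame_seconds)
        last = round(end / frame_seconds)
        votes = Counter(frame_labels[first:last])
        if votes:  # skip bars that lie beyond the available frames
            result.append((start, end, votes.most_common(1)[0][0]))
    return result


# The chord change is detected 2 frames (100 ms) early, but each bar still gets one label
frames = ["E:min"] * 38 + ["A:maj"] * 42
print(label_bars(frames, [0.0, 2.0, 4.0]))
```

With a vote like this, a boundary that is off by a few frames no longer matters, as long as the correct chord still covers most of the bar.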