Logical bug in MaestroDataset
almostimplemented opened this issue · 1 comments
almostimplemented commented
In utils/data_generator.py, line 86, we ensure we grab a segment that is contained by the waveform:
# Load hdf5
with h5py.File(hdf5_path, 'r') as hf:
    start_sample = int(start_time * self.sample_rate)
    end_sample = start_sample + self.segment_samples
    if end_sample >= hf['waveform'].shape[0]:
        start_sample -= self.segment_samples
        end_sample -= self.segment_samples
However, you fail to update start_time, so when you later grab the target_dict, it will be off by self.segment_seconds.
# Process MIDI events to target
(target_dict, note_events, pedal_events) = \
    self.target_processor.process(start_time, midi_events_time,
        midi_events, extend_pedal=True, note_shift=note_shift)
I don't think this causes a problem in practice, because your Sampler logic only constructs meta for valid segments, so the branch above should never be taken:
while (start_time + self.segment_seconds < hf.attrs['duration']):
but it is still a logical error, so I thought I would report it and offer a fix.
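A minimal sketch of the fix, pulled out into a standalone function so it can be checked without an hdf5 file. The function name `clamp_segment` and its parameters are hypothetical, not part of the repository, and it assumes `segment_samples` equals `int(segment_seconds * sample_rate)`. The only change from the snippet above is that `start_time` is shifted together with the sample indices:

```python
def clamp_segment(start_time, sample_rate, segment_seconds, waveform_length):
    """Return (start_time, start_sample, end_sample) for one segment.

    Hypothetical helper mirroring the bounds check in the issue. If the
    segment would run past the end of the waveform, it is shifted back by
    one segment, and start_time is shifted by the same amount so the MIDI
    targets stay aligned with the audio.
    """
    # Assumption: this is how segment_samples is derived in the dataset class.
    segment_samples = int(segment_seconds * sample_rate)

    start_sample = int(start_time * sample_rate)
    end_sample = start_sample + segment_samples

    if end_sample >= waveform_length:
        start_sample -= segment_samples
        end_sample -= segment_samples
        # The missing update: keep start_time consistent with start_sample.
        start_time -= segment_seconds

    return start_time, start_sample, end_sample
```

In the dataset class, the equivalent change would be a single `start_time -= self.segment_seconds` inside the existing `if` block, before `start_time` is passed to `self.target_processor.process`.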