How to apply deformations from annotated jams file?
Arkanayan opened this issue · 7 comments
I am trying to use the JAMS files from https://github.com/justinsalamon/UrbanSound8K-JAMS to deform sound files.
I load the JAMS file together with its corresponding sound file and then apply BaseTransformer, like this:
import muda

jam_in = muda.load_jam_audio('orig.jams', 'orig.ogg')
deformer = muda.base.BaseTransformer()
for jam_out in deformer.transform(jam_in):
    process(jam_out)
But when I do this, I get a NotImplementedError.
If I have to create the deformations by instantiating the muda.deformers.* classes myself, then what is the point of loading annotated jams files? Please help me understand the process.
The problem here is that you're using BaseTransformer, which is only an abstract base class and does not implement any specific transformation. Try replacing it with a concrete deformer such as muda.deformers.PitchShift, or any other deformer listed in the docs.
Otherwise, your example looks fine.
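To illustrate why the abstract base class raises NotImplementedError, here is a toy sketch of the pattern. The classes ToyBaseTransformer and ToyGain are hypothetical and are not muda's code; they only mimic a base class that leaves the actual work to its subclasses.

```python
class ToyBaseTransformer:
    """Hypothetical stand-in for an abstract transformer (not muda's code)."""

    def states(self, item):
        # Subclasses must decide which deformation states to generate.
        raise NotImplementedError

    def transform(self, item):
        # Generator: nothing runs until the caller starts iterating.
        for state in self.states(item):
            yield self.apply(item, state)


class ToyGain(ToyBaseTransformer):
    """Hypothetical concrete transformer: scales a list of samples."""

    def __init__(self, gains):
        self.gains = gains

    def states(self, item):
        # One state per requested gain value.
        for gain in self.gains:
            yield {'gain': gain}

    def apply(self, item, state):
        return [x * state['gain'] for x in item]


samples = [0.5, -0.25]

try:
    list(ToyBaseTransformer().transform(samples))
except NotImplementedError:
    print('the base class cannot transform anything on its own')

for out in ToyGain([2, 4]).transform(samples):
    print(out)
```

With muda itself, the equivalent fix is to build a concrete deformer, for example muda.deformers.PitchShift(n_semitones=1), and iterate over its transform method instead.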
If I have to instantiate the deformer classes manually, then what is the point of reading the jams files, which already have a history of deformations written in them?
Every jams file has a hard-coded file name written in it.
Is there any way to deserialize the transformers from it, or otherwise automate the process?
Sorry, I think I didn't understand what exactly you want to do. Is the problem that you only have deformed jams files, and want to recreate the corresponding audio?
what is the point of reading the jams files which already has a history of deformations written in it?
Generally speaking, the deformation history is stored for logging purposes, not direct reconstruction.
Is there any way to deserialize transformers from it? any way to automate the process?
There is an open issue for fully automated reconstruction (#31), but nobody's had the time to sit down and implement it. It's not a difficult thing to implement, but it does require some pretty detailed knowledge of how muda objects work.
Here's a quick sketch of how that might work (not tested, use at your own risk)
EDIT: updated and tested locally, seems to do the trick:
import muda

# Assume your unmodified audio/jams has been pre-loaded as `jam_in`
# And the deformed jams you want to reconstruct is loaded as `jam_out`
jam_re = jam_in
for step in jam_out.sandbox.muda.history:
    # Look up the deformer class by name and rebuild it from its saved parameters
    defclass = step['transformer']['__class__']
    params = step['transformer']['params']
    deformer = getattr(muda.deformers, defclass)(**params)
    # Re-apply the recorded state (note that _transform is a private method)
    state = step['state']
    jam_re = deformer._transform(jam_re, state)
# jam_re should now match jam_out, but including audio
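Before replaying anything, it can help to look at what the history actually contains. The following is a minimal sketch that reads the history with the standard json module only. It assumes the serialized file keeps the same layout the snippet above relies on (sandbox, then muda, then history, with transformer and state entries per step). The helper name summarize_history and the example content are hypothetical.

```python
import json


def summarize_history(jam_dict):
    """Return (class name, params, state) for each logged deformation step."""
    history = jam_dict.get('sandbox', {}).get('muda', {}).get('history', [])
    return [(step['transformer']['__class__'],
             step['transformer']['params'],
             step['state'])
            for step in history]


# Hypothetical JAMS content, trimmed to the part that matters here.
example = json.loads('''
{
  "sandbox": {
    "muda": {
      "history": [
        {"transformer": {"__class__": "PitchShift",
                         "params": {"n_semitones": 1}},
         "state": {"n_semitones": 1}}
      ]
    }
  }
}
''')

for name, params, state in summarize_history(example):
    print(name, params, state)
```

For a real file, the inline string would be replaced by json.load on the opened .jams file.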
In the initial design, reproducibility was intended to be achieved by serializing the deformer/pipeline objects separately, and then re-running the full deformation computation. I hadn't considered the use-case where someone would distribute jams output without the audio, but it makes sense.
You're likely to encounter snags with the above code when reconstructing background noise transformers, e.g., here. The problem is that those (necessarily) include paths to the augmenting noise data files. If those files don't exist at the same location in your environment, the reconstruction will not work. I don't have a great solution for that at present; you might just have to detect when this happens and transform filenames on the fly.
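One way to transform filenames on the fly is to rewrite path prefixes in each history step before rebuilding the deformer. This is only a sketch: remap_paths is a hypothetical helper, the directory names are made up, and the key names under params and state in the example step are illustrative rather than taken from muda.

```python
def remap_paths(value, old_root, new_root):
    """Recursively replace a leading old_root with new_root in every string.

    Returns a new structure; tuples come back as lists.
    """
    if isinstance(value, str):
        if value.startswith(old_root):
            return new_root + value[len(old_root):]
        return value
    if isinstance(value, dict):
        return {k: remap_paths(v, old_root, new_root) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [remap_paths(v, old_root, new_root) for v in value]
    # Numbers, booleans and None pass through unchanged.
    return value


# Hypothetical history step; key names under params/state are illustrative only.
step = {
    'transformer': {'__class__': 'BackgroundNoise',
                    'params': {'files': ['/home/alice/noise/street.wav']}},
    'state': {'filename': '/home/alice/noise/street.wav', 'weight': 0.3},
}

fixed = remap_paths(step, '/home/alice/noise', '/data/noise')
print(fixed['state']['filename'])
```

Walking the whole step generically avoids depending on exactly which keys hold file paths, at the cost of touching any string that happens to start with the old prefix.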
I hope that helps.
Thank you @bmcfee for such a detailed answer. I have been going over the code base and documentation for more than a day, trying to reconstruct the deformations.
Generally speaking, the deformation history is stored for logging purposes, not direct reconstruction.
I think you should mention it in your documentation.
One other thing, after the deformation, if I save the audio data as wav, it seems to get corrupted. I can't seem to play it back.
import glob
import os

import muda
import scipy.io.wavfile

pitch_shifts = [-2, -1, 1, 2]
deformers = [muda.deformers.PitchShift(n_semitones=i) for i in pitch_shifts]
union = muda.Union(steps=[('pitch_shift_{}'.format(n), d)
                          for n, d in zip(pitch_shifts, deformers)])
num_deformers = len(deformers)

# sounds_parent_dir, jams_parent_dir, fold, file_ext and output_dir are defined elsewhere
for fn in glob.glob(os.path.join(sounds_parent_dir, fold, file_ext)):
    # Assumes jams_parent_dir resolves to the JAMS file that matches fn
    j_orig = muda.load_jam_audio(jams_parent_dir, fn)
    for i, j_new in enumerate(union.transform(j_orig)):
        audio_dict = j_new.sandbox.muda._audio
        # 'audio' holds the deformed audio data
        audio, sr = audio_dict['y'], audio_dict['sr']
        # splitext also copes with file names that contain extra dots
        filename, ext = os.path.splitext(os.path.basename(fn))
        output_filename = '{}_{}{}{}'.format(filename, 'pitch', i % num_deformers, ext)
        output_path = os.path.join(output_dir, fold, output_filename)
        # librosa.output.write_wav(output_path, audio, sr)
        scipy.io.wavfile.write(output_path, sr, audio)
        print(output_path)
If I try to play the files in a media player, I get the error "This item was encoded in a format that's not supported", while the original file plays fine.
In case you want the original file and the transformed files, I am attaching the link.
https://drive.google.com/open?id=0BwewQcemtBXfZjh6UUtxekM0YXM
Never mind, the audio files can be played back with sounddevice.
Yeah, saving the audio can be a pain. I recommend pysoundfile for this.
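One plausible cause, which is an assumption and not confirmed in this thread, is that the deformed audio is a floating-point array, so the file is written as floating-point WAV, which some media players refuse to open. The sketch below uses only the standard library to write 16-bit PCM instead. The helper write_pcm16_wav and the output file name are hypothetical, and it handles mono audio only.

```python
import os
import struct
import tempfile
import wave


def write_pcm16_wav(path, samples, sr):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for x in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(x)))
        frames += struct.pack('<h', int(round(clipped * 32767)))
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes per sample = 16 bit
        wav.setframerate(int(sr))
        wav.writeframes(bytes(frames))


# Tiny example signal; the last value is out of range and gets clipped.
out_path = os.path.join(tempfile.gettempdir(), 'pcm16_example.wav')
write_pcm16_wav(out_path, [0.0, 1.0, -1.0, 2.0], 22050)
print(out_path)
```

With pysoundfile installed, a call along the lines of soundfile.write(output_path, audio, sr, subtype='PCM_16') should achieve the same result with less code and also handles multi-channel arrays.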
Thank you @bmcfee for all your help. Your suggestions helped me immensely.
Edit: Your code snippet is working perfectly. If you want I can implement the feature based on your snippet.
Reopening this, just to keep tabs on it until we put reconstruction in the library proper.