Speeding up Scaper with in-memory operations
pseeth opened this issue · 9 comments
I've done some investigation into speeding up Scaper quite a bit. The key idea is to make everything happen in-memory and most importantly reduce the number of exec calls. I'm writing out the roadmap that we came up with offline here. The milestones we should hit to make this happen are:
- Write a profiling script that reliably tells us the speed of Scaper right now. This shouldn't touch any existing code but should be a good benchmarking tool that we can use to report progress on this issue. (PR: #106)
- We have
soundfile
now, so let's usesoundfile.info
instead ofsox.file_info
. This avoids a costly exec call. (PR #110) - The current
pysox
master now can accept and return numpy arrays when doingtfm.build
. This API may change slightly, however. See this issue and the associated PR. By updating to this, we will be able to do everything in-memory, but will still have exec calls. (PR #111) - Once we use
soundfile
to load the audio files, we can get rid ofsox.Combiner
and should do all the summing of mixtures in-memory. (PR #111) - Since we have
soundfile
which can read from an offset and a duration, we should use that instead of making calls totfm.trim
, which is not as efficient.tfm.trim
decodes the entire file into memory, then trims it, which can be costly for large files, I believe. The same goes fortfm.fade
which is slow compared to just multiplying in-memory with a quarter-sine ramp. (PR #117) - I have been working on making Python bindings for Sox with the goal of it being a drop-in replacement for PySox's transformer API. The project is soxbindings. In it, I subclass
sox.Transformer
to have the same API with respect to Scaper's use case. The bindings are generated viapybind11
and avoid all exec calls. The change at this point should be as simple asimport sox
->import soxbindings as sox
, after makingsoxbindings
pip-installable with cross-platform wheels. At the momentsoxbindings
does not work on Windows, but we can make it an optional import in Scaper: if it's installed, use it, otherwise fall back tosox
. - We decided that
tfm.trim
replacement should happen later, as it changes the regression data. - Now we make use of pyloudnorm, which gets around another exec call that's in the current LUFS calculation, which uses
ffmpeg
. This breaks the regression data as pyloudnorm and ffmpeg LUFS don't match exactly (they're close enough). Everything up to this point should match the current regression data and pass tests. After this, we need to change the regression data.
This issue is related to #76 and #3.
Update, the returns milestone Scaper.generate
has been moved to a separate issue: #114.
- We can now have the
Scaper.generate
function return numpy arrays. This is useful for on-the-fly mixing. At this stage, we should also make it so that the Scaper object doesn't have to save jams and audio to disk but can instead just return them. Possibly this could be done viajams_file=None
andaudio_file=None
when passing it into Scaper as arguments.
Looking ahead, @justinsalamon, can you weigh in on API design for this bullet point when you have a second?
We can now have the Scaper.generate function return numpy arrays. This is useful for on-the-fly mixing. At this stage, we should also make it so that the Scaper object doesn't have to save jams and audio to disk but can instead just return them. Possibly this could be done via jams_file=None and audio_file=None when passing it into Scaper as arguments.
@pseeth good call, let's hash this out.
I agree that we should change the default behavior to:
- return audio and annotation in memory
- don't save to disk
How about:
- We make
audio_path
andjams_path
optional, and set toNone
by default. If a path is provided, it's used to save to disk. This will make these identical totxt_path
, and we should probably bunch them together, i.e. placetxt_path
immediately afterjams_path
in the function call, and re-order the docstrings accordingly. no_audio
will control whether audio is generated or not (overall), so if it's set to True we (1) returnNone
in memory for the audio array and (2) do not save audio to disk, even if a path is specified.- We'll have to update docs everywhere to reflect this API change.
This will break the API, so again we need to determine if we're bumping to 1.4.0 or 2.0.0.
Awesome, thanks. That all sounds good to me.
What should the return type of generate
when save_isolated_events
is set to True? Or should we always return like this:
# types: (np.ndarray, list: np.ndarray, JAMS)
return soundscape_audio_array, event_audio_arrays, jam
This set of changes may result in some deprecated functions in Scaper. I wanted to list them here as well as why they are being deprecated for us to keep in mind:
match_sample_length
: We will not need this anymore, as the length of different events in the soundscape will match exactly, thanks to doing everything in-memory at the sample level.get_integrated_lufs
: The behavior of this function will change, as we will switch it to usingpyloudnorm
rather thanffmpeg
. It currently takes a file, but in the future will take a numpy array.r128stats
: Related toget_integrated_lufs
, this makes a call toffmpeg
on the command line, then interprets thestdout
output. Won't be needed afterpyloudnorm
._close_temp_files
: This was a useful context manager when we needed temporary fles everywhere. However, we may not need temporary files in the near future! We don't need it in_generate_audio
, but it is also called intrim
andslice
. We can likely remove it there as well, opting forbuild_array
+soundfile
operations.
Regarding return API, it should be the same regardless of how the options are set.
Focusing on the most common use case, I'd start with:
- soundscape_audio, soundscape_jam
But we also have the text file... I wonder if we should return the list that's used to create it. Then we's return:
- soundscape_audio, soundscape_jam, soundscape_list
Only then would I return isolated events, so now we get:
- soundscape_audio, soundscape_jam, soundscape_list, event_audio_list
thoughts @pseeth ?
Per offline discussion, let's shift discussion of the return API to a different issue, a different PR.
So replacing tfm.fade
operations with the following snippet works great:
Lines 1820 to 1828 in dfb50f5
However, when trying to replace tfm.trim
operations in a similar way by removing:
Lines 1781 to 1783 in dfb50f5
and replacing it with (using SoundFile to read from disk at specified start and stop according to the EventSpec):
# synthesize edited foreground sound event
event_sr = soundfile.info(e.value['source_file']).samplerate
start_sample = int(event_sr * e.value['source_time'])
stop_sample = start_sample + int(event_sr * e.value['source_duration'])
event_audio, event_sr = soundfile.read(
e.value['source_file'], always_2d=True,
start=start_sample, stop=stop_sample)
results in test failures only when resampling is needed. This is because sox.Transformer
resamples the entire audio file first, then trims it. In the workflow above, where the trim is done when the audio is loaded in, the audio is trimmed first then resampled. These operations can't be swapped (resample -> trim != trim -> resample).
However, for long source audio files, it'd be better to trim before resampling. Otherwise, Scaper would incur the cost of resampling the entire audio file first before trimming it every time it accessed that audio in a soundscape. Making this change would require updating the regression data so I suggest we leave it till the end.
Sounds good, yes, let's avoid anything that touches regression data for now, everything else merged, and then in a separate PR add modifications that touch regression data.
All of this happened!