MacBook Pro M3 Crashes
aaronedell opened this issue · 5 comments
When I try
--engine WHISPER_LARGE \
--dataset LIBRI_SPEECH_TEST_OTHER \
--dataset-folder /Users/aedell/Documents/GitHub/speech-to-text-benchmark/datasets/LibriSpeech/test-other \
my MacBook Pro's memory spikes and then the machine locks up. According to ChatGPT, the possible causes are:
Concurrent Execution: The script uses ProcessPoolExecutor for parallel processing. If the dataset is large and the number of processes (num_workers) is not optimally set, this could lead to high memory usage as each process might consume a significant amount of memory.
Dataset and Engine Initialization: Depending on how the Dataset and Engine classes are implemented (not visible in the provided script), loading large datasets or initializing multiple instances of speech-to-text engines could consume a lot of memory.
Memory Leaks: If there are any memory leaks within the Engine or Dataset classes (e.g., not properly releasing resources), repeated calls in a loop could exacerbate memory consumption over time.
Any suggestions?
Do you get the same issue when running other engines or datasets? Or is it only this combination of arguments?
However, when I try with WHISPER_SMALL I don't have the issue.
I see. Whisper large is a big model and takes a lot of RAM to run. By default we run one worker per CPU core, and each worker holds its own copy of the model. Try reducing the number of workers used in the benchmark with the --num-workers argument, to something like 7 or 8. I imagine it will then run fine.
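For anyone picking a value for --num-workers, here is a rough sketch of the arithmetic: since each worker holds its own copy of the model, the worker count is bounded by available RAM divided by the per-worker footprint, as well as by the CPU count. The helper name `pick_num_workers`, the 4 GB reserve, and the 10 GB per-worker figure are all assumptions for illustration, not measurements from this benchmark; check Activity Monitor with a single worker to get a real number for your machine.

```python
import os


def pick_num_workers(total_ram_gb, per_worker_gb, cpu_count=None, reserve_gb=4.0):
    """Estimate a worker count that fits in RAM (hypothetical helper).

    total_ram_gb:  physical memory of the machine
    per_worker_gb: assumed memory used by one worker with the model loaded
    cpu_count:     upper bound on workers; defaults to os.cpu_count()
    reserve_gb:    memory left for the OS and other apps (assumed value)
    """
    cpus = cpu_count or os.cpu_count() or 1
    usable_gb = max(total_ram_gb - reserve_gb, 0.0)
    by_memory = int(usable_gb // per_worker_gb)
    # Never go above the CPU count, and always allow at least one worker.
    return max(1, min(cpus, by_memory))


if __name__ == "__main__":
    # Illustrative numbers only: an 18 GB machine and an assumed 10 GB per worker.
    print(pick_num_workers(total_ram_gb=18, per_worker_gb=10))
```

The result is just a starting point to pass to --num-workers; if memory still spikes, lower it further.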
Closing due to inactivity